Brand New Features for Modelers, ML Engineers, and Analysts in Dataiku 11.1

ChristinaH · October 2022

Dataiku 11.1 has landed! In addition to many enhancements to existing features, this update contains several exciting new capabilities across every stage of the data lifecycle, including at least three solutions for product ideas proposed by your fellow Dataiku Community members. Read on for more details on what’s new in Dataiku 11.1, and as always, check out the reference documentation and release notes for the full picture.

3 Key Highlights for Modelers

Hyperparameter Optimization & Model Comparisons for Time Series Forecasting Models

Being able to reliably predict future trends is critical for organizations wishing to operate efficiently in a competitive and changing market. In Dataiku 11, we introduced a guided task within the familiar Visual ML framework that simplifies the process of developing and deploying time series forecasting models.

With the 11.1 update, users can now optimize hyperparameters for forecasting models using a k-fold cross-validation strategy that respects time ordering: each validation fold immediately follows its training set, and validation folds do not overlap. This approach more accurately mirrors the situation you'll see at prediction time, where you train on past data and predict on forward-looking data.
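To make the idea concrete, here is a minimal sketch of time-ordered folds in plain Python. It assumes an expanding training window and rows already sorted by timestamp; the function name `time_ordered_folds` is hypothetical, and this illustrates the general technique rather than Dataiku's actual implementation.

```python
def time_ordered_folds(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs that respect time order.

    Assumes rows are sorted by timestamp. Each validation fold starts right
    after its training window, and no two validation folds overlap.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    first_start = n_samples - n_folds * fold_size
    for start in range(first_start, n_samples, fold_size):
        train = list(range(0, start))                     # everything before the fold
        validation = list(range(start, start + fold_size))  # the next consecutive block
        yield train, validation


folds = list(time_ordered_folds(n_samples=10, n_folds=3))
for train, validation in folds:
    print("train:", train, "validate:", validation)
```

Every training set ends exactly where its validation fold begins, so the model is never evaluated on data older than what it was trained on.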

Model comparisons for time series models are also now available, making it easier to compare and contrast forecasting models on dimensions such as performance metrics, time series resampling settings, features handling, algorithms, and training details.

Stratified Sampling for Classification Models

Have imbalanced data, or are trying to detect rare events? For binary or multi-class classification tasks, a new stratified sampling option is available when K-fold cross-tests are activated. The stratified option splits the samples so that each fold preserves the class proportions of the whole population, which helps reduce sampling bias during cross-test validations.

Fun fact: this feature was inspired by a Product Ideas submission from @Marlan Crosier, one of our Dataiku Neurons. Yet another great example of how Dataiku listens to our customers and delivers on their feedback!
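As a rough illustration of what stratified splitting does, the sketch below deals the samples of each class across folds in turn, so every fold keeps roughly the same class mix as the full dataset. The function name `stratified_folds` and the toy labels are made up for this example; real implementations usually shuffle within each class first, which is omitted here to keep the output deterministic.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds while preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's samples across the folds like a deck of cards.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return [sorted(fold) for fold in folds]


# Toy imbalanced dataset: 3 rare events among 12 samples.
labels = ["rare"] * 3 + ["common"] * 9
folds = stratified_folds(labels, n_folds=3)
for fold in folds:
    print(fold, [labels[i] for i in fold].count("rare"), "rare sample(s)")
```

With a purely random split, one fold could easily end up with no rare samples at all; here each fold is guaranteed one.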

Heatmaps for Image Classification Model Explainability

Ever wonder what the machine “sees” when it analyzes an image? Teams using Dataiku’s Visual ML to build image classification models will appreciate a new explainability feature available in the What If? tab. When hovering over images for each predicted class, a heatmap is overlaid on the scored image to show which pixels the model focused on when making this prediction. This visual aid is useful not only for analyzing and explaining model behavior, but also for troubleshooting unexpected or incorrect predictions.

3 Exciting New MLOps Capabilities

Explainability for External Models

Advanced data scientists leveraging MLflow’s framework already have the ability to capture and store programmatic model experiments and import custom models into Dataiku to be deployed, monitored, and governed. In addition to interactive what-if analysis, users can now enjoy the full panel of model explainability tools for imported external models.

This includes partial dependence plots, subpopulation analysis, and individual prediction explanations. Not having to code these explainability techniques by hand saves data scientists time and also makes it possible for organizations to adopt consistent Responsible AI practices, regardless of a model’s origin.
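For readers curious about what one of these techniques computes, here is a minimal sketch of partial dependence in plain Python: force one feature to a fixed value for every row, average the model's predictions, and repeat across a grid of values. The `toy_model` function, the feature names, and the data are all hypothetical stand-ins, and this is not how Dataiku implements the feature internally.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature under study
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for any model's single-row scoring function."""
    return 2.0 * row["age"] + row["income"]


rows = [{"age": 30, "income": 10.0}, {"age": 50, "income": 20.0}]
curve = partial_dependence(toy_model, rows, feature="age", grid=[20, 40])
print(curve)
```

Because the function only needs a way to score a row, the same computation applies to any model regardless of where it was trained, which is what makes this kind of explainability practical for external models.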

Model Export in Python and MLflow Formats

You may already know Dataiku offers the ability to export certain types of models in Java (.jar), PMML, and ONNX formats, as well as generate an explanatory Jupyter (Python) Notebook with code to reproduce a model similar to the model that you trained. In Dataiku 11.1, users can also export models in Python or MLflow formats.

These exports represent the exact same settings and capabilities as the saved model version in the Visual ML interface and are useful for teams that want to deploy models outside of Dataiku’s platform or use MLflow for model inference and orchestration. Use the dataikuscoring Python package, available as ‘dataiku_scoring’ on PyPI, to run your model anywhere.

Deploy Clustering Models as API Endpoints

Dataiku has the ability to expose prediction and forecasting models, Python and R functions, SQL queries, and dataset lookups as API services. With Dataiku 11.1, users can now also deploy clustering models on API nodes for real-time model inference.

As an example of how this might be useful, let’s say you’re a retailer who asks shoppers on the mobile app or website to answer some quick questions about what styles appeal to them, whether they like to shop online or in-store, and what their favorite brands are. By submitting these inputs to be live-scored against a clustering model, which returns the most likely customer segment, the retailer can tailor the shopping experience to most closely match the preferences and needs of this particular shopper.
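Conceptually, live-scoring a record against a clustering model can be as simple as finding the closest cluster center. The sketch below assumes a centroid-based model (such as k-means) with two made-up numeric features and segment names; it illustrates the idea only and does not use the Dataiku API node client.

```python
import math


def assign_segment(record, centroids):
    """Return the name of the centroid closest to `record` (Euclidean distance)."""
    def distance(center):
        return math.sqrt(sum((record[key] - center[key]) ** 2 for key in center))

    return min(centroids, key=lambda name: distance(centroids[name]))


# Hypothetical cluster centers learned offline from past shoppers.
centroids = {
    "online_trend_seeker": {"online_share": 0.9, "brand_loyalty": 0.2},
    "in_store_loyalist": {"online_share": 0.2, "brand_loyalty": 0.9},
}

# Answers from one shopper, already encoded as numbers (an assumption).
shopper = {"online_share": 0.8, "brand_loyalty": 0.3}
segment = assign_segment(shopper, centroids)
print(segment)
```

In a deployed setup, the shopper's answers would be sent to the API endpoint and the returned segment would drive the personalized experience.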

3 New Features for Visual Designers and Data Consumers

KPI Charts with Conditional Formatting

Key Performance Indicators, or KPIs, are measurable values that demonstrate how effectively a company is achieving key business objectives. The new KPI chart type in Dataiku allows users to present specific measures or values in a visually impactful way on dashboards. With conditional formatting options based on customizable rules, data consumers can quickly get a health check on important status and performance metrics.
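Rule-based conditional formatting boils down to checking a value against an ordered list of thresholds. Here is a small sketch of that logic, assuming "greater than or equal to" rules evaluated from top to bottom; the rule structure, colors, and function name are illustrative and not Dataiku's configuration format.

```python
def kpi_color(value, rules, default="grey"):
    """Return the color of the first rule whose threshold the value meets."""
    for threshold, color in rules:
        if value >= threshold:
            return color
    return default


# Hypothetical rules for an on-time delivery rate, ordered highest first.
rules = [(0.95, "green"), (0.80, "orange"), (0.0, "red")]

for rate in (0.97, 0.85, 0.50):
    print(rate, "->", kpi_color(rate, rules))
```

Ordering the rules from the strictest threshold down means the first match is always the most favorable status the value qualifies for.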

Treemaps

Treemaps are a very useful chart type for visualizing large amounts of categorical data in a compact way; they enable data consumers to easily identify relationships and ratios between elements in a hierarchical data structure and layer on additional context and attributes using both size and color. By popular request, treemaps are now an option when building and publishing charts in Dataiku 11.1.

Improved User Experience for Sampling

By default, Dataiku displays a sample of records in the dataset explore view, prepare recipe, and charts to ensure quick, responsive visual feedback to users as they analyze and make changes. However, it’s useful to be reminded that you’re looking at a subset of the data versus the whole dataset before drawing conclusions. A “Sample/Whole Data” badge and additional context provide a clear indication of the number of rows displayed in the sample and whole dataset, whether filters are applied, and if the memory limit has been reached.

If you want to make any modifications to the sampling method, clicking on the badge will expand the sample settings panel, or you can navigate there directly at any time on the left-hand panel.

Would you look at that! This feature was recently requested as a Product Ideas submission from Sudipta Ghosh (@sudipta002). Yet another great example of how Dataiku delivers solutions aligned with user feedback to improve our product!

Other Notable Updates and Enhancements

Dataiku 11.1 also introduces many other improvements to help you in your day-to-day work, such as support for additional data connection types and table descriptions, enhanced data exploration, cleansing, and export, and new capabilities for code-first users.

Explore & Clean

Display numbers with fewer than 15 digits as double type, rather than in scientific notation.
New geoMakeValid function in prepare recipe to transform invalid geometry formats.

Export

Improved Excel export that preserves data type for dates and numbers.
Dataiku Govern: embed rich descriptive artifacts via HTML markup in project fields, including sign-offs in export/import of governance blueprint versions.
Ability to export Dataiku datasets into MS Excel files with parsed date-time values coming through as MS Excel DateTime columns (another great product idea sourced from user feedback, this time from Dataiku Neuron Tom Brown, @tgb417!).

Coding

Add support for editing project libraries from local PyCharm or VSCode IDEs via the Dataiku extension.
R4 support for custom Dataiku installations.

Connecting to Data

Catalog support for generic JDBC datasets for Trino/Starburst JDBC connections.
Display table descriptions in connection explorer for Snowflake, BigQuery & PostgreSQL.
Native connector for AlloyDB, Google’s new fully managed PostgreSQL solution.
Filter by connection type in admin pages.

Try it Out for Yourself!

Dataiku 11.1 is available for download or upgrade today, including all of these latest features and functionalities that were developed with users like you in mind. We hope that you’re as excited as we are about the pace of innovation from our product and technology teams, and look forward to receiving your feedback!

To get the full details about Dataiku 11.1, check out the full release notes.