Happy New Year! I hope you all enjoyed the holidays and are refreshed and excited for 2023. Last month, we delivered a bundle of holiday treats with Dataiku 11.2, but ’tis the season that keeps on giving. A fresh batch of goodies is already available for you to explore in Dataiku 11.3.
Along with plenty of improvements to existing capabilities, this product update delivers brand-new features for data analysts, data scientists, and ML engineers and operators. Read on to discover our latest additions and, as always, check out all the details in the release notes.
For Data Analysts
The “Anti-Join” Dataset
Sometimes, you want to inspect the records that are unmatched in a join operation. In addition to producing the output dataset containing the records that meet the join conditions, the join recipe can now optionally output a dataset that returns the unmatched rows for further analysis.
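To make the idea concrete outside the visual recipe, here is a minimal plain-Python sketch of a join that also returns the unmatched rows. It is an illustration only, not Dataiku's implementation: the function name `join_with_unmatched`, the sample `orders` and `customers` rows, and the choice to report unmatched rows from the left side only are all assumptions made for this example.

```python
def join_with_unmatched(left, right, key):
    """Inner-join two lists of dicts on `key`; also return left rows with no match."""
    # Index the right-hand rows by key so each lookup is a dictionary access.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    matched, unmatched = [], []
    for row in left:
        partners = right_by_key.get(row[key])
        if partners:
            # One output row per matching pair, as in a standard join.
            matched.extend({**row, **partner} for partner in partners)
        else:
            # No partner found: this row goes to the "anti-join" output.
            unmatched.append(row)
    return matched, unmatched


# Hypothetical sample data for illustration.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 3, "amount": 12.0},
    {"customer_id": 4, "amount": 80.0},
]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 4, "name": "Dee"},
]

matched, unmatched = join_with_unmatched(orders, customers, "customer_id")
print(unmatched)
```

Here the order for customer 3 has no matching customer record, so it lands in the unmatched output instead of silently disappearing from the join result.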
Share and Export Filtered Views
When interactively analyzing data by applying filters to a dashboard, it can be useful to export a subset of data or share a specific view with the broader team. For dashboards, a new “Copy to URL” button preserves your selected filter parameters in a URL that you can easily send to others. When they click the link, all the filters will be preserved so they can review the dashboard insights in exactly the same state. You can also export the filtered dashboard as a PDF.
The same is true for filtered datasets in the explore view: head to the actions menu to download the subset of data or export the view as another dataset with the filters preserved.
Shortcut to Visual Previews for Images and Geospatial Data
When working with non-tabular data types, it’s useful to visually inspect records in a more interpretable, multimedia format. After all, file paths and geo-coordinates may be practical for storage, but a picture is worth a thousand words.
In addition to the full-table image view released with Dataiku 11.2, you can now use the convenient “preview” action to view individual images or geolocations. Simply press Shift+V on a highlighted cell or use the right-click menu and choose “Preview” to see a pop-up of the selected image or a map plotting the geopoint or geometry. Pro tip: this shortcut is also useful with tabular data for viewing and/or copying the full value of a long string or array.
For Data Scientists
Deep Neural Network as a Native Algorithm
The Deep Neural Network is a new algorithm available in Dataiku AutoML for both regression and classification tasks. Based on the multilayer perceptron (MLP) architecture, this Deep Neural Network leverages state-of-the-art libraries for a robust, efficient, and scalable model.
Example of a Multilayer Perceptron architecture, with 3 hidden layers of 4 neurons each
Dataiku’s implementation offers:
- A searchable architecture and a customizable learning process.
- Regularization techniques to avoid overfitting.
- GPU support for training acceleration.
Once trained, a Deep Neural Network can be deployed, evaluated, and scored just like any other algorithm.
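For readers new to the multilayer perceptron, the following plain-Python sketch shows a forward pass through a network shaped like the one in the figure: 3 hidden layers of 4 neurons each. It is not Dataiku's implementation. The input size of 5, the ReLU activation, the random untrained weights, and the helper names `init_mlp` and `forward` are assumptions chosen for illustration.

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Create random weights and zero biases for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector through the network."""
    activation = list(x)
    for i, (weights, biases) in enumerate(layers):
        # Weighted sum of the previous layer's outputs plus a bias, per neuron.
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        # ReLU on hidden layers; the output layer stays linear.
        activation = z if is_output else [max(0.0, v) for v in z]
    return activation


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# 5 input features (assumed), 3 hidden layers of 4 neurons, 1 output neuron.
layers = init_mlp([5, 4, 4, 4, 1])
prob = sigmoid(forward(layers, [0.2, -1.0, 0.5, 0.0, 1.3])[0])
print(round(prob, 3))
```

Training would adjust the weights and biases to minimize a loss. The sketch covers only how a prediction flows through the layers.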
Feature-level View and Search in the Feature Store
A new feature-level view in Dataiku’s Feature Store makes it easier to search for and explore specific features you can reuse in your own projects and models. This view gives additional information and context about both the feature itself and the feature group it belongs to.
Visual Time Series Forecasting: Evaluate Beyond Forecast Horizon
When performing time series forecasting, the forecast horizon is how far ahead each forecast reaches, which typically matches how frequently new forecasts need to be generated. The evaluation period usually corresponds to how frequently the model is expected to be retrained.
If you don’t plan to retrain the model as frequently as the forecast horizon, you can now specify an evaluation period longer than the horizon.
For example, let's say you develop a seven-day sales forecast that generates the predicted sales for the upcoming week, but you know that you won't assess and retrain the model more frequently than once a month. In this case, you could specify an evaluation period that is four horizons long in order to capture and evaluate model performance metrics for the entire month across multiple forecast cycles.
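The seven-day example can be sketched in a few lines of plain Python. This is only an illustration of evaluating one model over four consecutive horizons, not how Dataiku computes its metrics. The stand-in `seasonal_naive` model, the synthetic `daily_sales` series, and the choice of mean absolute error are assumptions.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(history, horizon):
    """Stand-in model: repeat the last `horizon` observed values."""
    return history[-horizon:]


def evaluate_multi_horizon(series, horizon, n_horizons, forecast_fn):
    """Score consecutive forecast windows at the end of the series, one per horizon."""
    eval_len = horizon * n_horizons
    if len(series) < eval_len + horizon:
        raise ValueError("series is too short for the requested evaluation period")
    start = len(series) - eval_len
    scores = []
    for k in range(n_horizons):
        cutoff = start + k * horizon
        # Forecast from the data available at the cutoff, then compare
        # against what actually happened over the next `horizon` steps.
        predicted = forecast_fn(series[:cutoff], horizon)
        actual = series[cutoff:cutoff + horizon]
        scores.append(mean_absolute_error(actual, predicted))
    return scores


# Synthetic data: 8 weeks of daily sales with a weekly pattern and a slow upward trend.
weekly_pattern = [100, 120, 130, 125, 140, 200, 180]
daily_sales = [value + week for week in range(8) for value in weekly_pattern]

scores = evaluate_multi_horizon(daily_sales, horizon=7, n_horizons=4,
                                forecast_fn=seasonal_naive)
print(scores)  # one error value per weekly forecast cycle
```

Looking at one score per cycle, rather than a single week, shows whether the error stays stable or grows as the model ages between retrainings.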
For ML Engineers & Operators
Evaluation-Ready Event Logs
The Dataiku event server is a built-in way for teams to capture the prediction logs from all models across Dataiku API nodes and store them in a single location. These prediction logs are mainly used to monitor model performance with evaluate recipes and model evaluation stores.
With the Dataiku 11.3 update, the evaluate recipe is now able to automatically process those prediction logs. This removes the need for a preceding prepare recipe and simplifies the setup of production model monitoring.
Prediction Drift Detection Without Ground Truth
When a model is deployed into production, it’s prudent to immediately begin monitoring to detect meaningful shifts in its behavior. However, for many use cases, the ground truth is either unavailable or is not available fast enough to provide timely feedback and model performance metrics that alert operators to a degrading model.
Even in the absence of ground truth labels, you can still take advantage of prediction drift analysis in Dataiku’s model evaluation store with just the prediction logs. The fugacity table and density chart show the differences between the current prediction distribution and the reference distribution from when the model was trained.
Together with input data drift analysis, which also does not require ground truth, these two tools provide critical information and early warning for ML operators so they can proactively keep live models healthy.
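As a rough illustration of prediction drift without ground truth, the plain-Python sketch below compares the share of each predicted class in a reference set against the share in recent production predictions. It is not the calculation behind Dataiku's fugacity table or density chart. The function names and the sample `reference` and `current` predictions are hypothetical.

```python
from collections import Counter


def class_proportions(predictions, classes):
    """Share of predictions falling in each class."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    total = len(predictions)
    return {c: counts[c] / total for c in classes}


def prediction_drift_table(reference_preds, current_preds):
    """Per-class predicted proportions in both sets, plus the shift between them."""
    classes = sorted(set(reference_preds) | set(current_preds))
    ref = class_proportions(reference_preds, classes)
    cur = class_proportions(current_preds, classes)
    return [
        {"class": c, "reference": ref[c], "current": cur[c], "shift": cur[c] - ref[c]}
        for c in classes
    ]


# Hypothetical predictions: at training time vs. from recent production logs.
reference = ["churn"] * 20 + ["stay"] * 80
current = ["churn"] * 35 + ["stay"] * 65

table = prediction_drift_table(reference, current)
for row in table:
    print(row)
```

No true labels are needed: a jump in the predicted churn share from 20% to 35% is already a signal worth investigating, whether the cause is a real change in behavior or a shift in the input data.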
Try it Out for Yourself!
Dataiku 11.3 is available for download or upgrade today, including all of these features and improvements that were developed with users like you in mind. Stay tuned for future product updates as we progress toward Dataiku 12, our next major version!
To get the full details about Dataiku 11.3, check out the release notes.