Deploy Anywhere, New Databricks Integrations, and Other Exciting Updates in Dataiku
In October, we announced a crop of Generative AI capabilities to help you efficiently develop and deliver a variety of LLM-powered data applications, all securely backed by the LLM Mesh. This month, in honor of the Thanksgiving holiday that just passed in the US, we’re serving up a feast of other fresh product features and enhancements for your enjoyment!
Although the added MLOps capabilities and integrations with our tech ecosystem are surely the centerpiece of the banquet, be sure to read to the end to learn about new chart types and filters, improvements to ML tasks and Dataiku Govern, and multiple time-savers for visual designers. In addition to the 10 features highlighted in this article, I’ll also be making an exciting announcement later this week about some brand new Gen AI-powered assistants and additions to the LLM Mesh, so stay tuned! Until then, here’s a cheat sheet of what’s on the menu:
Top 10 Features in the December Update
Deploy Anywhere: Deploy API services developed in Dataiku on AWS SageMaker, Azure ML, and Google Vertex platforms
New Databricks integrations: Surface Databricks model endpoints as external models in Dataiku, or directly import models from a Databricks Model Registry or Unity Catalog
Flow: Insert a recipe into an existing pipeline
Visualization: Two new chart types, cross-filtering in dashboards, and customizable reference lines with dynamic values
Model overrides: Avoid unstable or invalid predictions with the “decline to predict” option
Statistics: Export more types of statistical tests as recipes in your Flow
NLP: Import pre-labels for managed text labeling and an updated NER plugin
Education and enablement: New tutorials and courses for common tasks, Excel users, Responsible AI, Generative AI, and webapp development
Dataiku solutions: New downloadable solutions for Credit Risk Stress Testing and Predictive Maintenance use cases
Miscellaneous user experience improvements to dataset creation and editing, Visual ML, Dataiku Govern, and more
Deploy Anywhere ⇄ External Models: Have It Your Way
In the multi-platform data science landscape that is the reality for many of you, it’s desirable to have the flexibility to develop a model in one place but deploy in another. The “deploy anywhere” capability allows teams to deploy an API service designed in Dataiku to other production environments besides Dataiku API nodes — namely AWS SageMaker, Azure ML, and Google Vertex.
To achieve this latest feat, Dataiku extended the capabilities of its API Deployer. First, simply connect to and configure the cloud infrastructure associated with your preferred cloud ML solution. From there, it’s as simple as creating (or reusing) an API service in your project via the usual methods and pushing it to this cloud infrastructure via the familiar API Deployer.
Deploy models developed in Dataiku to AWS SageMaker, Azure ML, or Google Vertex
Deploy anywhere provides a complementary counterpart to the “external models” capability added in September, which allows you to observe, explain, compare, score with, and govern models deployed with these same cloud ML providers from inside Dataiku. In short, Dataiku aims to remain open and infrastructure-agnostic while still acting as the central platform where teams monitor, govern, and democratize access to all their models, regardless of which platform they are designed or deployed on.
Interoperability and openness between Dataiku and other ML platforms in the MLOps lifecycle
Dataiku & Databricks: a Dynamic Duo
Using Databricks Connect and the dedicated Databricks connection, coders using Dataiku can already push down the execution of PySpark code recipes or notebooks to Databricks clusters. With this latest update, you can now also:
Surface models from Databricks as “external models” in Dataiku
Import MLflow models directly from a Databricks model registry or Unity Catalog
Surface Databricks External Models
Using the same external models functionality mentioned briefly above, easily surface an API endpoint deployed in Databricks as an external model object in Dataiku. Once exposed to Dataiku, take advantage of interactive model explainability reports, run performance comparisons against models from multiple origins, apply AI governance protocols, and perform simple scoring against new data using either visual or programmatic tools.
Import MLflow Models From a Databricks Model Registry or Unity Catalog
With a new graphical interface designed specifically for importing custom MLflow models directly from Databricks, it’s easier than ever to fetch a model directly from a Databricks server. While the original model lineage is preserved (e.g., Databricks model name, version, and source), importing custom MLflow models into Dataiku means you can benefit from all the added value native Dataiku models offer, such as model explainability, monitoring, and AI governance.
Other Notable Enhancements and Features
Improve your speed to value, enjoy an improved user experience, and learn new skills with these additional product updates:
Insert a Recipe in an Existing Flow
To insert a new visual or code recipe into an existing pipeline, you no longer need to build a new branch and manually attach the new dataset as input to the downstream Flow. Save time by using the “Insert recipe after this dataset” action instead.
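To make the rewiring concrete, here is a minimal conceptual sketch in plain Python. It is not Dataiku's API or implementation: the dictionary-based Flow model, the function name `insert_recipe_after`, and all recipe and dataset names are hypothetical, chosen only to illustrate what "inserting after a dataset" does to downstream recipes.

```python
# Conceptual sketch only: a Flow modeled as a dict that maps each recipe
# name to its input and output dataset names. This is NOT Dataiku's API;
# every name here is hypothetical.

def insert_recipe_after(flow, dataset, new_recipe, new_dataset):
    """Return a new flow with new_recipe inserted right after `dataset`."""
    updated = {}
    for recipe, io in flow.items():
        # Every recipe that used to read `dataset` now reads `new_dataset`.
        inputs = [new_dataset if d == dataset else d for d in io["inputs"]]
        updated[recipe] = {"inputs": inputs, "outputs": list(io["outputs"])}
    # The inserted recipe reads the original dataset and writes the new one.
    updated[new_recipe] = {"inputs": [dataset], "outputs": [new_dataset]}
    return updated


flow = {
    "prepare": {"inputs": ["raw"], "outputs": ["clean"]},
    "train": {"inputs": ["clean"], "outputs": ["model"]},
}
new_flow = insert_recipe_after(flow, "clean", "filter_rows", "clean_filtered")
print(new_flow["train"]["inputs"])
```

In this toy model, the downstream "train" recipe ends up reading from the new dataset without any manual reattachment, which is the step the action saves you.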
Enhance Dataiku Dashboards With New Chart Types, More Business Context, and Better Interactivity
Use Sankey diagrams to clearly visualize resource flows or process paths, and the new scatter multi-pair plot to show the relationships between the values of multiple variables. For better business context, add a reference line to a chart based on a measure such as an average or another custom aggregation.
With cross-filtering in dashboards, selecting a portion of one chart automatically applies a filter on that dimension to all other charts on the same slide.
Scatter multi-pair plot
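As a rough illustration of the cross-filtering idea described above, the sketch below filters a shared set of rows by a selected dimension value and then recomputes another chart's aggregation from the filtered rows. It is a conceptual example in plain Python, not how Dataiku dashboards are implemented; the helper names and sample data are hypothetical.

```python
# Conceptual sketch of cross-filtering: a selection made in one chart
# becomes a filter on the rows that feed every other chart.
# Function names and data are hypothetical, not part of Dataiku.

def apply_cross_filter(rows, dimension, selected_values):
    """Keep only rows whose `dimension` value was selected in a chart."""
    selected = set(selected_values)
    return [row for row in rows if row[dimension] in selected]


def chart_totals(rows, group_by, measure):
    """Aggregate `measure` per `group_by` value, as a bar chart might."""
    totals = {}
    for row in rows:
        totals[row[group_by]] = totals.get(row[group_by], 0) + row[measure]
    return totals


rows = [
    {"region": "EMEA", "product": "A", "sales": 10},
    {"region": "EMEA", "product": "B", "sales": 5},
    {"region": "APAC", "product": "A", "sales": 7},
]

# The user selects "EMEA" in a sales-by-region chart...
filtered = apply_cross_filter(rows, "region", ["EMEA"])
# ...and the sales-by-product chart is recomputed on the filtered rows.
by_product = chart_totals(filtered, "product", "sales")
print(by_product)
```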
Model Overrides Option: Decline to Predict
Model overrides give teams more control over model responses by ensuring predictions remain compliant with your regulatory frameworks, business standards, or ethical guidelines. Under certain conditions (e.g., a high model uncertainty score or a large confidence interval), you can now go a step further and use the new override option to “decline to predict” altogether.
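The logic behind declining to predict can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `predict_or_decline`, the `min_confidence` threshold, and its default value are hypothetical, and in Dataiku the override is configured in the product rather than written like this.

```python
from typing import Dict, Optional

# Conceptual sketch of a "decline to predict" rule. The function name and
# the threshold default are hypothetical; this is not Dataiku's API.

def predict_or_decline(
    probabilities: Dict[str, float], min_confidence: float = 0.7
) -> Optional[str]:
    """Return the most likely class, or None when the model is too unsure."""
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] < min_confidence:
        # Declining: no prediction is returned for this record.
        return None
    return label


confident = predict_or_decline({"churn": 0.9, "stay": 0.1})
uncertain = predict_or_decline({"churn": 0.55, "stay": 0.45})
print(confident, uncertain)
```

Returning an explicit "no prediction" value, rather than a low-confidence guess, lets downstream consumers route those records to a fallback such as manual review.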
Export Statistical Tests as Recipes
As with the PCA recipe, you can now conduct about a dozen different statistical tests (one-sample, two-sample, and pairwise Student t-tests, one-way ANOVA, chi-square independence tests, etc.) in the interactive statistics tab and publish the results as a dataset in your Flow, complete with a reusable recipe for operationalization and automation purposes.
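For readers who want a reminder of what one of these tests computes, here is a minimal sketch of the one-sample Student t statistic using only the Python standard library. It is not the code Dataiku generates for its recipes; the function name `one_sample_t` and the sample values are hypothetical, and it returns only the statistic and degrees of freedom, not a p-value.

```python
import math
import statistics

# Minimal sketch of a one-sample Student t-test statistic.
# Hypothetical helper for illustration; not a Dataiku-generated recipe.

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, dof = one_sample_t(sample, mu0=4)
print(round(t_stat, 3), dof)
```

Wrapping the test in a function like this mirrors the point of exporting it as a recipe: the same computation can be rerun automatically whenever the input data changes.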
Import Pre-Labels for Text Labeling/Validation and an Upgraded NER Recipe
Whether your pre-labels come from a previous labeling project, a pre-trained model (such as the newer OntoNotes Fast models now included in the upgraded NER plugin), or an LLM, you can import existing labels into a managed text labeling task to speed up the annotation or validation process.
Brand New Tutorials in the Dataiku Academy and Developers Guide
New task-based quickstart tutorials for getting started with Dataiku (data prep, ML, MLOps, collaboration, & Excel to Dataiku)
Download new pre-built business solutions for Credit Risk Stress Testing and Predictive Maintenance use cases, plus check out upgrades to several existing solutions like Process Mining, Omnichannel Marketing, and Credit Card Fraud Detection.
Additional User Experience Improvements to:
Dataset creation from files in a managed folder
Visual if-then-else rules
Causal predictions (ML diagnostics & new weighting metrics)
AutoML: Chart a model's training and test metrics across various training data sizes
Dataiku Govern (more monitoring, alert, and subscription options, embedded dashboards, blueprint template migration, etc.)
Want to learn more about Dataiku 12.4?
As always, visit the official release notes to get more details and reference documentation on these product enhancements. Give these new features a try in your own Dataiku projects, and be sure to let us know what you think in the comments!