December Release Notes: Deploy Anywhere, New Databricks Integrations, and Much More

ChristinaH

Deploy Anywhere, New Databricks Integrations, and Other Exciting Updates in Dataiku

In October, we announced a crop of Generative AI capabilities to help you efficiently develop and deliver a variety of LLM-powered data applications, all securely backed by the LLM Mesh. This month, in honor of the Thanksgiving holiday that just passed in the US, we’re serving up a feast of other fresh product features and enhancements for your enjoyment!

Although the added MLOps capabilities and integrations with our tech ecosystem are surely the centerpiece of the banquet, be sure to read to the end to learn about new chart types and filters, improvements to ML tasks and Dataiku Govern, and multiple time-savers for visual designers. In addition to the 10 features highlighted in this article, I’ll also be making an exciting announcement later this week about some brand new Gen AI-powered assistants and additions to the LLM Mesh, so stay tuned! Until then, here’s a cheat sheet of what’s on the menu:

Top 10 Features in the December Update

  • Deploy Anywhere: Deploy API services developed in Dataiku on AWS SageMaker, Azure ML, and Google Vertex platforms
  • New Databricks integrations: Surface Databricks model endpoints as external models in Dataiku, or directly import models from a Databricks Model Registry or Unity Catalog
  • Flow: Insert a recipe into an existing pipeline
  • Visualization: Two new chart types, cross-filtering in dashboards, and customizable reference lines with dynamic values
  • Model overrides: Avoid unstable or invalid predictions with the “decline to predict” option
  • Statistics: Export more types of statistical tests as recipes in your Flow
  • NLP: Import pre-labels for managed text labeling and an updated NER plugin
  • Education and enablement: New tutorials and courses for common tasks, Excel users, Responsible AI, Generative AI, and webapp development
  • Dataiku solutions: New downloadable solutions for Credit Risk Stress Testing and Predictive Maintenance use cases
  • Miscellaneous user experience improvements to dataset creation and editing, Visual ML, Dataiku Govern, and more

  1. Deploy Anywhere ⇄ External Models: Have It Your Way

In the multi-platform data science landscape that is the reality for many of you, it’s desirable to have the flexibility to develop a model in one place but deploy in another. The “deploy anywhere” capability allows teams to deploy an API service designed in Dataiku to other production environments besides Dataiku API nodes — namely AWS SageMaker, Azure ML, and Google Vertex.

To achieve this latest feat, Dataiku extended the capabilities of its API Deployer. First, connect to and configure the cloud infrastructure associated with your preferred cloud ML solution. From there, create (or reuse) an API service in your project via the usual methods and push it to this cloud infrastructure via the familiar API Deployer.

Video: Deploy models developed in Dataiku to AWS SageMaker, Azure ML, or Google Vertex (https://play.vidyard.com/b4Y6F2fUknvCkSrowig8qk.html)

Deploy anywhere provides a complementary counterpart to the “external models” capability added in September, which allows you to observe, explain, compare, score with, and govern models deployed with these same cloud ML providers from inside Dataiku. In short, Dataiku aims to remain open and infrastructure-agnostic while still acting as the central platform where teams monitor, govern, and democratize access to all their models, regardless of which platform they are designed or deployed on.

Figure: Interoperability and openness between Dataiku and other ML platforms in the MLOps lifecycle

  2. Dataiku & Databricks: A Dynamic Duo

Using Databricks Connect and the dedicated Databricks connection, coders using Dataiku can already seamlessly push down the execution of PySpark code recipes or notebooks to Databricks clusters. With this latest update, you can now also:

  1. Surface models from Databricks as “external models” in Dataiku
  2. Import MLflow models directly from a Databricks model registry or Unity Catalog


Surface Databricks External Models

Using the same external models functionality mentioned briefly above, easily surface an API endpoint deployed in Databricks as an external model object in Dataiku. Once exposed to Dataiku, take advantage of interactive model explainability reports, run performance comparisons against models from multiple origins, apply AI governance protocols, and perform simple scoring against new data using either visual or programmatic tools.
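To make "scoring against an endpoint" a little more concrete, here is a minimal sketch of the kind of REST request a Databricks model serving endpoint accepts. This is not Dataiku's implementation, and Dataiku handles the call for you once the external model is set up. The workspace URL, endpoint name, token placeholder, and feature names are all hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request


def build_scoring_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request for a Databricks serving endpoint.

    All argument values used below are made-up placeholders.
    """
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Databricks model serving accepts rows as a list of records under "dataframe_records".
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical workspace, endpoint, and features, for illustration only.
req = build_scoring_request(
    "https://example.cloud.databricks.com",
    "churn-model",
    "<token>",
    [{"age": 42, "plan": "premium"}],
)
print(req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) would require a real workspace and credentials, which is why the sketch stops at building it.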

Import MLflow Models From a Databricks Model Registry or Unity Catalog

With a new graphical interface designed specifically for importing custom MLflow models directly from Databricks, it’s easier than ever to fetch a model directly from a Databricks server. While the original model lineage is preserved (e.g., Databricks model name, version, and source), importing custom MLflow models into Dataiku means you can benefit from all the added value native Dataiku models offer, such as model explainability, monitoring, and AI governance.

Other Notable Enhancements and Features

Improve your speed to value, enjoy an improved user experience, and learn new skills with these additional product updates:

  3. Insert a Recipe in an Existing Flow

To insert a new visual or code recipe into an existing pipeline, use the “Insert recipe after this dataset” action. This saves you from building a new branch and manually attaching the new dataset as input to the downstream Flow.


  4. Enhance Dataiku Dashboards With New Chart Types, More Business Context, and Better Interactivity

Use Sankey diagrams to clearly visualize resource flows or process paths, and use the new scatter multi-pair plot to show the relationships between the values of multiple variables. For better business context, add a reference line to a chart based on a measure such as an average or another custom aggregation.

With cross-filtering in dashboards, selecting a portion of a chart automatically applies a filter on that dimension to all other charts on the same slide.
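The idea behind cross-filtering can be sketched in a few lines of plain Python. This is only a conceptual illustration, not how Dataiku dashboards are implemented; the sample rows and the function names `apply_cross_filter` and `revenue_by` are made up for the example.

```python
from collections import Counter

# Made-up sample data shared by every chart on a hypothetical dashboard slide.
rows = [
    {"region": "EMEA", "product": "A", "revenue": 120},
    {"region": "EMEA", "product": "B", "revenue": 80},
    {"region": "APAC", "product": "A", "revenue": 200},
    {"region": "APAC", "product": "B", "revenue": 50},
]


def apply_cross_filter(rows, dimension, selected_values):
    """Keep only rows whose value for `dimension` was selected in a chart."""
    return [row for row in rows if row[dimension] in selected_values]


def revenue_by(rows, dimension):
    """Aggregate revenue per value of `dimension` (the data behind one chart)."""
    totals = Counter()
    for row in rows:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)


# Selecting the "EMEA" bar in one chart filters the data behind every other chart.
filtered = apply_cross_filter(rows, "region", {"EMEA"})
chart_by_product = revenue_by(filtered, "product")
chart_by_region = revenue_by(filtered, "region")
print(chart_by_product, chart_by_region)
```

The key point is that one selection produces a single filtered row set, and every chart on the slide is recomputed from it.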

Figure: Sankey diagram

Figure: Scatter multi-pair plot

  5. Model Overrides Option: Decline to Predict

Model overrides give teams more control over model responses by ensuring predictions remain compliant with your regulatory frameworks, business standards, or ethical guidelines. Under certain conditions (e.g., a high model uncertainty score or a large confidence interval), you can now use the new override option to “decline to predict” altogether.
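As a rough illustration of the "decline to predict" idea for a binary classifier, the sketch below returns no prediction when the model's confidence falls under a threshold. It is a generic example, not Dataiku's override engine; the function name `predict_or_decline` and the 0.7 threshold are assumptions chosen for the example.

```python
from typing import Optional


def predict_or_decline(proba_positive: float, min_confidence: float = 0.7) -> Optional[int]:
    """Return the predicted class (0 or 1), or None to decline when too uncertain.

    `min_confidence` is an illustrative threshold, not a Dataiku default.
    """
    if not 0.0 <= proba_positive <= 1.0:
        raise ValueError("proba_positive must be between 0 and 1")
    # Confidence is the probability of the more likely class.
    confidence = max(proba_positive, 1.0 - proba_positive)
    if confidence < min_confidence:
        return None  # decline to predict
    return 1 if proba_positive >= 0.5 else 0


scores = [0.95, 0.55, 0.10]
decisions = [predict_or_decline(p) for p in scores]
print(decisions)
```

Returning `None` rather than a default class keeps declined cases explicit, so downstream consumers can route them to manual review instead of acting on an unreliable prediction.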

  6. Export Statistical Tests as Recipes

Following the PCA recipe, about a dozen statistical tests (one-sample, two-sample, and pairwise Student t-tests, one-way ANOVA, chi-square independence tests, etc.) can now be run in the interactive statistics tab and published as a dataset in your Flow, complete with a reusable recipe for operationalization and automation.
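For readers who want a reminder of what one of these tests computes, here is a minimal sketch of the one-sample Student t statistic using only the Python standard library. It is independent of Dataiku's recipes; the sample values and the function name `one_sample_t` are invented for illustration, and the p-value lookup is left out.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("sample has zero variance")
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Made-up measurements tested against a hypothesized mean of 5.0.
t_stat, dof = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu0=5.0)
print(round(t_stat, 4), dof)
```

Turning the statistic into a p-value requires the Student t distribution, which statistics libraries or the exported recipe provide.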


  7. Import Pre-Labels for Text Labeling/Validation and an Upgraded NER Recipe

Whether pre-labels come from a previous labeling project, a pre-trained model (such as the newer OntoNotes Fast models now included in the upgraded NER plugin), or an LLM, import existing labels into a managed text labeling task to speed up the annotation or validation process.

  8. Brand-New Tutorials in the Dataiku Academy and Developer Guide

ChristinaH_4-1701856403682.png

  9. Dataiku Solutions

Download new pre-built business solutions for Credit Risk Stress Testing and Predictive Maintenance use cases, plus check out upgrades to several existing solutions like Process Mining, Omnichannel Marketing, and Credit Card Fraud Detection.



Additional User Experience Improvements to:

  • Dataset creation from files in a managed folder
  • Editable datasets
  • Visual if-then-else rules
  • Causal predictions (ML diagnostics & new weighting metrics)
  • AutoML: Chart a model's training and test metrics across various training data sizes
  • Dataiku Govern (more monitoring, alert, and subscription options, embedded dashboards, blueprint template migration, etc.)



Want to learn more about Dataiku 12.4?

As always, visit the official release notes to get more details and reference documentation on these product enhancements. Give these new features a try in your own Dataiku projects, and be sure to let us know what you think in the comments!
