Dataiku 12: A Dozen Ways Work Just Got Better

ChristinaH, Dataiker

While you’ve been enjoying the spring weather outside, our product and engineering teams have been busy in the kitchen cooking up tasty new innovations in Dataiku 12. In honor of our twelfth product version, we’re pleased to provide a rundown of this major release's 12 most noteworthy features. After all, so many good things come by the dozen: eggs, cookies, cupcakes, doughnuts (hmm, maybe I need a li’l snack?). Read on to learn more, and stay till the end for a sneak preview of the latest and greatest ways we’re making it easier for you to take advantage of GPT and large language models in your Dataiku projects!

Increase Transparency

Help everyone understand AI projects and trust outputs

Standardize Components

Ensure success with best practices and approved components

Centralize Operations

Deliver projects with consistent deployment and management

1. Auto Feature Generation

Why spend days painstakingly joining, transforming, and aggregating input datasets to create new features for modeling when you can use the new generate features visual recipe? Simply assign relationships between a primary dataset and enrichment datasets, choose relevant columns and the types of transformations to perform on them, and let Dataiku do the heavy lifting on feature engineering.
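
To make the idea concrete, here is a minimal sketch in plain Python of the kind of work such a recipe takes off your plate: joining an enrichment dataset to a primary dataset on a key and aggregating one of its columns into new features. The sample rows, the column names, and the generate_features helper are all hypothetical illustrations, not Dataiku's implementation or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary dataset: one row per customer.
customers = [
    {"customer_id": 1, "country": "FR"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 3, "country": "DE"},
]

# Hypothetical enrichment dataset: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def generate_features(primary, enrichment, key, column):
    """Left-join aggregated statistics of `column` from `enrichment` onto `primary`."""
    grouped = defaultdict(list)
    for row in enrichment:
        grouped[row[key]].append(row[column])

    enriched = []
    for row in primary:
        values = grouped.get(row[key], [])
        enriched.append({
            **row,
            f"{column}_count": len(values),
            f"{column}_sum": sum(values),
            # No matching rows: leave the average empty rather than invent a value.
            f"{column}_avg": mean(values) if values else None,
        })
    return enriched


features = generate_features(customers, orders, key="customer_id", column="amount")
for row in features:
    print(row)
```

Even in this toy case, every new aggregation or enrichment dataset means more hand-written joins, which is exactly the repetitive part a visual recipe is meant to absorb.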


2. Universal Feature Importance

Three new Shapley-based visualizations for absolute feature importance, feature effects, and feature dependence enable you to inspect and explain both the magnitude and relative direction of a feature’s impact on a model’s predictions. Even better, these built-in analyses are model-agnostic, so you can compare and contrast feature importance across different types of algorithms as well as models developed outside of Dataiku’s visual ML framework.
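
For readers curious about what "Shapley-based" means in practice, the sketch below computes exact Shapley values for a single prediction of a toy model by averaging each feature's marginal contribution over every feature ordering. The model function, the instance, and the all-zeros baseline are assumptions made for illustration; this is not how Dataiku computes its charts, and real tools rely on approximations because exact enumeration grows factorially with the number of features.

```python
from itertools import permutations
from math import factorial


def model(x):
    # Hypothetical toy model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features outside the current coalition keep their baseline value.
    """
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and record how much the prediction moves.
            current[i] = instance[i]
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / factorial(n) for c in contributions]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, instance, baseline)
print(phi)
```

The sign of each value gives the direction of a feature's effect and its absolute value gives the magnitude. The values always sum to the difference between the prediction for the instance and the prediction for the baseline, and because the procedure only ever calls predict, it works the same way for any type of model.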


3. Uplift Modeling (Causal ML)

It’s true that correlation does not imply causation. But with a dedicated AutoML task for uplift modeling, you can now use a causal ML approach to measure cause-and-effect relationships. By calculating the incremental impact of a treatment, such as a direct marketing action, you can prioritize the most “influenceable” cases for the outcome or behavior you want. Use uplift modeling not just for retail and marketing use cases, but also fundraising, medical treatment and clinical trials, human resources programming, or even political campaigns.
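
As a rough illustration of "incremental impact," the sketch below estimates uplift per customer segment as the conversion rate of treated customers minus the conversion rate of untreated (control) customers, then ranks the segments. The records and segment names are made up, and this simple difference of rates is a deliberate simplification rather than the algorithm behind Dataiku's AutoML task.

```python
from collections import defaultdict

# Hypothetical campaign records: (segment, received_treatment, converted).
records = [
    ("new", True, 1), ("new", True, 1), ("new", False, 0), ("new", False, 0),
    ("loyal", True, 1), ("loyal", True, 1), ("loyal", False, 1), ("loyal", False, 1),
    ("lapsed", True, 1), ("lapsed", True, 0), ("lapsed", False, 0), ("lapsed", False, 0),
]


def uplift_by_segment(rows):
    """Return {segment: treated conversion rate - control conversion rate}.

    Assumes every segment has at least one treated and one control row.
    """
    # counts[segment][treated] = [conversions, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in rows:
        counts[segment][treated][0] += converted
        counts[segment][treated][1] += 1

    uplift = {}
    for segment, groups in counts.items():
        treated_rate = groups[True][0] / groups[True][1]
        control_rate = groups[False][0] / groups[False][1]
        uplift[segment] = treated_rate - control_rate
    return uplift


uplift = uplift_by_segment(records)
# Most "influenceable" segments first.
ranked = sorted(uplift, key=uplift.get, reverse=True)
print(uplift)
print(ranked)
```

Note how the "loyal" segment converts at the highest rate yet shows zero uplift: those customers would have converted anyway, so targeting them adds nothing. That distinction is what separates uplift modeling from ordinary propensity scoring.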


4. Forecasting Enhancements, Including Prophet Support

This product update brings many enhancements for time series tasks in Dataiku. For example, you can now easily access the well-known Prophet forecasting procedure as a built-in algorithm. When configuring the design of your forecasting experiments, take advantage of a new grouped K-fold splitting option, which ensures that all rows sharing the same value in the group column are assigned to a single fold. Finally, you’ll also be able to export the predicted dataset directly from the Lab.
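
The grouped K-fold idea is easy to sketch in plain Python. In the hedged example below, the grouped_k_fold helper and the store_ids column are hypothetical, and the round-robin assignment of groups to folds is a simplification chosen for brevity; it is not Dataiku's implementation.

```python
def grouped_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs where each group is tested in exactly one fold."""
    unique_groups = sorted(set(groups))
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Round-robin assignment of groups to folds. This is a simplification:
    # production implementations often balance folds by row count instead.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}

    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx


# Hypothetical group column: the store each row belongs to.
store_ids = ["A", "A", "B", "B", "C", "C", "D"]
splits = list(grouped_k_fold(store_ids, n_splits=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because no group ever appears on both sides of a split, the evaluation cannot be flattered by a model that has simply memorized rows from the same group.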


5. Request Center Workflows for Plugin Installation and Code Env Creation

Have you ever been working on a project and realized you needed a specific plugin from the plugin store or a custom code env with a specific package (or set of packages), but didn’t have the privileges to install or create it yourself? Dataiku 12 delivers pre-defined workflows for these specific scenarios, so that administrators are notified in their request center and can take action to support you. After the admin user processes the request, you’ll receive a notification in your own inbox, and can continue on your way!


6. Help Center

While working in Dataiku, have you ever wished for a personal coach whose only job is to help you efficiently find answers and improve your knowledge of the technology or tooling at the precise moments you need help? I know I have! While we don’t have a GPT developed for that (yet!), I would like to introduce you to the help center, your new best friend and Dataiku trail guide.

The help center centralizes a wide variety of useful resources for both technical support and educational purposes, providing contextual, personalized content recommendations at the exact moment you need them. With search capabilities across all Dataiku resources and the convenience of reading reference documentation and knowledge base articles in place, the help center means you can get guidance and inspiration faster, without ever leaving the product.

7. Data Catalog With Data Collections

It’s no secret that finding and accessing the right data for analytics or prediction projects can be a frustrating and time-consuming process. Data collections in Dataiku allow you to create curated lists of key datasets by team or use case, so everyone can easily find and share quality datasets for their projects. Quickly access datasets already in Dataiku or search for external tables in your organization’s indexed connections, and review details about data freshness, schema, size, lineage, and the assigned data steward.

8. New Dataiku Solutions

If it’s been a while since you’ve browsed the catalog of prebuilt Dataiku solutions for common industry and horizontal use cases, I encourage you to visit the newly revamped webpage to see what's been delivered in the last six months. Off-the-shelf templates and plug-and-play applications for process mining, product recommendations, financial forecasting, and pharmacovigilance are just a few of the latest assets you can download to solve key business challenges in your industry.


9. Model Overrides

Model overrides in Dataiku add a human layer of control via business rules and ensure safe predictions by enforcing model outcomes for known cases or conditions. Define override rules and constraints during model design; then, when evaluating model results, examine which rules were triggered most frequently. Built-in charts and the scored outputs help you assess what proportion of predictions fell inside the guardrails naturally versus what proportion required intervention to meet the specified conditions.
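
Conceptually, an override is a business rule that sits between the model and the final scored output. The sketch below shows that idea in plain Python; the rule names, the loan-style fields, and the apply_overrides helper are hypothetical and do not reflect how overrides are configured or stored in Dataiku.

```python
from collections import Counter

# Hypothetical rules: (name, condition on the input row, enforced outcome).
rules = [
    ("minor_applicant", lambda r: r["age"] < 18, "reject"),
    ("verified_vip", lambda r: r["vip"], "approve"),
]


def apply_overrides(row, predicted, rules):
    """Return (final_prediction, matched_rule_name_or_None, prediction_was_changed)."""
    for name, condition, forced in rules:
        if condition(row):
            # First matching rule wins; record whether it actually changed anything.
            return forced, name, forced != predicted
    return predicted, None, False


# Hypothetical scored rows: (input row, raw model prediction).
scored = [
    ({"age": 16, "vip": False}, "approve"),
    ({"age": 17, "vip": False}, "reject"),
    ({"age": 40, "vip": True}, "reject"),
    ({"age": 35, "vip": False}, "approve"),
]

triggered = Counter()
changed = 0
finals = []
for row, predicted in scored:
    final, rule, was_changed = apply_overrides(row, predicted, rules)
    finals.append(final)
    if rule is not None:
        triggered[rule] += 1
    changed += was_changed

print(finals)
print(triggered.most_common())
print(f"{changed} of {len(scored)} predictions required intervention")
```

Tracking both which rule matched and whether it changed the prediction is what lets you separate predictions that already respected the guardrails from those that had to be corrected.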

10. Schema Management and Flow Build Improvements

Real-world pipelines continually evolve to accommodate refreshed data streams, new and changed column names and types, and different data preparation and transformation strategies. To make dataOps in Dataiku more intuitive, this latest version includes several updates to building datasets and propagating schemas, including:

  • Run subsequent recipes for downstream builds with on-the-fly schema propagation.
  • Recursive downstream build default option for datasets.
  • Build Flow Zone builds all final datasets of a zone and, by default, stops recursing at the start of the Flow Zone.
  • Engine selection and run recipe enhancements.
  • Default “smart mode” means new input columns will automatically propagate to the output dataset for all join recipes and the Top N recipe.
  • Improvements to interactive schema propagation.


11. MLOps Enhancements

In our continuing effort to simplify, streamline, and centralize the myriad processes related to deploying and monitoring data products, this major version will deliver many enhancements to MLOps. In the first dot release (Dataiku 12.0), operators will get a new set of automated drift metrics in the model evaluation store and be able to set up a monitoring feedback loop in a matter of clicks, including visualization of API deployments within the projects themselves. In future minor releases across the next several months, we plan to extend MLOps significantly, with better interoperability between Dataiku and Cloud ML services for both model deployment and observability.
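
To give a flavor of what a drift metric measures, the sketch below computes the population stability index (PSI), a common way to compare the distribution of a feature at training time with its distribution in production. PSI is used here purely as a familiar example; this post does not say which drift metrics Dataiku computes, and the function name, bin edges, and sample values are assumptions.

```python
from math import log


def population_stability_index(reference, current, bin_edges, eps=1e-6):
    """PSI between two samples over shared bins; 0 means identical binned distributions.

    Assumes each sample has at least one value inside the bin range.
    """

    def proportions(values):
        counts = [0] * (len(bin_edges) - 1)
        last = len(counts) - 1
        for v in values:
            for b in range(len(counts)):
                in_bin = bin_edges[b] <= v < bin_edges[b + 1]
                # The final bin also includes its right edge.
                if in_bin or (b == last and v == bin_edges[-1]):
                    counts[b] += 1
                    break
        total = sum(counts)
        # Floor each proportion at eps so empty bins do not break the logarithm.
        return [max(c / total, eps) for c in counts]

    ref = proportions(reference)
    cur = proportions(current)
    return sum((c - r) * log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values at training time and in production.
training_ages = [22, 25, 31, 38, 44, 47, 52, 58]
production_ages = [41, 45, 49, 53, 55, 57, 59, 60]
edges = [20, 30, 40, 50, 60]

psi_same = population_stability_index(training_ages, training_ages, edges)
psi_shifted = population_stability_index(training_ages, production_ages, edges)
print(psi_same, psi_shifted)
```

A value of zero means the two binned distributions match exactly, and larger values indicate a bigger shift. The threshold at which a shift deserves attention is a judgment call that depends on the use case.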


12. New Governance Views

Organizations taking advantage of Dataiku’s Govern module will appreciate new visualizations to assess the entire AI portfolio at a glance, as well as usability improvements across many Govern screens. For example, the Kanban view breaks out all governed projects by lifecycle stage, grouped and colored by business initiative. The improved Matrix view, meanwhile, gives additional flexibility to compare projects across various dimensions (not just risk and value).


A Bonus Treat to Make a Baker’s Dozen

We recently updated our Natural Language Generation plugin to incorporate the latest GPT models and republished it as the OpenAI GPT plugin. This updated component uses the expressive power of generative AI to enable users to perform pre-defined or customized NLP tasks.

With four dedicated visual recipes for text generation tasks, Dataiku makes GPT accessible to non-coders in a scalable yet transparent way. Teams can apply generative AI in bulk to full datasets (versus feeding individual queries into the ChatGPT sandbox application) while preserving both the query and outputs as part of the project Flow. Moreover, Dataiku’s plugin adds several benefits over using the API directly.


Give GPT and LLMs a try in your own Dataiku projects, and let us know what you think!

Our Special Ingredient — Users Like You

As always, I’d like to give a shout-out to the product and engineering teams for their innovation and dedication to improving our product, as well as thanks to our amazing Dataiku user community. Your ideas and feedback are much appreciated, providing us with plenty of food for thought and directly contributing to the development of Dataiku’s product.

Here are just a few examples of delivered solutions stemming from feature requests or feedback from users like you:

Try Dataiku 12 Out for Yourself!

We hope that you’re as excited as we are about this new release. Enjoy the latest additions to your data science toolkit, and let us know in the comments which new feature you’ll be adding into the mix when whipping up your next project!

Of course, a major Dataiku release is not complete without giving our users the tools to learn and try out the new features for themselves. If you’re looking for a perfect recipe to follow, a new Crash Course in Dataiku 12 is available on the Dataiku Academy, with videos, how-to articles, and hands-on exercises.

Want to learn more? Check out the full Dataiku 12 release notes, and click the button below to watch a 30-minute presentation that dives into the vision behind our latest (and tastiest) major version.

