Peek Under the Hood of Dataiku 12.5

LynnH
Dataiker


New year, new Dataiku release! We’re coming in hot for 2024 with version 12.5, which includes (but of course!) Generative AI advancements, the groundbreaking Unified Monitoring feature, and some handy upgrades to Code Studios.

But that's not all — in this post, we'll also delve into the smaller yet impactful features that enhance governance, streamline workflows, and elevate your overall Dataiku experience. From conditional formatting to SQL Recipes and Notebooks upgrades, Dataiku 12.5 caters to the diverse needs of data enthusiasts, ensuring that every detail matters. So, fasten your seatbelts as we embark on a journey through the big-ticket features, the cool additions, and the smaller (but just as awesome) elements that define Dataiku 12.5.

 

The Big Stuff

There are a few big-ticket additions in Dataiku 12.5 to be aware of, on the Generative AI front, for central governance, and for heavy users of Code Studios:

 

Generative AI Platform Capabilities

We have a big bundle of brand-new Generative AI platform capabilities for you, augmenting the LLM Mesh, builder tooling, and AI assistants. We’ll be writing about these in more depth on The Dataiku Blog very soon, so keep your eyes peeled for the nitty-gritty details there, and we’ll come back and add the link here once it’s been posted.

In the meantime, here are the highlights:

  1. For starters, we’ve added a bunch of Gen AI tutorials to the developer guide. So whether you’ve been enjoying the developer guide already or haven’t gotten started, make sure to head over and check it out.
  2. Some of the hottest models today, Mistral 7B and Zephyr 7B, are now included in the Hugging Face connection.
  3. Dataiku 12.5 includes support for locally running embedding models, so now you can do a full RAG workflow completely locally — critical for sensitive documents you might not want to send through a commercial embeddings model API. Plus, you now have the ability to register embedding models in custom LLM connections for added flexibility to use a different embeddings model than those we provide out of the box.
  4. We’ve also introduced advanced mode and "show prompt" in prompt studios and LLM recipes for full visibility as to what's being sent to the model. This means better control plus personal upskilling — a win-win.
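To make the "fully local RAG" idea in item 3 a bit more concrete, here is a minimal sketch of the retrieval step running entirely in-process, with nothing sent to an external API. It does not use Dataiku's API. `ToyEmbedder` is a hypothetical stand-in (a normalized bag-of-words) for a real locally hosted embedding model, and the documents are made up for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyEmbedder:
    """Hypothetical stand-in for a locally hosted embedding model."""

    def __init__(self, corpus):
        # One vector dimension per distinct token seen in the corpus.
        vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
        self.index = {tok: i for i, tok in enumerate(vocab)}

    def embed(self, text):
        vec = [0.0] * len(self.index)
        for tok in tokenize(text):
            if tok in self.index:  # tokens outside the vocabulary are ignored
                vec[self.index[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]  # unit length, so dot product = cosine


def retrieve(question, documents, embedder, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = embedder.embed(question)

    def score(doc):
        return sum(x * y for x, y in zip(q, embedder.embed(doc)))

    return sorted(documents, key=score, reverse=True)[:top_k]


docs = [
    "Employee salary bands are confidential and reviewed yearly.",
    "The cafeteria menu changes every Monday.",
    "Quarterly revenue figures are shared with the board only.",
]
embedder = ToyEmbedder(docs)
question = "Who can see the salary bands?"
context = retrieve(question, docs, embedder)

# The retrieved text is what would be placed in the prompt sent to the LLM.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

In a real workflow the embedder would be a proper model and the documents would live in a vector store, but the shape is the same: embed, compare, then build the prompt from whatever comes back.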

Elevate Oversight With Unified Monitoring

The introduction of Unified Monitoring in Dataiku 12.5 stands out as a game changer for businesses seeking a centralized approach to model and project oversight. Unified Monitoring is the culmination of Dataiku's continued commitment to universal operations: a single platform to monitor all deployed models and projects, regardless of the external systems they run on.

Starting with version 12.2, Dataiku enabled users to seamlessly incorporate models deployed on external systems like Databricks, SageMaker, Azure ML, and Google Vertex AI into Dataiku projects. The subsequent release of Deploy Anywhere in version 12.4 expanded these capabilities in the opposite direction, allowing users to deploy models developed in Dataiku to the major cloud platforms.

Now, with Unified Monitoring in 12.5, enjoy a comprehensive view of all deployed projects and API services on one screen, with oversight of the deployment stage, infrastructure, and status of all your data products in production. Whether your models are served on Dataiku architecture, Amazon SageMaker, Databricks, Azure ML, or any other supported system, you can manage and monitor them from a single interface.


We’re excited about Unified Monitoring in Dataiku 12.5 — it’s a groundbreaking feature that consolidates governance, simplifies management, and provides a holistic view of deployed pipelines and models, bringing a new level of efficiency and control.


Unlock New Possibilities With New Code Studios Blocks for Gradio & Voilà

Dataiku 12.5 comes with compelling upgrades to Code Studios (first released, as you might remember, in Dataiku 11). We’re introducing seamless integration with Gradio, catering to developers and data scientists who have longed for a way to bring Gradio's functionality into the Dataiku environment, particularly for Gen AI applications. Gradio makes it easy to create visual input forms to query models and visualize model results in the same place.
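If you haven't used Gradio before, here is a minimal sketch of the kind of input form it gives you. This is plain Gradio rather than anything specific to the Code Studios block, and `answer` is a hypothetical placeholder where a real model or LLM call would go.

```python
def answer(question: str) -> str:
    """Hypothetical placeholder: swap in a real model or LLM query here."""
    cleaned = question.strip()
    if not cleaned:
        return "Please enter a question."
    return f"You asked: {cleaned}"


def build_demo():
    """Wrap the handler in a simple text-in, text-out Gradio form."""
    # Imported here so the handler above can be tried without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=answer,
        inputs="text",
        outputs="text",
        title="Ask the model",
    )


# To serve the form, call: build_demo().launch()
```

The handler is an ordinary Python function, so you can test it on its own before wiring it to a UI.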

But Gradio integration isn't the only star of the show. Enter: Voilà, my favorite French expression and a feature tailored for Jupyter Notebook enthusiasts. For those familiar with dynamic testing within Jupyter Notebooks, Voilà strips away input cells and transforms your content into an interactive dashboard. Whether you're a seasoned Jupyter Notebook user or a developer, the addition of Voilà in Code Studios offers a dynamic testing playground that promises to elevate your experience and creativity.

With Gradio and Voilà joining the ranks of Streamlit in pre-built Code Studios templates, Dataiku 12.5 invites our code-heavy users to develop application front-ends in new and exciting ways.

 

The Smaller (But Just as Awesome) Stuff

In the realm of data science, every detail matters, and Dataiku 12.5 proves this by introducing a cornucopia of features that may not steal the spotlight, but that add even more depth and versatility to your Dataiku experience:

Conditional Formatting

The addition of conditional formatting in Dataiku 12.5 is a game changer for Excel enthusiasts. We’ve consolidated all the different options for color-coding your dataset into a single user-friendly menu and added an interface to easily apply if-then rules for vibrant datasets. What's more, the colors persist seamlessly when sharing data into dashboards, and there's even a possibility of extending this feature to Excel exports.

 

SQL Recipes & Notebooks Upgrades

In Dataiku 12.5, SQL recipes and notebooks take a cue from their Python counterparts, allowing for more fluid editing. Changes made in one can propagate to the other, giving SQL aficionados the same smooth back-and-forth that Python users already enjoy.

OpenAPI for Classification Predictions

Formerly known as the Swagger Specification, OpenAPI is a standard, machine-readable format (written in JSON or YAML) for describing HTTP APIs, so that calls to and from an API are predictable. Dataiku 12.5 can produce the OpenAPI-compatible JSON for a given endpoint, so that you can register this standardized documentation in your tool of choice.
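For readers who have never opened one, here is a rough sketch of what a small OpenAPI 3.0 document for a classification endpoint can look like, built as a Python dictionary and serialized to JSON. It is not the document Dataiku generates; the `/predict` path, the `features` and `prediction` field names, and the class labels are all assumptions made for illustration.

```python
import json


def build_openapi_spec(title, path, classes):
    """Build a minimal OpenAPI 3.0 description of one prediction endpoint."""
    request_schema = {
        "type": "object",
        "properties": {"features": {"type": "object"}},
        "required": ["features"],
    }
    response_schema = {
        "type": "object",
        # The enum lists every class label the endpoint may return.
        "properties": {"prediction": {"type": "string", "enum": classes}},
    }
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": "1.0.0"},
        "paths": {
            path: {
                "post": {
                    "summary": "Return a predicted class for one record",
                    "requestBody": {
                        "required": True,
                        "content": {"application/json": {"schema": request_schema}},
                    },
                    "responses": {
                        "200": {
                            "description": "Prediction result",
                            "content": {
                                "application/json": {"schema": response_schema}
                            },
                        }
                    },
                }
            }
        },
    }


# Hypothetical endpoint and labels, for illustration only.
spec = build_openapi_spec("Churn classifier", "/predict", ["churn", "stay"])
spec_json = json.dumps(spec, indent=2)
```

Because the format is standardized, a document like this can be loaded by API gateways, documentation viewers, and client generators without custom glue.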



Elbow Plots for Clustering

Our users have been asking, and we’ve answered. Dataiku 12.5 introduces new performance charts for clustering tasks. With silhouette scores and elbow plots, determining the optimal number of clusters should be a breeze!
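As a refresher on what these two charts measure, here is a small self-contained sketch that computes both for a toy dataset. It is not Dataiku's implementation: the k-means is a bare-bones 1-D version with a deterministic start, the data is made up, and the 10% rule in `pick_elbow` is just one simple heuristic among several for reading an elbow.

```python
def kmeans_1d(points, k, iterations=20):
    """Plain Lloyd's algorithm on 1-D data with a deterministic quantile init."""
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assign each point to its nearest centroid, then recompute the means.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def inertia(points, labels, centroids):
    """Sum of squared distances to the assigned centroid (the elbow plot's y-axis)."""
    return sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette score; assumes at least two non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def pick_elbow(inertias, tolerance=0.10):
    """Smallest k after which one more cluster removes < tolerance of the k=1 inertia."""
    total = inertias[0]
    for i in range(len(inertias) - 1):
        if inertias[i] - inertias[i + 1] < tolerance * total:
            return i + 1  # list index 0 corresponds to k=1
    return len(inertias)


# Toy data with three obvious groups, around 1, 5, and 10.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 9.8, 10.0, 10.2]
ks = range(1, 6)
fits = {k: kmeans_1d(data, k) for k in ks}
inertias = [inertia(data, *fits[k]) for k in ks]
silhouettes = {k: mean_silhouette(data, fits[k][0]) for k in ks if k > 1}

elbow_k = pick_elbow(inertias)
best_silhouette_k = max(silhouettes, key=silhouettes.get)
```

On this toy data both signals point to three clusters: inertia stops dropping meaningfully after k=3, and the silhouette score peaks there too.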

Compare Column Values

This feature is simple, but mighty. Choose up to four columns to compare side by side, and quickly spot-check relationships between columns row by row in your datasets, with no more horizontal scrolling or hiding columns! Especially when working with long text fields, it can be cumbersome to click each cell to display its complete value, and even harder to systematically compare the text with other column values to analyze differences or validate a label. This view is perfect for quick human-in-the-loop validation of LLM-generated outputs.

Dataiku Solutions Evolutions

While not specifically tied to platform releases, don’t forget Markdown Optimization for Retail, the latest Dataiku Solutions release. Also, the full catalog of Dataiku Solutions is now available on Dataiku Cloud, streamlining access and ensuring that the latest features are reflected in these solutions.

One other notable upgrade for admins: Dataiku now offers a link where users can request the installation of a code environment, plugin, or anything else a Dataiku Solution needs to run that's not already on your instance. This sends a request to the instance admin, smoothing the import process.

 

Last But Not Least: 12.5 UX Enhancements

As always, in the pursuit of a seamless and intuitive user experience, Dataiku 12.5 introduces a host of miscellaneous UX enhancements that promise to elevate your data journey:

Data Catalog Delight: In the data catalog, users can now filter by data steward, making it easier to quickly navigate to all the datasets you look after across multiple projects or discover datasets managed by specific individuals that might be useful to your projects. 


 

Governance Advancements:

  • Dataiku Govern receives notable upgrades with 12.5, including auditability at the item level. Users can now track what changed, when the change occurred, and who made it. Custom filters are now more specific and rule-based, akin to Airtable, for a more tailored governance experience.
  • The update extends to rules and permissions, offering specific rule assignments at the dataset or recipe level. This level of granularity provides enhanced customizability, ensuring governance aligns precisely with your unique Dataiku items.
  • For better visibility of usage context, governed projects using LLM components are given an LLM flag in case you want to track them separately or with increased scrutiny. Tags are also automatically added for projects which are either a Dataiku App template or instance so you can easily determine which is the application master versus simply user instantiations of the app.


Model Views: Do you already know about the fairness report, stress test center, and model error analysis model views in Dataiku’s Visual ML? Dataiku 12.5 improves the discoverability of these powerful analyses by displaying them in the Model Views menu by default, along with an option to request the installation of those plugins not already on your instance.


Column Usage Flow View: This new flow view tracks the usage of column names across your project. While not a complete column lineage feature, it provides valuable insights about how a given feature is utilized in a project.

Dashboard Mastery: Dataiku 12.5 introduces cross filters with the ability to exclude based on those filters, offering more control over dashboard interactions. Customization extends to the structure of charts, providing the flexibility of a dual y-axis that can be configured independently.

 

OK, that’s it! Let us know in the comments what you think of the new features, and of course check out the release notes for more detail and information.

 

 

READ THE RELEASE NOTES
