April Release Notes: Data Quality, LLM Mesh Additions, and More

Michael Grayson
edited June 27 in What's New

We are delighted to announce our latest release, Dataiku 12.6, which is now available to download for your production environments.

For a quick overview of the new features and improvements, we encourage you to watch our first-ever edition of What’s New, Dataiku! This six-minute video walks through all the major updates in Dataiku V12.6 with brief descriptions and screenshots of the features in action!

//play.vidyard.com/bKkbsSuAJWYmoyRfZVPhrJ.html?

This release brings a brand-new approach to data quality, overhauled dashboards, new LLM models, and a suite of enhancements to make your Dataiku user experience even better. Let’s dive right in and start with the new way to manage your data quality.

Data Quality Improvements:

We had a lot of feature requests to improve the old checks system, especially on the topic of increasing visibility and centralizing the system, so that’s just what we have done. This release sees an overhaul of data quality features for datasets in Dataiku, replacing the “checks” with “data quality rules”.

These rules are much easier to configure than checks, expand the number of rules available to users, and add support for multi-column rules. No longer are you required to create a metric and apply a check to it -- data quality rules can be directly and quickly created right from the Edit Data Quality Rules screen:

MichaelG_0-1712314450038.png

You can now assess data quality at a glance at both the project and instance levels, while of course still being able to get a more detailed view with dataset-level monitoring.

MichaelG_1-1712314458440.png

So, a snapshot is great, but we need to monitor data quality changes over time. The timeline view has you covered here, displaying the status of each of your rules on a daily basis. Rule History allows you to then dive in deeper, seeing the history of specific rules and how their results have changed over time.

MichaelG_2-1712314458460.png

Finally, we have also enhanced both the Data Catalog and Flow to display data quality rules about the individual datasets selected, allowing you to quickly get a view of issues present from all the places you already interact with your data.
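To make the idea of a multi-column rule a little more concrete, here is a minimal sketch in plain Python. It is not Dataiku's API and not how the product implements rules: the `evaluate_rule` helper, the `RuleResult` type, the thresholds, and the sample rows are all hypothetical, made up purely for illustration. It only shows the general shape of a rule that compares two columns and reports a status.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Row = Mapping[str, str]


@dataclass
class RuleResult:
    """Outcome of one illustrative rule (hypothetical type, not a Dataiku class)."""
    name: str
    status: str  # "OK", "WARNING" or "ERROR"
    failing_rows: int


def evaluate_rule(
    name: str,
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    warn_at: int = 1,
    error_at: int = 5,
) -> RuleResult:
    """Count the rows that break the predicate and map the count to a status."""
    failing = sum(1 for row in rows if not predicate(row))
    if failing >= error_at:
        status = "ERROR"
    elif failing >= warn_at:
        status = "WARNING"
    else:
        status = "OK"
    return RuleResult(name=name, status=status, failing_rows=failing)


# Made-up sample data: ISO dates compare correctly as plain strings.
orders = [
    {"order_date": "2024-03-01", "ship_date": "2024-03-03"},
    {"order_date": "2024-03-05", "ship_date": "2024-03-04"},  # shipped before ordered
    {"order_date": "2024-03-07", "ship_date": "2024-03-09"},
]

# A multi-column rule: ship_date must not be earlier than order_date.
result = evaluate_rule(
    "ship_date is not before order_date",
    orders,
    lambda row: row["order_date"] <= row["ship_date"],
)
print(result)
```

In the product you would configure the equivalent rule visually from the Edit Data Quality Rules screen; the sketch is only meant to show what "one rule spanning two columns" means.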

New Visual Recipe: Generate Statistics

In recent releases we enabled you to export the configuration of selected statistics cards as recipes in order to operationalize the results in your Flow. To go one step further, we have now made it even easier to integrate statistics in pipelines by introducing a visual recipe for generating statistics.

MichaelG_3-1712314494783.png

Now you can do univariate analysis, principal component analysis, and various statistical tests for both numeric and categorical variables. This makes it easier than ever to use statistical outputs as features for downstream modeling, visualization, or business analysis.

MichaelG_4-1712314494801.png
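If you are wondering what kind of output a univariate analysis produces, the sketch below computes a few typical summary values with Python's standard library. This is an illustration only and not the recipe's implementation: the function names, the chosen statistics, and the sample values are assumptions made for the example.

```python
import statistics
from collections import Counter


def numeric_summary(values):
    """Typical univariate statistics for a numeric column (needs at least 2 values)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "q1": q1,
        "q3": q3,
    }


def categorical_summary(values):
    """Typical univariate statistics for a categorical column."""
    counts = Counter(values)
    mode, mode_count = counts.most_common(1)[0]
    return {
        "count": len(values),
        "distinct": len(counts),
        "mode": mode,
        "mode_share": mode_count / len(values),
    }


# Made-up sample columns.
amounts = [2, 4, 4, 4, 5, 5, 7, 9]
colors = ["red", "blue", "red", "green"]

num_stats = numeric_summary(amounts)
cat_stats = categorical_summary(colors)
print(num_stats)
print(cat_stats)
```

Each summary is a flat dictionary, which is roughly the shape you would want when feeding statistics into downstream modeling or reporting steps.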

Dashboard Improvements:

Dashboards have also seen quite a few tweaks and fixes - check out the release notes for details, with a summary below.

  • Min and Max aggregations for alphanumeric and date variables in charts, for more versatile data representation.
  • A new filter panel for dashboards, allowing you to quickly see applied filters.
  • Improved layout flexibility: you can control the position of the filter panel (top, right, left, tile view) and manually order the columns by drag and drop.
  • Consistent UX for date filters across various screens, for a more standardized experience.
  • Overall visual improvements and enhanced dashboard page navigation for a better experience.

MichaelG_5-1712314701749.gif

AI Assistance Enhancements:

We have leveled up our AI Explain feature - which can now tackle flows with multiple zones.

With all the same customizability as before, this feature has now been expanded to help you quickly understand even more complex flows. Use this tool to help you with the sometimes arduous process of keeping good, helpful descriptions of your projects up to date - or simply getting an overview of a new project you have been added to.

This tool only shares project metadata as well as dataset schemas and descriptions — the contents of your datasets are not shared.

To get the rundown of this feature, check out the 3-minute video on the subject!

LLM Mesh updates:

This release has seen a raft of LLM Mesh enhancements - the full list of which you can read here. To spotlight a few:

  • Added support for Claude 3 models in the Anthropic connection, as well as Mixtral-8x7B on the Hugging Face local connection.
  • Added proxy support to the Databricks Mosaic AI connection.
  • Removed support for MosaicML connections (MosaicML Inference was retired on February 29, 2024). You should now use Databricks Mosaic AI connections instead.



Other UX Improvements:

  • Git integration for the DSS Python API, allowing users to work with notebooks and perform other actions programmatically.
  • Custom filters in Dataiku Govern for faster project and project element discovery.



Additional Features and Solutions:

  • Send Mail Plugin improvements, leveraging the Dataiku messaging channel for easier, styled dataset sharing.
  • Improved Metadata filtering - applying multiple filters at once as well as conditional filtering should help you find what you are looking for much more quickly.

MichaelG_0-1712315382966.png

  • We have added a new model override condition for regression models: uncertainty estimation, allowing users to compute prediction intervals for regression models. You can then use this to set a certain coverage threshold which triggers a model override.
  • Users can now access open-source large language models through their Databricks connection for tasks like text completion and text embedding.
  • New additions and upgrades to Dataiku Solutions, including Parameters Analyzer and Clinical Site Intelligence for specific use cases.

Cloud Upgrade Schedule

Dataiku 12.6 will be available to those on new cloud spaces on April 5th, and existing spaces will be upgraded over the next two weeks.



Want to learn more about Dataiku 12.6? Visit the official release notes for more details and reference documentation on these product enhancements.

