Dataiku 11 has officially arrived! This major release is jam-packed with powerful new features for technical experts, business leaders, and everyone in between. Keep reading for more details and demos. At the end of the article, you’ll find additional links to a recorded session showcasing the latest capabilities, a Crash Course in Dataiku 11 available through the Dataiku Academy, and other reference documentation.
Key Highlights for Technical Experts
Dataiku 11 delivers new tooling for code-first profiles, such as embedded code studios to access their favorite IDEs from within a project, robust experiment tracking for programmatic ML, and custom ML metrics. Our most technical users can work in the development environment and tooling that feel most comfortable to them while still contributing to shared projects with transparency for other teammates.
Watch a short video on the embedded code studios feature
A central feature store makes it easy for data scientists and data engineers to safely share and reuse high-quality, curated reference datasets across projects, so experts can spend more time innovating and less time on repetitive, mundane tasks like data wrangling and feature engineering.
New No-Code Tools for Analysts and Domain Experts
For citizen data scientists and specialists in the line of business, visual tools for a variety of advanced use cases remove technical barriers, so they can solve domain-specific problems without outside services or needing to know how to code. For example, visual time series capabilities enable analysts to generate statistical analyses* on time series data and develop, deploy, and monitor forecasting models, all in the familiar Dataiku framework. Or, they can take advantage of new geo-routing capabilities to compute travel distance, isochrones, and geo routes to inform high-stakes real estate and logistics decisions.
Native visual ML interface for time series forecasting
Analysts can supercharge manual what-if analysis with outcome optimization to prescribe the best course of action to achieve a given goal. This feature helps teams answer not only the question “what does the model predict given these inputs?”, but also the more important question, “what changes to these inputs would get us to the best possible outcome?”.
Outcome optimization accelerates what-if analysis for prediction tasks
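To make the difference between the two questions concrete, here is a minimal sketch in plain Python. It is not Dataiku's implementation: the predict_revenue function is a hypothetical stand-in for a trained model, and the exhaustive search over candidate values is just one simple way to look for the inputs that give the best predicted outcome.

```python
from itertools import product


def predict_revenue(price: float, ad_spend: float) -> float:
    """Hypothetical stand-in for a trained model's prediction function."""
    demand = 1000 - 40 * price + 0.5 * ad_spend
    return price * demand - ad_spend


def optimize_outcome(predict, candidates):
    """Score every combination of candidate inputs and keep the best one.

    `candidates` maps each input name to the values that are allowed to vary.
    """
    names = list(candidates)
    best_inputs, best_score = None, float("-inf")
    for values in product(*(candidates[name] for name in names)):
        inputs = dict(zip(names, values))
        score = predict(**inputs)
        if score > best_score:
            best_inputs, best_score = inputs, score
    return best_inputs, best_score


# What-if analysis: "what does the model predict given these inputs?"
single_prediction = predict_revenue(price=12, ad_spend=200)

# Outcome optimization: "which inputs lead to the best predicted outcome?"
candidates = {"price": [9, 12, 15, 18], "ad_spend": [0, 200, 400]}
best_inputs, best_score = optimize_outcome(predict_revenue, candidates)
print(single_prediction, best_inputs, best_score)
```

A grid search like this only works for a handful of inputs with a few candidate values each; it is meant to illustrate the idea, not to suggest how the product searches the input space.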
For computer vision use cases, a collaborative managed labeling framework takes the headache out of large-scale image annotation projects, while visual deep learning tasks for object detection and image classification* mean data scientists can easily take advantage of pre-trained models and transfer learning without needing to be experts in deep learning architecture and coding with Keras. Default parameters, data augmentation and visualizations of the image data, and interactive what-if analysis deliver a consistent experience with other Dataiku modeling tasks.
Improved Experiences for Sharing Assets, Conditional Logic, and Visualization
One of the best ways to improve productivity is simply to know what others are working on and be able to easily springboard off of it, rather than start every project from scratch. With new quick sharing mechanisms and access request workflows, teams can strike the right balance between control over project objects and discoverability and reuse by others.
If you’ve ever manually constructed business logic by combining multiple processors or visual recipes, or by nesting many if, then, else conditions in a custom formula step, then you know that these rules can be unwieldy to create, prone to errors, and difficult to visualize and understand. A new “switch case” processor and switch() formula function, along with a visual “Create If, Then, Else Statement” processor, assist with constructing simple to complex conditional logic on one or more columns in the Prepare recipe. Even better, the new rules interface appears in many other visual recipes, improving the rule-building experience and enabling more powerful filtering.
A better experience for recoding values and constructing conditional logic
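The contrast between nested conditions and a switch-style mapping can be sketched in plain Python. This is an illustration of the general idea only, not Dataiku formula syntax; the column names, codes, and rule outcomes are hypothetical.

```python
def tier_nested(code: str) -> str:
    """Recoding with nested if/else: correct, but hard to scan and extend."""
    if code == "A":
        return "Gold"
    else:
        if code == "B":
            return "Silver"
        else:
            if code == "C":
                return "Bronze"
            else:
                return "Unknown"


# Switch-style recoding: one lookup table plus a default value.
TIER_BY_CODE = {"A": "Gold", "B": "Silver", "C": "Bronze"}


def tier_switch(code: str, default: str = "Unknown") -> str:
    return TIER_BY_CODE.get(code, default)


# If/then/else rules over several columns: the first matching rule wins.
RULES = [
    (lambda row: row["country"] == "FR" and row["amount"] >= 1000, "priority-eu"),
    (lambda row: row["amount"] >= 1000, "priority"),
]


def apply_rules(row: dict, rules=RULES, default: str = "standard") -> str:
    for condition, outcome in rules:
        if condition(row):
            return outcome
    return default


print(tier_switch("B"))
print(apply_rules({"country": "FR", "amount": 2500}))
```

Keeping the rules as an ordered list of condition and outcome pairs makes the evaluation order explicit, which is the same property that makes a visual rule builder easier to read than a deeply nested formula.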
Multi-dimensional pivot tables provide hierarchical analysis and aggregations for analysts
Automated Documentation and Bundle Governance for Increased Project Oversight
For data and analytics leaders and their Risk counterparts, governance and structured workflows on analytics projects lead to more confidence and trust that AI is being developed and scaled responsibly across the entire portfolio. When combined with the existing model registry, the dedicated bundle registry means organizations have a single repository to oversee both project and model versions for high-stakes initiatives. Model stress tests reduce risk by enabling data scientists and ML operators to simulate real-world deployment conditions and data quality issues to interrogate a model’s behavior and robustness prior to deployment.
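As a rough illustration of what a stress test checks, the sketch below corrupts a small evaluation set in two ways (missing values and a scaling shift) and compares accuracy against the clean baseline. Everything here is a hypothetical toy in plain Python, including the predict rule, the data, and the two corruption helpers; it is not how Dataiku implements model stress tests.

```python
def predict(row: dict) -> int:
    """Hypothetical model: approve (1) when income, in thousands, is at least 50."""
    income = row["income"]
    if income is None:  # a missing value falls back to reject (0)
        return 0
    return 1 if income >= 50 else 0


def accuracy(rows, labels) -> float:
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def drop_values(rows, column, every):
    """Simulate missing data: blank out `column` on every `every`-th row."""
    return [
        {**row, column: None} if i % every == 0 else dict(row)
        for i, row in enumerate(rows)
    ]


def scale_values(rows, column, factor):
    """Simulate a unit or distribution shift by rescaling `column`."""
    return [{**row, column: row[column] * factor} for row in rows]


rows = [{"income": v} for v in (20, 35, 48, 52, 60, 75, 90, 110)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

baseline = accuracy(rows, labels)
with_missing = accuracy(drop_values(rows, "income", every=2), labels)
with_shift = accuracy(scale_values(rows, "income", factor=0.5), labels)
print(baseline, with_missing, with_shift)
```

Comparing each corrupted score with the baseline shows which kinds of data quality issue the model tolerates and which ones degrade it sharply, before any of them occur in production.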
For added oversight, the flow document generator provides a quick and easy way to automatically create a detailed snapshot report listing all the information about what's in your project flow. Along with the model document generator, this flow documentation provides valuable information that companies need to document the current state of projects and models for compliance and reproducibility purposes.
Automatic and configurable flow documentation
Don’t Forget to Check Out Even More New Features Released in Dataiku 10.0.6
Finally, if your organization isn’t able to upgrade to Dataiku 11 just yet but you still have access to Dataiku 10, we definitely recommend you check out what was released with Dataiku 10.0.6 (note that 10.0.7 is the latest available version on Dataiku 10). This product update at the end of May delivered another large batch of exciting new features to take your projects to the next level, including:
A brand-new monitoring UI where administrators can monitor all activity on managed clusters and perform cluster native actions.
When uploading multiple files at once, you can now choose between creating a single dataset or one dataset per file.
Native support for time series in Visual Statistics (stationarity tests, trend tests, ACF, PACF, autocorrelation statistics).
API additions, such as last login and last activity information in the users API, plus new APIs to get information about a dataset's last build and to manage personal API keys.
Pre-built, visual tasks for image classification and data augmentation for image tasks.
MLOps: the ability to compute data drift in standalone evaluation recipes.
Bulk upload files and choose whether to stack them or create multiple datasets
Try It Out for Yourself!
I hope that you’re as excited as we are about the release of Dataiku 11! The new version is hot off the press, and we look forward to your feedback on all of the latest features and functionalities that were developed with users like you in mind. Want to test it out for yourself? Dataiku Online instances on the latest version will be available soon for new trials, and you can go further by taking the Crash Course in Dataiku 11 available through the Dataiku Academy!
To learn more, check out the full Dataiku 11 release notes or watch the 30-minute video that dives deep into the new version.
* Visual statistics cards for time series data, no-code image classification, and automated data augmentation for object detection and image classification are actually features available in Dataiku 10.0.6.