The Flow is at the heart of all Dataiku projects. By default, your Flow starts off empty and populates as you connect to data, use recipes, and build models. Ultimately, its consistent visual grammar enables users to comprehend the entire end-to-end data pipeline at a glance.
As your projects grow in scope and complexity, it can be hard to tell from the default View alone exactly what has happened and when. Luckily, Dataiku provides several options to help you review your Flow through different lenses and keep things organized.
Continue reading to discover these Views and how they can improve your workflow. Before we start, please note that all of the options we will explore in this blog post are found in the bottom left-hand corner of your Flow under View.
LAST BUILD is a View option that lets you quickly see when each dataset was last built, whether or not that build succeeded. You can use this View to promptly check which parts of the Flow may be out of date or at risk of inconsistencies. This can happen if someone has altered and rebuilt something but hasn’t propagated the schema or rebuilt the downstream recipes.
To drill in further, you can find the detailed information powering this View by selecting an individual recipe or dataset and checking the Information (i) section of the right-hand panel. This helps users gain confidence that the data behind their important business decisions is up to date.
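To make the "out of date" idea concrete, here is a minimal sketch in plain Python. It is not a Dataiku API: the dataset names, timestamps, and the `upstream` mapping are all hypothetical stand-ins for what you would read off the LAST BUILD View. It flags any dataset that was last built before one of the datasets it depends on.

```python
from datetime import datetime

# Hypothetical last-build times, as you might read them off the LAST BUILD View.
last_build = {
    "orders_raw": datetime(2022, 3, 10, 9, 0),
    "orders_prepared": datetime(2022, 3, 8, 14, 30),
    "orders_scored": datetime(2022, 3, 8, 15, 0),
}

# Hypothetical Flow structure: each dataset maps to its direct upstream datasets.
upstream = {
    "orders_raw": [],
    "orders_prepared": ["orders_raw"],
    "orders_scored": ["orders_prepared"],
}


def ancestors(name, upstream):
    """Collect every direct and indirect upstream dataset of `name`."""
    found = set()
    stack = list(upstream.get(name, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(upstream.get(parent, []))
    return found


def find_stale(last_build, upstream):
    """Return datasets that were built before at least one of their ancestors."""
    stale = []
    for name, built_at in last_build.items():
        if any(last_build[a] > built_at for a in ancestors(name, upstream)):
            stale.append(name)
    return sorted(stale)


print(find_stale(last_build, upstream))
```

In this made-up example, `orders_raw` was rebuilt after both downstream datasets, so both are reported as candidates for a rebuild.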
Now you know when the datasets were built, but what about when changes were last made, and by whom? The LAST MODIFICATION View provides valuable information about your co-contributors and the timeline of changes. This View is especially helpful for new users onboarding to the project and for remote teams who need context.
Last Build Duration
After you’ve used the other Views to get a clear understanding of how the project was set up, you’ll want to check out the LAST BUILD DURATION View. This View plots the time it took to compute each recipe in your Flow on a linear or logarithmic scale. New in Dataiku 10, this View allows you to immediately identify bottlenecks and determine opportunities for pipeline optimization.
With this additional information, you can intelligently decide to change the recipe engine, select a fast path between data sources, or refactor the Flow in some way!
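As a rough illustration of how build durations point to bottlenecks, the sketch below ranks a handful of recipes by their share of total build time and also reports each duration on a log scale. The recipe names and timings are invented for the example, and the code is plain Python rather than anything Dataiku-specific.

```python
import math

# Hypothetical build durations in seconds, one entry per recipe.
durations = {
    "prepare_orders": 12.0,
    "join_customers": 340.0,
    "train_model": 2900.0,
    "score_orders": 48.0,
}


def rank_bottlenecks(durations):
    """Sort recipes from slowest to fastest.

    Each result row is (name, seconds, share_of_total, log10_seconds).
    Durations must be positive for the logarithm to be defined.
    """
    total = sum(durations.values())
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, secs, secs / total, math.log10(secs))
        for name, secs in ranked
    ]


for name, secs, share, log_secs in rank_bottlenecks(durations):
    print(f"{name:15s} {secs:8.1f}s  {share:6.1%}  log10={log_secs:.2f}")
```

A linear scale makes the single slowest recipe stand out, while a log scale keeps the fast recipes distinguishable when durations span several orders of magnitude.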
This information is especially useful when you have automated your machine learning and analytics pipelines inside of Dataiku. You’re probably familiar with Scenarios for this purpose, but did you know that you can check the log of all scenario runs for a given project using the internal stats dataset?
Armed with this dataset, you can understand if the changes you have made to speed up the operation of one or more recipes result in tangible differences for the end consumer of your Flow.
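One way to check for a tangible difference is to compare typical scenario run times before and after your change. The sketch below assumes you have exported the scenario run log to a list of rows. The field names (`scenario`, `day`, `duration_s`, `outcome`) and the values are hypothetical, and the actual schema of the internal stats dataset may differ.

```python
from datetime import date
from statistics import median

# Hypothetical scenario run log; field names and values are illustrative only.
runs = [
    {"scenario": "rebuild_flow", "day": date(2022, 3, 1), "duration_s": 1810, "outcome": "SUCCESS"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 2), "duration_s": 1790, "outcome": "SUCCESS"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 3), "duration_s": 400, "outcome": "FAILED"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 4), "duration_s": 1850, "outcome": "SUCCESS"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 8), "duration_s": 1210, "outcome": "SUCCESS"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 9), "duration_s": 1190, "outcome": "SUCCESS"},
    {"scenario": "rebuild_flow", "day": date(2022, 3, 10), "duration_s": 1230, "outcome": "SUCCESS"},
]


def compare_medians(runs, scenario, change_day):
    """Return (median_before, median_after) durations for successful runs.

    Runs on or after `change_day` count as "after". Failed runs are
    skipped because an early failure would look misleadingly fast.
    """
    ok = [r for r in runs if r["scenario"] == scenario and r["outcome"] == "SUCCESS"]
    before = [r["duration_s"] for r in ok if r["day"] < change_day]
    after = [r["duration_s"] for r in ok if r["day"] >= change_day]
    return median(before), median(after)


before, after = compare_medians(runs, "rebuild_flow", date(2022, 3, 5))
print(f"median before: {before}s, after: {after}s, saved: {1 - after / before:.0%}")
```

Using the median rather than the mean keeps a single unusually slow run from skewing the comparison.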
Before we wrap things up, it’s worth noting a couple of other new Flow features you might like. As of Dataiku 10, you can quickly create Uploaded Datasets by dragging and dropping files directly onto the Flow! And, in case you missed it, Dataiku 9.0 added the ability to zoom, pan, and reset the zoom with keyboard shortcuts for easy navigation.
Finally, if you’re looking to spice up your Dataiku workspace with an Easter egg, you can type DISCO anywhere with Dataiku open in your browser (except in text input fields such as the search bar) and watch as the datasets change color!
Using Views To Improve Your Dataiku Workflow
Knowing where your data comes from, when it was last updated, and by whom is critical to understanding, explaining, and governing your work. By applying what was shown above, you’ll be able to get more out of the projects that you and your team are building in Dataiku!
To discover even more ways you can customize and visualize your project’s Flow, check out the Dataiku course Flow Views and Actions on the Dataiku Academy.