
Multiple flows per project

It would be really helpful to be able to split a large project into multiple flows. Currently, combining filters with custom tags sort of enables this, but users need to remember to add the appropriate tag to every dataset and recipe as it's created. It'd be nice if, from a project page, I could create multiple flows that separate different aspects of a project. Datasets from neighboring flows could still be accessed in the same way datasets can currently be shared across projects. This would enable a few nice use cases:

  • Logically segment long flows into multiple stages for abstraction
    • Hide preprocessing steps like joining a star schema into a single table to keep the flow from getting too tall
  • Manage multiple independent but related flows separately
  • Easily copy an entire flow into multiple versions without needing to manage git branches
  • Manage ETL flows separately from data preparation flows
  • Reduce load times for flows - many of my projects take more than 60 seconds to load the flow page
  • Easily keep track of which datasets apply to each aspect of a project

Ideally, both the flow view and the dataset view could be segmented this way.


One other use-case I forgot to mention:

It's common for there to be multiple ways to achieve the same result, sometimes by building data pipelines against different data sources that contain the same data. For example, I might build one flow against views from a data mart and another against the underlying tables, then want to compare the results. Today this creates a really messy flow where it's difficult to organize each approach separately. With multiple flows, I could build several versions of the same logical segment of my overall pipeline, compare them, and choose the one I like best to integrate into my larger pipeline, or switch between them when needed. For example, if a critical application loads data from a data warehouse but is allowed to fall back to the source operational database when the warehouse is down, I could configure a script to automatically switch flows when the outage occurs, without cluttering my overall flow.
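The failover idea above can be sketched in plain Python. This is a minimal illustration of the switching logic only, not Dataiku code: the flow names and the health-check parameter are hypothetical placeholders.

```python
# Sketch of the failover scenario: decide which "flow" (data source)
# to build based on whether the primary source is healthy.
# Flow names here are hypothetical, not actual Dataiku identifiers.

def choose_source(primary_ok: bool) -> str:
    """Return the name of the flow/source to run against."""
    return "warehouse_flow" if primary_ok else "operational_db_flow"

def run_pipeline(primary_ok: bool) -> str:
    source = choose_source(primary_ok)
    # In a real setup, a scenario or script would trigger the build
    # of the chosen flow here; this sketch just reports the decision.
    return f"building {source}"

print(run_pipeline(True))   # prints "building warehouse_flow"
print(run_pipeline(False))  # prints "building operational_db_flow"
```

With separate flows for each source, a scheduled script could call logic like this and build only the flow that matches the currently available database.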

Dataiker Alumni

@natejgardner just curious would Flow Zones also satisfy this idea or am I misreading your idea?

For reference: Improve visualization of large flows 

I think if flow zones were able to support nested zones and had a list-based UI, they'd cover this need.

Dataiker Alumni
Status changed to: Duplicate

Upon further investigation we feel that this idea has been sufficiently covered with the launch of Flow Zones in version 8 of DSS.

Dataiker Alumni
Status changed to: Delivered

Duplicate status was set in error.

We did it! Your idea became a reality with the release of Flow Zones in DSS 8. Thanks again for your idea and please let us know if you have any questions.