
Multiple flows per project


It would be really helpful to be able to split a large project into multiple flows. Currently, combining filters with custom tags sort of enables this, but users need to remember to add the appropriate tag to every dataset and recipe as it's created. It'd be nice if, from a project page, I could create multiple flows that separate different aspects of a project. Datasets from neighboring flows could still be accessed in the same way datasets can currently be shared across projects. This would enable a few nice use cases:

  • Logically segment long flows into multiple stages for abstraction
    • Hide preprocessing steps like joining a star schema into a single table to keep the flow from getting too tall
  • Manage multiple independent but related flows separately
  • Easily copy an entire flow into multiple versions without needing to manage git branches
  • Manage ETL flows separately from data preparation flows
  • Reduce load times for flows - many of my projects take more than 60 seconds to load the flow page
  • Easily keep track of which datasets apply to each aspect of a project

Ideally, both the flow view and the dataset view could be segmented this way.

3 Comments
natejgardner
Level 4

One other use-case I forgot to mention:

There are often multiple ways to achieve the same result, sometimes by building data pipelines against different data sources that contain the same data. For example, I might build one flow against views from a data mart and another against the underlying tables, and then want to compare the results. Today this creates a really messy flow where it's difficult to keep each approach organized separately. With multiple flows, I could build several versions of the same logical segment of my overall pipeline, compare them, and choose the one I like best to integrate into my larger pipeline, or switch between them when needed. For example, if a critical application normally loads data from a data warehouse but is allowed to fall back to the source operational database when the warehouse is down, I could configure a script to switch flows automatically when that happens, without cluttering my overall flow.
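
To make the failover case concrete, here's a rough sketch of the selection logic such a script might use. This is plain Python, not the Dataiku API: `pick_flow`, the flow names, and the two health-check functions are all hypothetical placeholders, and real checks would need to actually probe the warehouse and the operational database.

```python
from typing import Callable, Sequence, Tuple


def pick_flow(candidates: Sequence[Tuple[str, Callable[[], bool]]]) -> str:
    """Return the name of the first flow whose data source passes its health check.

    `candidates` is ordered by preference: the primary flow first, fallbacks after.
    """
    for flow_name, is_healthy in candidates:
        try:
            if is_healthy():
                return flow_name
        except Exception:
            # A health check that raises is treated as "source unavailable".
            continue
    raise RuntimeError("no data source is available")


# Hypothetical health checks; hard-coded here to simulate a warehouse outage.
def warehouse_up() -> bool:
    return False


def operational_db_up() -> bool:
    return True


selected = pick_flow([
    ("warehouse_flow", warehouse_up),            # preferred source
    ("operational_db_flow", operational_db_up),  # fallback source
])
print(selected)
```

With per-project flows, the chosen name could then decide which flow gets built, instead of toggling branches inside one big flow.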

CoreyS
Community Manager

@natejgardner just curious: would Flow Zones also satisfy this idea, or am I misreading it?

For reference: Improve visualization of large flows 

natejgardner
Level 4

I think if flow zones were able to support nested zones and had a list-based UI, they'd cover this need.