Multiple flows per project

natejgardner · 04-10-2021

It would be really helpful to be able to split a large project into multiple flows. Currently, combining filters with custom tags sort of enables this, but users need to remember to add the appropriate tag to every dataset and recipe as it's created. It'd be nice if, from a project page, I could create multiple flows that separate different aspects of a project. Datasets from neighboring flows could still be accessed in the same way datasets can currently be shared across projects. This would enable a few nice use-cases:

- Logically segment long flows into multiple stages for abstraction
  - Hide preprocessing steps like joining a star schema into a single table to keep the flow from getting too tall
- Manage multiple independent but related flows separately
- Easily copy an entire flow into multiple versions without needing to manage git branches
- Manage ETL flows separately from data preparation flows
- Reduce load times for flows (many of my projects take more than 60 seconds to load the flow page)
- Easily keep track of which datasets apply to each aspect of a project

Optimally, both the flow view and dataset view could be segmented this way.

natejgardner · 04-10-2021

One other use-case I forgot to mention:

It's common to have multiple ways to achieve the same result, sometimes by building data pipelines against different data sources that contain the same data. For example, I might build one flow against views from a data mart and another against the underlying tables, and then want to compare the results. Today this creates a really messy flow in which it's difficult to keep each approach organized separately. With multiple flows, I could build several versions of the same logical segment of my overall pipeline, compare them, and choose the one I like best to integrate into my larger pipeline, or switch between them when needed. For instance, if a critical application normally loads data from a data warehouse but is allowed to load from the source operational database when the warehouse is down, I could configure a script to switch flows automatically when the warehouse goes down, without cluttering my overall flow.
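
To make the switching case concrete, here is a rough sketch of the selection logic in plain Python. It does not use any DSS API; the variant names ("warehouse", "operational_db") and the health-check callable are hypothetical placeholders for whatever a real script would check and trigger.

```python
from typing import Callable


def choose_variant(
    primary_is_up: Callable[[], bool],
    primary: str = "warehouse",         # hypothetical name of the default flow variant
    fallback: str = "operational_db",   # hypothetical name of the backup flow variant
) -> str:
    """Return the name of the flow variant that should run.

    The primary variant is chosen only if the health check succeeds.
    A failing or crashing health check is treated as "primary is down".
    """
    try:
        return primary if primary_is_up() else fallback
    except Exception:
        # If the check itself errors out (e.g. connection refused),
        # assume the primary source is unavailable.
        return fallback


def broken_check() -> bool:
    """Simulate a health check that cannot reach the warehouse."""
    raise ConnectionError("warehouse unreachable")
```

A scheduler or scenario script could call `choose_variant` with a real connectivity check and then build whichever variant it returns, leaving the other variant untouched.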

CoreyS · 04-26-2021

@natejgardner Just curious: would Flow Zones also satisfy this idea, or am I misreading it?

For reference: Improve visualization of large flows

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!

natejgardner · 05-05-2021

I think if flow zones were able to support nested zones and had a list-based UI, they'd cover this need.

CoreyS · 05-17-2021

Upon further investigation, we feel that this idea has been sufficiently covered by the launch of Flow Zones in version 8 of DSS.

CoreyS · 05-17-2021

Duplicate status was set in error.

We did it! Your idea became a reality with the release of Flow Zones in DSS 8. Thanks again for your idea and please let us know if you have any questions.
