Implementing ways to shorten or simplify the flow visually by merging recipes

SanderVW · September 2023

Hi, the most common issue I am running into when showing others my work or looking at the work of others is that the complexity of a flow can easily become quite daunting.

I think I have two suggestions that might help combat this issue:

1. The most common recipe used (at least in my experience) is the prepare recipe. I generally use most other recipes for one-off cases where I just need them to do one thing the prepare recipe cannot handle. Adding some of the functionality of those other recipes would make the prepare recipe even stronger, so you would need fewer steps and the flow would become cleaner.

2. The flow already has zones, but you have no way of "shortening" or "minimizing" certain subflows into one combined step for readability's sake. I think this would help with creating a simpler overview.

This is all just my two cents, of course. I would be very interested in what ideas other people might have for this issue or if you are running into this issue yourself at all.

Turribeach · September 2023

So I think you are going against the flow here, pun intended! One of the main advantages of Dataiku is its visual flow design and its ability to let users add visual recipes without knowing how to code. The step-by-step approach that the flow takes is a feature, not a bug. Dataiku is prioritising breaking down the data pipeline into multiple steps rather than using larger, complex steps. In my view that reduces complexity rather than adding it. Of course the flow will become bigger, but this is a trade-off the user designing the flow has to make: either be a "clicker" user and use mostly visual recipes, which will make the flow bigger, or be a "coder" user and perform a lot more work in Python recipes. Using Flow Zones can greatly assist in making a large flow more manageable and easier to understand, so if you are not using Flow Zones you should start using them now.

Personally I can understand a flow much more easily when it uses visual recipes rather than Python recipes. The data is persisted at each step, so I can see how the schema is changing and I can do things like row counts to see how the volume of data is changing. So my advice would be that if you want to reduce the size of your flow, you should look at using Python recipes.
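To give a rough idea of what the "coder" route could look like, here is a minimal sketch that folds a filter, a derived column and a group-by into one function. The column names (`status`, `quantity`, `unit_price`, `customer_id`) and the dataset names in the comment are made up for illustration; they are not from any real project.

```python
from collections import defaultdict


def revenue_by_customer(orders):
    """Filter, derive and aggregate in one pass instead of three flow steps."""
    revenue = defaultdict(float)
    for row in orders:
        # Would otherwise be a filter step: keep completed orders only.
        if row["status"] != "completed":
            continue
        # Would otherwise be a prepare step: derive the line total.
        total = row["quantity"] * row["unit_price"]
        # Would otherwise be a group step: sum the totals per customer.
        revenue[row["customer_id"]] += total
    return [{"customer_id": c, "revenue": r} for c, r in sorted(revenue.items())]


# In a Dataiku Python recipe the input and output would go through the
# dataiku package, roughly like this (dataset names are hypothetical):
#   import dataiku, pandas as pd
#   rows = dataiku.Dataset("orders").get_dataframe().to_dict("records")
#   dataiku.Dataset("revenue_by_customer").write_with_schema(
#       pd.DataFrame(revenue_by_customer(rows)))
```

The flow shrinks to a single recipe, but none of the intermediate results are persisted, so you lose the per-step schema and row count checks mentioned above.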

Thanks

New · Last Updated September 2023