I am working on a scenario where I need to apply the same series of steps (Preparation, Join, Filter, etc.) to a large number of datasets. These datasets are sourced from different databases requiring separate credentials. What is the simplest way of addressing this use case? The approach I have come up with is:
1. Populate the input datasets into the flow via a SQL or Python recipe. Connecting to the appropriate dataset/connection is handled within the recipe based on an environment variable passed to it.
2. Set up a scenario that cycles through all the values required for the multiple datasets and passes them as environment variables to the code recipe.
Is there a less/no-code way to do this?
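The two steps above can be sketched in plain Python. This is only an illustration of the pattern, not Dataiku's API: `CONNECTIONS`, `load_rows`, and `run_recipe` are hypothetical names, and the final loop stands in for the scenario that would set the variable before each recipe run.

```python
# Hypothetical sketch of the variable-driven recipe described above.
# None of these names are Dataiku APIs; they illustrate the pattern only.

CONNECTIONS = {
    "sales_db": {"host": "sales.example.com", "table": "orders"},
    "hr_db": {"host": "hr.example.com", "table": "employees"},
}

def load_rows(conn_cfg):
    """Placeholder for the recipe's actual data pull over a connection."""
    # A real recipe would open the connection and run a query;
    # here we return a labelled stub so the flow is visible.
    return [{"source": conn_cfg["table"], "value": i} for i in range(3)]

def run_recipe(connection_name):
    """One invocation of the recipe, driven by a variable value."""
    cfg = CONNECTIONS[connection_name]
    rows = load_rows(cfg)
    # ... Preparation / Join / Filter steps would follow here ...
    return rows

# The scenario's job is this loop: set the variable, run the recipe,
# repeat for every connection.
results = {name: run_recipe(name) for name in CONNECTIONS}
```

The key point is that the recipe body never changes; only the variable selecting the connection does.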
Hi @yashpuranik ,
You may want to look at using Application as a recipe for this use case:
You would need to define the input datasets but reuse the rest of the steps in your flow on different datasets.
Let me know if that helps
I was aware of Application as a recipe, and was certainly planning to use it to streamline my flow. I would like to streamline the definition of the input datasets as well. Something like the following:
1. Set up a dataset (table) with the list of input connections I want to apply my recipe to.
2. Have an "iterator" recipe that will work on one value/connection at a time. It will load the input dataset, pass it on to the application as a recipe and generate the output.
That way, I don't need multiple sub flows, a single flow can manage the entire operation. Any way to do it outside a code recipe?
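The iterator idea above can be sketched as a small driver loop. This is illustrative only: `driver_table` stands in for the dataset listing the connections, and `apply_subflow` is a hypothetical stand-in for invoking the Application-as-recipe on one input.

```python
# Illustrative sketch of the "iterator" pattern: a driver table lists
# the connections, and the same packaged logic is applied to each one.

driver_table = [
    {"connection": "sales_db"},
    {"connection": "hr_db"},
    {"connection": "finance_db"},
]

def apply_subflow(connection):
    """Stand-in for running the Application-as-recipe on one input."""
    # The real call would build the sub-flow's output for this input.
    return f"output_{connection}"

# One loop, one flow: each row of the driver table yields one output.
outputs = [apply_subflow(row["connection"]) for row in driver_table]
```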
The only no-code way I can think of is to copy the sub-flow into the same project and manually change the input dataset from the flow to each of the other datasets. This would require no code.
You would then have multiple sub-flows in the same project. If you need to change output datasets, you can also use Change connection from the Other actions menu.
Gotcha. It could be useful for Dataiku to implement a visual recipe that abstracts a for loop for situations like these. Admittedly it is a very short for loop in Python, but a visual recipe would help expand the reach to non-programmer citizen data scientists.
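For reference, that short loop amounts to something like the sketch below. The `set_variable` and `build_output` functions are stubs standing in for the real scenario-step calls (set a project variable, then rebuild the output); the names are assumptions, not Dataiku's API.

```python
# Hedged sketch of the short scenario loop: for each connection,
# set the variable the recipe reads, then rebuild the output.
# set_variable/build_output are stubs, not Dataiku API calls.

connections = ["sales_db", "hr_db", "finance_db"]

log = []  # records the calls so the loop's behaviour is visible

def set_variable(name, value):
    """Stub for setting a project/scenario variable."""
    log.append(("set", name, value))

def build_output(dataset):
    """Stub for triggering a build of the output dataset."""
    log.append(("build", dataset))

for conn in connections:
    set_variable("input_connection", conn)  # parameterize the recipe
    build_output("final_output")            # rerun the sub-flow
```

A visual "for each" recipe would essentially expose this set-then-build pair as configurable steps.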