How to calculate number of datasets used in merge recipe
Hi all,
Is it possible to determine how many datasets are being ingested by a stack recipe and then use that value later on in a prepare recipe?
Thanks!
Operating system used: Windows
Answers
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi @shak99, welcome! In a visual Stack recipe, an option is available to include an origin column, indicating the original dataset. See attached screenshot. Hope this helps!
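If the origin column is enabled, the number of source datasets could in principle be recovered from the stacked data by counting the distinct values in that column. A minimal sketch in plain Python, assuming the rows are available as dictionaries and that the origin column is called `original_dataset` (both the column name and the sample rows are hypothetical):

```python
def count_source_datasets(rows, origin_column="original_dataset"):
    """Count the distinct non-empty values found in the origin column."""
    return len({
        row[origin_column]
        for row in rows
        if row.get(origin_column) is not None
    })


# Hypothetical stacked rows; the column name is an assumption.
sample_rows = [
    {"id": 1, "original_dataset": "sales_2021"},
    {"id": 2, "original_dataset": "sales_2021"},
    {"id": 3, "original_dataset": "sales_2022"},
]

distinct_sources = count_source_datasets(sample_rows)
print(distinct_sources)
```

Note that this only counts datasets that actually contributed rows, so an empty input dataset would not be counted.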
-
Thanks @Jurre, I understand how to accomplish this visually; however, I would like to do this programmatically. For instance, can I use a Python recipe to determine how many datasets are being used as inputs for a 'stack' recipe?
Edit: In the post I mentioned 'merge recipe'; I actually meant a 'stack recipe'.
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi @shak99, I'm not familiar enough with Python to fully answer your question, but possibly something with adding MultiIndex keys would do the trick. However, I strongly suggest that you wait a little for Pythonistas to join the conversation here. Cheers, Jurre
-
Hello @shak99
You can do this by leveraging the DSS Python API. Here’s a sample code snippet to get you started:
import dataiku

client = dataiku.api_client()
project = client.get_project("PROJECTKEY")  # Select your project accordingly
stack_recipe = project.get_recipe("the_stack_recipe_name")  # Select your stack recipe accordingly
stack_recipe_settings = stack_recipe.get_settings()
number_of_inputs = len(stack_recipe_settings.get_recipe_inputs()['main']['items'])
I hope that helps.
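To use the count later in a prepare recipe, one option is to store it in a project variable. The sketch below runs without a DSS instance: it works on plain dictionaries shaped like the recipe inputs used above (a `main` role holding a list of `items`) and like a project variables dictionary with `standard` and `local` sections. The helper names, the variable key `stack_input_count`, and the sample data are all hypothetical.

```python
def count_recipe_inputs(recipe_inputs, role="main"):
    """Count the input items listed under a role of a recipe-inputs dict."""
    return len(recipe_inputs.get(role, {}).get("items", []))


def with_input_count(variables, count, key="stack_input_count"):
    """Return a copy of a variables dict with the count stored under 'standard'."""
    updated = dict(variables)
    standard = dict(updated.get("standard", {}))
    standard[key] = count
    updated["standard"] = standard
    return updated


# Hypothetical data mimicking the structures described above.
sample_inputs = {
    "main": {
        "items": [
            {"ref": "sales_2021"},
            {"ref": "sales_2022"},
            {"ref": "sales_2023"},
        ]
    }
}

n_inputs = count_recipe_inputs(sample_inputs)
sample_variables = with_input_count({"standard": {}, "local": {}}, n_inputs)
print(n_inputs, sample_variables)
```

Inside DSS, the first dictionary would come from stack_recipe_settings.get_recipe_inputs(), and the second would presumably be read with project.get_variables() and written back with project.set_variables(). How the variable is then referenced in the prepare recipe depends on DSS variable expansion, so check the documentation for the exact syntax.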