How to calculate number of datasets used in merge recipe

shak99
Level 1
Hi all,

Is it possible to determine how many datasets are being ingested by a stack recipe and then use that value later on in a prepare recipe? 

Thanks!


Operating system used: Windows

Jurre

Hi @shak99, welcome!

In a visual Stack recipe, there is an option to include an origin column that indicates which dataset each row came from. See the attached screenshot (merge.jpg). Hope this helps!

shak99
Level 1
Author

Thanks @Jurre. I understand how to accomplish this visually, but I would like to do it programmatically. For instance, can I use a Python recipe to determine how many datasets are used as inputs to a Stack recipe?

Edit: in the post I wrote 'merge recipe', but I actually meant 'stack recipe'.

JuanE
Dataiker

Hello @shak99

You can do this by leveraging the DSS Python API. Here’s a sample code snippet to get you started:

import dataiku

client = dataiku.api_client()
project = client.get_project("PROJECTKEY") # Select your Project accordingly
stack_recipe = project.get_recipe("the_stack_recipe_name") # Select your stack recipe accordingly
stack_recipe_settings = stack_recipe.get_settings()
number_of_inputs = len(stack_recipe_settings.get_recipe_inputs()['main']['items'])
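
To show what that last line is counting, here is a minimal sketch that runs without a DSS instance. It assumes get_recipe_inputs() returns a dict keyed by role, each role holding an 'items' list, as the snippet above implies. The helper name count_recipe_inputs and the dataset names are made up for illustration.

```python
def count_recipe_inputs(recipe_inputs, role="main"):
    """Count the input items listed under one role of a recipe-inputs dict.

    Returns 0 if the role or its 'items' list is missing.
    """
    return len(recipe_inputs.get(role, {}).get("items", []))


# Hypothetical stand-in for stack_recipe_settings.get_recipe_inputs();
# the dataset names are invented for this example.
sample_inputs = {
    "main": {
        "items": [
            {"ref": "sales_2021"},
            {"ref": "sales_2022"},
            {"ref": "sales_2023"},
        ]
    }
}

number_of_inputs = count_recipe_inputs(sample_inputs)
print(number_of_inputs)
```

To reuse the count in a Prepare recipe, one option (an assumption on my part, not tested against your DSS version) would be to save it as a project variable with project.get_variables() and project.set_variables(), then reference that variable in the Prepare step.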

I hope that helps.

Jurre

Hi @shak99,

I'm not familiar enough with Python to fully answer your question, but possibly something involving MultiIndex keys would do the trick. However, I strongly suggest waiting a little for the Pythonistas to join the conversation here. Cheers, Jurre
