How to calculate number of datasets used in merge recipe

shak99 Registered Posts: 3 ✭✭✭

Hi all,

Is it possible to determine how many datasets are being ingested by a stack recipe and then use that value later on in a prepare recipe?

Thanks!


Operating system used: Windows

Answers

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭

    Hi @shak99, welcome!

    In a visual Stack recipe, an option is available to add an origin column that indicates which input dataset each row came from. See the attached screenshot (merge.jpg). Hope this helps!

  • shak99
    shak99 Registered Posts: 3 ✭✭✭

    Thanks @Jurre, I understand how to accomplish this visually, but I would like to do it programmatically. For instance, can I use a Python recipe to determine how many datasets are being used as inputs to a 'stack' recipe?

    Edit: In the original post I wrote 'merge recipe', but I actually meant 'stack recipe'.

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭

    Hi @shak99,

    I'm not familiar enough with Python to fully answer your question, but possibly something with adding MultiIndex keys would do the trick. However, I strongly suggest that you wait a little for the Pythonistas to join the conversation here. Cheers, Jurre

  • JuanE
    JuanE Dataiker, Registered Posts: 45 Dataiker
    edited July 17

    Hello @shak99

    You can do this by leveraging the DSS Python API. Here’s a sample code snippet to get you started:

    import dataiku

    # Connect to the DSS instance this code is running in
    client = dataiku.api_client()
    project = client.get_project("PROJECTKEY")  # Replace with your project key
    stack_recipe = project.get_recipe("the_stack_recipe_name")  # Replace with your stack recipe's name
    stack_recipe_settings = stack_recipe.get_settings()

    # Inputs are grouped by role; this reads the datasets listed under the 'main' role
    number_of_inputs = len(stack_recipe_settings.get_recipe_inputs()['main']['items'])

    I hope that helps.
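
The original question also asks how to reuse the count later in a prepare recipe. One possible approach, sketched here under assumptions rather than taken from the answer above, is to store the count in a project variable. The helper names `count_recipe_inputs` and `with_standard_variable`, the variable name `stack_input_count`, and the sample dataset names are all hypothetical. The sample dict is hand-written to mirror the `['main']['items']` shape that the snippet above indexes, and the `"standard"`/`"local"` layout is assumed to match what `project.get_variables()` returns.

```python
def count_recipe_inputs(recipe_inputs, role="main"):
    """Count the input items listed under one role of a recipe-inputs dict."""
    return len(recipe_inputs.get(role, {}).get("items", []))


def with_standard_variable(variables, name, value):
    """Return a copy of a project-variables dict with one standard variable set."""
    updated = {
        "standard": dict(variables.get("standard", {})),
        "local": dict(variables.get("local", {})),
    }
    updated["standard"][name] = value
    return updated


# Hand-written sample data (hypothetical dataset names), shaped like the
# dict that the snippet above indexes with ['main']['items'].
sample_inputs = {
    "main": {
        "items": [
            {"ref": "sales_2021", "deps": []},
            {"ref": "sales_2022", "deps": []},
            {"ref": "sales_2023", "deps": []},
        ]
    }
}

number_of_inputs = count_recipe_inputs(sample_inputs)
new_variables = with_standard_variable(
    {"standard": {}, "local": {}}, "stack_input_count", number_of_inputs
)
print(number_of_inputs)
print(new_variables)

# Inside DSS (not run here), the same helpers would be fed live objects:
#   inputs = stack_recipe_settings.get_recipe_inputs()
#   variables = project.get_variables()
#   project.set_variables(
#       with_standard_variable(variables, "stack_input_count",
#                              count_recipe_inputs(inputs)))
```

If the variable is saved this way, a formula step in the prepare recipe should be able to read it with the usual variable expansion syntax, for example `${stack_input_count}`. Treat that as something to verify on your DSS version.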
