Recipe get_json_payload does not describe the input datasets

tomas
tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭
edited July 18 in Using Dataiku

Hi, 

I would like to read the recipe (join recipe) inputs and columns, using the .get_json_payload() method. I have an issue with identifying the input datasets. For example the definition is:


{'table2': 1,
'table1': 0,
'conditionsMode': 'AND',
'rightLimit': {'decisionColumn': {}},
'type': 'LEFT',
'outerJoinOnTheLeft': True,
'on': [{'column1': {'name': 'subscriber_id', 'table': 0},
'column2': {'name': 'subscriber_id', 'table': 1},
'maxDistance': 1,
'type': 'EQ'},
{'column1': {'name': 'churn_date', 'table': 0},
'column2': {'name': 'effective_date', 'table': 1},
'type': 'GTE'}]}],

So I see the table 0 is the main table for the lef join (with churn date) and table 1 is the joined table. But in the recipe inputs the order is different, the first item is the joined table and the second item is the main table:


.get_definition_and_payload().get_recipe_inputs()['main']['items']

So how can I get the name of the datasets/tables when I have the index (0,1,2 etc)?

Thanks

Tagged:

Best Answer

  • spicquenot
    spicquenot Dataiker Posts: 4 Dataiker
    edited July 18 Answer ✓

    Hello,

    In the join definition you pasted, the first index refer to the virtual inputs (a virtual way of representing the input tables before the joins).

    The following code would help you get the proper table names:


    recipe = <the_recipe_object>

    payload_def = recipe.get_definition_and_payload()

    # get the definitions
    payload = payload_def.get_json_payload()
    join_def = payload["joins"][0] # get the first join definition
    virtual_inputs = payload["virtualInputs"]

    # get the virtual input indexes
    main_vi_idx = join_def["table1"]
    sec_vi_idx = join_def["table2"]

    # get the table input indexes
    main_table_idx = virtual_inputs[main_vi_idx]["index"]
    sec_table_idx = virtual_inputs[sec_vi_idx]["index"]

    # get all the table inputs
    inputs = payload_def.get_recipe_inputs()

    # retrieve the table input names
    main_table_name = inputs["main"]["items"][main_table_idx]["ref"]
    sec_table_name = inputs["main"]["items"][sec_table_idx]["ref"]

    Regards,

    Sam

Answers

Setup Info
    Tags
      Help me…