Custom Window recipe

Options
nmadhu20
nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron

Hi Team,

I have a requirement to create a custom window recipe based on user inputs. I am aware we can use python to create recipes but is this use-case possible?

Additionally, we can use global variables in sql recipes and other places using ${variable_name}, is this reference possible in Lag/Lead field?

Any update is appreciated. Thankyou

Best Answer

  • Marine
    Marine Dataiker, Registered Posts: 29 Dataiker
    edited July 17 Answer ✓
    Options

    Hi @nmadhu20
    ,

    You can use global variables in the code of a recipe and in configuration fields, but not directly in a window recipe.

    What do you mean by "creating a custom window recipe based on user inputs"?

    Do you want your window recipe to change depending on the input of a user but the user should not open the settings of the window recipe? That would be useful for non technical users who are not familiar with window recipes or if you need to repeat the same task again and again.

    In that case, I would create a python recipe that would use the dataiku library to change the settings of your window recipe. As a Python script is able to retrieve global variables, it will be possible to change the parameters of the window recipe using a global variable. You can programatically change the settings of a window recipe in a Python recipe. You just need to write a code snippet such as :

    # Set up
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    
    # Create a window recipe
    builder = project.new_recipe("window")
    builder.with_input("input_dataset")
    builder.with_new_output("window_dataset", "filesystem_managed")
    recipe = builder.create()
    
    # Change and save its settings (for example lag)
    recipe_settings = recipe.get_settings()
    recipe_settings.get_json_payload()['values'][0]["lag"] = True
    recipe_settings.get_json_payload()['values'][0]["lagValues"] = 3
    recipe_settings.save()
    
    # Run the recipe
    job = recipe.run()

    So if you want to store your value for the lag in a global variable called "userLag", you could change the parameters of the window recipe by running this script :

    user_input_lag = dataiku.get_custom_variables()["userLag"]
    recipe_settings.get_json_payload()['values'][0]["lagValues"] = user_input_lag
    recipe_settings.save()
    
    # Run the recipe
    job = recipe.run()

    Does this help?

Answers

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron
    Options

    Hi @nmadhu20
    , another option that is less flexible than the one @Marine
    shared would be to convert the window recipe to a SQL script (an option on the Output definition step) and then edit that script to add project variables wherever you'd like using the ${varname} syntax.

    Marlan

  • Marine
    Marine Dataiker, Registered Posts: 29 Dataiker
    Options

    Thanks @Marlan
    ! Yes your option is much simpler to implement actually !

  • nmadhu20
    nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron
    Options

    Yes, this worked like a charm. Thankyou! @Marine

  • nmadhu20
    nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron
    Options

    Thankyou for your reply @Marlan

    I actually do need to explore the sql option as well and draw a comparison. I was trying to find the option of converting window into sql but couldn't find.

    Could you please show me with a screenshot from where can I do the conversion. It would be really helpful.

    Thanks!

  • Marine
    Marine Dataiker, Registered Posts: 29 Dataiker
    Options

    Hi @nmadhu20
    , the conversion option will appear when you create your window recipe with SQL datasets and your compute engine is set to SQL. So you should sync first your datasets to a SQL database if they are not SQL datasets already. Then you will be able to convert the recipe from the output tab

    Screenshot 2021-11-17 at 15.19.34.png

    For more information, have a look at this thread.

  • nmadhu20
    nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron
    Options

    Thankyou for the reply @Marine
    .

    Two quick questions :
    1. What is the syntax required if I want the output dataset to be cloud storage s3 type in below line, instead of "filesystem_managed"?

    builder.with_new_output(output_dataset, "filesystem_managed")

    I tried "s3" and "amazon_s3" but didnt work. Is there any such mapping list available for dataset type-> syntax ?

    2. If instead window recipe, I want to create a sql recipe, what is the type of that in below line instead of window?

    builder = project.new_recipe("window")

    I tried "sql" and it shows error.

    Thanks again.

  • Marine
    Marine Dataiker, Registered Posts: 29 Dataiker
    edited July 17
    Options

    No worries @nmadhu20
    !

    1. "with_new_output" takes the connection name as an argument, so you should enter the name of your s3 connection. For more information, you may have a look at the documentation.

    The name of the connection is displayed when you create a new dataset. In the screenshot below, I would choose "s3_integration_tests".

    Screenshot 2021-11-18 at 18.28.45.png

    2. Is your input dataset stored on a SQL database? You can only create SQL recipe on SQL datasets.

    The keyword is "sql_query"

    builder = project.new_recipe("sql_query") 
  • nmadhu20
    nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron
    Options

    Thankyou for the reply @Marine

    The use-case here is that the output of window recipe(s3) would be input to sql spark recipe whose query we will dynamically create in python and add and run it from python. The output will also be s3 output.

    builder = project.new_recipe(??) #what would be the keyword for sql_spark?
    builder.with_input(output_df)
    builder.with_new_output("name_of_output_dataset", "s3_connection_name")
    recipe = builder.create()

    recipe_settings.get_json_payload() #for adding query in the recipe which keyword is used?

    Thanks again.

Setup Info
    Tags
      Help me…