Custom Window recipe

Solved!
nmadhu20
Custom Window recipe

Hi Team,

I have a requirement to create a custom window recipe based on user inputs. I am aware we can use python to create recipes but is this use-case possible?

Additionally, we can use global variables in sql recipes and other places using ${variable_name}, is this reference possible in Lag/Lead field?

 

Any update is appreciated. Thankyou

0 Kudos
1 Solution
Marine
Dataiker

Hi @nmadhu20 ,

You can use global variables in the code of a recipe and in configuration fields, but not directly in a window recipe. 

What do you mean by "creating a custom window recipe based on user inputs"?

Do you want your window recipe to change depending on the input of a user but the user should not open the settings of the window recipe? That would be useful for non technical users who are not familiar with window recipes or if you need to repeat the same task again and again. 

In that case, I would create a python recipe that would use the dataiku library to change the settings of your window recipe.  As a Python script is able to retrieve global variables, it will be possible to change the parameters of the window recipe using a global variable. You can programatically change the settings of a window recipe in a Python recipe. You just need to write a code snippet such as :

 

# Set up
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Create a window recipe
builder = project.new_recipe("window")
builder.with_input("input_dataset")
builder.with_new_output("window_dataset", "filesystem_managed")
recipe = builder.create()

# Change and save its settings (for example lag)
recipe_settings = recipe.get_settings()
recipe_settings.get_json_payload()['values'][0]["lag"] = True
recipe_settings.get_json_payload()['values'][0]["lagValues"] = 3
recipe_settings.save()

# Run the recipe
job = recipe.run()

 

So if you want to store your value for the lag in a global variable called "userLag", you could change the parameters of the window recipe by running this script : 

user_input_lag = dataiku.get_custom_variables()["userLag"]
recipe_settings.get_json_payload()['values'][0]["lagValues"] = user_input_lag
recipe_settings.save()

# Run the recipe
job = recipe.run()

 Does this help? 

View solution in original post

9 Replies
Marine
Dataiker

Hi @nmadhu20 ,

You can use global variables in the code of a recipe and in configuration fields, but not directly in a window recipe. 

What do you mean by "creating a custom window recipe based on user inputs"?

Do you want your window recipe to change depending on the input of a user but the user should not open the settings of the window recipe? That would be useful for non technical users who are not familiar with window recipes or if you need to repeat the same task again and again. 

In that case, I would create a python recipe that would use the dataiku library to change the settings of your window recipe.  As a Python script is able to retrieve global variables, it will be possible to change the parameters of the window recipe using a global variable. You can programatically change the settings of a window recipe in a Python recipe. You just need to write a code snippet such as :

 

# Set up
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Create a window recipe
builder = project.new_recipe("window")
builder.with_input("input_dataset")
builder.with_new_output("window_dataset", "filesystem_managed")
recipe = builder.create()

# Change and save its settings (for example lag)
recipe_settings = recipe.get_settings()
recipe_settings.get_json_payload()['values'][0]["lag"] = True
recipe_settings.get_json_payload()['values'][0]["lagValues"] = 3
recipe_settings.save()

# Run the recipe
job = recipe.run()

 

So if you want to store your value for the lag in a global variable called "userLag", you could change the parameters of the window recipe by running this script : 

user_input_lag = dataiku.get_custom_variables()["userLag"]
recipe_settings.get_json_payload()['values'][0]["lagValues"] = user_input_lag
recipe_settings.save()

# Run the recipe
job = recipe.run()

 Does this help? 

Marlan

Hi @nmadhu20, another option that is less flexible than the one @Marine shared would be to convert the window recipe to a SQL script (an option on the Output definition step) and then edit that script to add project variables wherever you'd like using the ${varname} syntax.

Marlan

Marine
Dataiker

Thanks @Marlan ! Yes your option is much simpler to implement actually !

0 Kudos
nmadhu20
Author

Thankyou for your reply @Marlan 

I actually do need to explore the sql option as well and draw a comparison. I was trying to find the option of converting window into sql but couldn't find.

Could you please show me with a screenshot from where can I do the conversion. It would be really helpful.

Thanks!

0 Kudos
Marine
Dataiker

Hi @nmadhu20 , the conversion option will appear when you create your window recipe with SQL datasets and your compute engine is set to SQL. So you should sync first your datasets to a SQL database if they are not SQL datasets already. Then you will be able to convert the recipe from the output tab 

Screenshot 2021-11-17 at 15.19.34.png

For more information, have a look at this thread.

0 Kudos
nmadhu20
Author

Thankyou for the reply @Marine .

Two quick questions :
1. What is the syntax required if I want the output dataset to be cloud storage s3 type in below line, instead of "filesystem_managed"?

builder.with_new_output(output_dataset, "filesystem_managed")

I tried "s3" and "amazon_s3" but didnt work. Is there any such mapping list available for dataset type-> syntax ?

2. If instead window recipe, I want to create a sql recipe, what is the type of that in below line instead of window?

builder = project.new_recipe("window")

I tried "sql" and it shows error.

 

Thanks again.

0 Kudos
Marine
Dataiker

No worries @nmadhu20 ! 

1. "with_new_output" takes the connection name as an argument, so you should enter the name of your s3 connection. For more information, you may have a look at the documentation.

The name of the connection is displayed when you create a new dataset. In the screenshot below, I would choose "s3_integration_tests".

Screenshot 2021-11-18 at 18.28.45.png

2. Is your input dataset stored on a SQL database? You can only create SQL recipe on SQL datasets. 

The keyword is "sql_query"

builder = project.new_recipe("sql_query") 
nmadhu20
Author

Thankyou for the reply @Marine 

The use-case here is that the output of window recipe(s3) would be input to sql spark recipe whose query we will dynamically create in python and add and run it from python. The output will also be s3 output.

builder = project.new_recipe(??) #what would be the keyword for sql_spark?
builder.with_input(output_df)
builder.with_new_output("name_of_output_dataset", "s3_connection_name")
recipe = builder.create()

recipe_settings.get_json_payload() #for adding  query in the recipe which keyword is used?

 

Thanks again.

0 Kudos
nmadhu20
Author

Yes, this worked like a charm. Thankyou! @Marine