Hi Team,
I have a requirement to create a custom window recipe based on user inputs. I am aware we can use Python to create recipes, but is this use-case possible?
Additionally, we can use global variables in SQL recipes and other places using ${variable_name}. Is this reference possible in the Lag/Lead field?
Any update is appreciated. Thank you!
Hi @nmadhu20 ,
You can use global variables in the code of a recipe and in configuration fields, but not directly in a window recipe.
What do you mean by "creating a custom window recipe based on user inputs"?
Do you want your window recipe to change depending on the input of a user, without the user opening the settings of the window recipe? That would be useful for non-technical users who are not familiar with window recipes, or if you need to repeat the same task again and again.
In that case, I would create a Python recipe that uses the dataiku library to change the settings of your window recipe. Since a Python script can retrieve global variables, you can change the window recipe's parameters programmatically. You just need to write a code snippet such as:
import dataiku

# Set up
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
# Create a window recipe
builder = project.new_recipe("window")
builder.with_input("input_dataset")
builder.with_new_output("window_dataset", "filesystem_managed")
recipe = builder.create()
# Change and save its settings (for example lag)
recipe_settings = recipe.get_settings()
recipe_settings.get_json_payload()['values'][0]["lag"] = True
recipe_settings.get_json_payload()['values'][0]["lagValues"] = 3
recipe_settings.save()
# Run the recipe
job = recipe.run()
So if you want to store your lag value in a global variable called "userLag", you could change the parameters of the window recipe by running this script:
user_input_lag = int(dataiku.get_custom_variables()["userLag"])  # variables are returned as strings
recipe_settings.get_json_payload()['values'][0]["lagValues"] = user_input_lag
recipe_settings.save()
# Run the recipe
job = recipe.run()
Does this help?
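If it helps to see the payload edit in isolation, here is a minimal sketch of the same change applied to a plain dict shaped like the window recipe payload above. The column name and the dict itself are hypothetical placeholders; in a real project, the payload would come from recipe_settings.get_json_payload():

```python
# Hypothetical stand-in for recipe_settings.get_json_payload() on a window
# recipe; the keys mirror the snippet above, the column name is made up.
payload = {"values": [{"column": "price", "lag": False, "lagValues": 0}]}

def apply_lag(payload, lag_values, column_index=0):
    """Enable lag on one aggregated column and set its lag offset."""
    payload["values"][column_index]["lag"] = True
    # Project variables come back as strings, so cast before assigning
    payload["values"][column_index]["lagValues"] = int(lag_values)
    return payload

apply_lag(payload, "3")
print(payload["values"][0]["lag"], payload["values"][0]["lagValues"])
```

After mutating the payload in place like this, you would call recipe_settings.save() to persist it, exactly as in the snippet above.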
Thanks @Marlan! Yes, your option is actually much simpler to implement!
Thank you for your reply @Marlan.
I actually do need to explore the SQL option as well and draw a comparison. I was trying to find the option of converting the window recipe into SQL but couldn't find it.
Could you please show me with a screenshot where I can do the conversion? It would be really helpful.
Thanks!
Hi @nmadhu20, the conversion option will appear when you create your window recipe with SQL datasets and your compute engine is set to SQL. So you should first sync your datasets to a SQL database if they are not SQL datasets already. Then you will be able to convert the recipe from the Output tab.
For more information, have a look at this thread.
Thank you for the reply @Marine.
Two quick questions:
1. What is the syntax if I want the output dataset to be of cloud storage (S3) type in the line below, instead of "filesystem_managed"?
builder.with_new_output(output_dataset, "filesystem_managed")
I tried "s3" and "amazon_s3" but it didn't work. Is there a mapping list available for dataset type -> syntax?
2. If instead of a window recipe I want to create a SQL recipe, what is the type in the line below instead of "window"?
builder = project.new_recipe("window")
I tried "sql" and it shows an error.
Thanks again.
No worries @nmadhu20 !
1. "with_new_output" takes the connection name as an argument, so you should enter the name of your S3 connection. For more information, you can have a look at the documentation.
The name of the connection is displayed when you create a new dataset. In the screenshot below, I would choose "s3_integration_tests".
2. Is your input dataset stored in a SQL database? You can only create SQL recipes on SQL datasets.
The keyword is "sql_query":
builder = project.new_recipe("sql_query")
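If you end up generating the query dynamically, the SQL itself is just a string you assemble in Python before saving it into the recipe. A minimal sketch, where the table and column names are illustrative placeholders (wiring the string back into the recipe would follow the same get_settings()/save() pattern as the window example above):

```python
def build_lag_query(table, column, lag):
    """Assemble a windowed SQL query string from user inputs.
    Table and column names here are hypothetical placeholders."""
    lag = int(lag)  # user inputs often arrive as strings
    return (
        f"SELECT *, LAG({column}, {lag}) OVER (ORDER BY {column}) "
        f"AS {column}_lag{lag} FROM {table}"
    )

print(build_lag_query("window_dataset", "price", "3"))
```

This keeps the query-building logic testable on its own, independent of the recipe that will eventually run it.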
Thank you for the reply @Marine.
The use-case here is that the output of the window recipe (on S3) would be the input to a Spark SQL recipe whose query we will dynamically create, add, and run from Python. The output will also be on S3.
builder = project.new_recipe(??) #what would be the keyword for sql_spark?
builder.with_input(output_df)
builder.with_new_output("name_of_output_dataset", "s3_connection_name")
recipe = builder.create()
recipe_settings.get_json_payload() #for adding query in the recipe which keyword is used?
Thanks again.
Yes, this worked like a charm. Thank you @Marine!