Custom Window recipe
Hi Team,
I have a requirement to create a custom window recipe based on user inputs. I am aware we can use Python to create recipes, but is this use case possible?
Additionally, we can use global variables in SQL recipes and other places using ${variable_name}; is this reference possible in the Lag/Lead field?
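For example, in a SQL recipe I can write something like this (table and variable names are just for illustration):
SELECT * FROM my_table WHERE country = '${country_filter}'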
Any update is appreciated. Thank you!
Best Answer
-
Hi @nmadhu20, you can use global variables in the code of a recipe and in configuration fields, but not directly in a window recipe.
What do you mean by "creating a custom window recipe based on user inputs"?
Do you want your window recipe to change depending on a user's input, without the user having to open the settings of the window recipe? That would be useful for non-technical users who are not familiar with window recipes, or if you need to repeat the same task again and again.
In that case, I would create a Python recipe that uses the dataiku library to change the settings of your window recipe programmatically. Since a Python script can retrieve global variables, you can drive the parameters of the window recipe from a global variable. You just need to write a code snippet such as:
# Set up
import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Create a window recipe
builder = project.new_recipe("window")
builder.with_input("input_dataset")
builder.with_new_output("window_dataset", "filesystem_managed")
recipe = builder.create()

# Change and save its settings (for example, lag)
recipe_settings = recipe.get_settings()
recipe_settings.get_json_payload()['values'][0]["lag"] = True
recipe_settings.get_json_payload()['values'][0]["lagValues"] = 3
recipe_settings.save()

# Run the recipe
job = recipe.run()
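The exact keys in the payload depend on how your window recipe is configured, so it can help to print the payload first and check which keys your aggregations use:
import json
# Inspect the recipe payload to see the available keys
print(json.dumps(recipe_settings.get_json_payload(), indent=2))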
So if you want to store your value for the lag in a global variable called "userLag", you could change the parameters of the window recipe by running this script:
# Retrieve the lag value from the "userLag" global variable
# (variables are returned as strings, so cast to int)
user_input_lag = int(dataiku.get_custom_variables()["userLag"])
recipe_settings.get_json_payload()['values'][0]["lagValues"] = user_input_lag
recipe_settings.save()

# Run the recipe
job = recipe.run()
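For completeness, the "userLag" variable itself can also be set through the API before running the recipe. A minimal sketch, reusing the project handle from the snippet above:
# Set the "userLag" project variable programmatically
variables = project.get_variables()
variables["standard"]["userLag"] = 5
project.set_variables(variables)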
Does this help?
Answers
-
Thanks @Marlan! Yes, your option is actually much simpler to implement!
-
Yes, this worked like a charm. Thank you, @Marine!
-
Thank you for your reply @Marlan.
I actually do need to explore the SQL option as well and draw a comparison. I was trying to find the option to convert the window recipe into SQL but couldn't find it.
Could you please show me with a screenshot where I can do the conversion? It would be really helpful.
Thanks!
-
Hi @nmadhu20, the conversion option will appear when you create your window recipe with SQL datasets and your compute engine is set to SQL. So you should first sync your datasets to a SQL database if they are not SQL datasets already. Then you will be able to convert the recipe from the output tab. For more information, have a look at this thread.
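If you want to script the sync step as well, a rough sketch (untested; "my_sql_connection" stands for the name of your SQL connection):
import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Create a sync recipe that copies the dataset to the SQL connection
builder = project.new_recipe("sync")
builder.with_input("input_dataset")
builder.with_new_output("input_dataset_sql", "my_sql_connection")
recipe = builder.create()
recipe.run()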
-
Thank you for the reply @Marine. Two quick questions:
1. What is the syntax required if I want the output dataset to be a cloud storage (S3) type in the line below, instead of "filesystem_managed"?
builder.with_new_output(output_dataset, "filesystem_managed")
I tried "s3" and "amazon_s3" but they didn't work. Is there a mapping list available for dataset type -> syntax?
2. If, instead of a window recipe, I want to create a SQL recipe, what is the type to use in the line below instead of "window"?
builder = project.new_recipe("window")
I tried "sql" and it shows an error.
Thanks again.
-
No worries @nmadhu20!
1. "with_new_output" takes the connection name as an argument, so you should enter the name of your S3 connection. For more information, you may have a look at the documentation.
The name of the connection is displayed when you create a new dataset. In the screenshot below, I would choose "s3_integration_tests".
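So, reusing the snippet from the accepted answer, the call would look like this (with "s3_integration_tests" as the example connection name):
# Write the output dataset to the S3 connection instead of the local filesystem
builder.with_new_output("window_dataset", "s3_integration_tests")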
2. Is your input dataset stored on a SQL database? You can only create a SQL recipe on SQL datasets.
The keyword is "sql_query":
builder = project.new_recipe("sql_query")
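Putting it together, a minimal sketch (untested; dataset and connection names are placeholders, and if I remember correctly the query itself can be passed with "with_script"):
# Create a SQL query recipe on a SQL dataset
builder = project.new_recipe("sql_query")
builder.with_input("my_sql_dataset")
builder.with_new_output("my_output_dataset", "my_sql_connection")
builder.with_script("SELECT * FROM my_sql_dataset")
recipe = builder.create()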
-
Thank you for the reply @Marine.
The use case here is that the output of the window recipe (on S3) would be the input to a SQL Spark recipe, whose query we will dynamically create in Python and then add and run from Python. The output will also be on S3.
builder = project.new_recipe(??)  # what would be the keyword for sql_spark?
builder.with_input(output_df)
builder.with_new_output("name_of_output_dataset", "s3_connection_name")
recipe = builder.create()
recipe_settings.get_json_payload()  # which keyword is used for adding the query to the recipe?
Thanks again.