Allow people to create blank Python recipe templates
Hi,
The standard Python code recipe looks like this:
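(Reproducing it from memory, so minor details may vary by DSS version; dataset names are placeholders:)

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
myinput_dataset = dataiku.Dataset("myinput_dataset")
myinput_dataset_df = myinput_dataset.get_dataframe()

# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output
myoutput_dataset_df = myinput_dataset_df  # For this sample code, simply copy input to output

# Write recipe outputs
myoutput_dataset = dataiku.Dataset("myoutput_dataset")
myoutput_dataset.write_with_schema(myoutput_dataset_df)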
What would be great is the ability to create my own Python recipe template that matches the way our team typically codes. Here is an example where we add more imports and some custom code before writing the output dataset:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Commonly used imports for our Dataiku users
import polars as pl
from map import map
from dku import dku
from dataiku.core.sql import SQLExecutor2

# Read standardized custom variables
vars = dku.custom_variables
db = vars['snowflake_db']
schema = vars['snowflake_schema']

# Read recipe inputs
myinput_dataset = dataiku.Dataset("myinput_dataset")
myinput_dataset_df = myinput_dataset.get_dataframe()

# Set the input dataset location in Snowflake
source_table = f"{db}.{schema}.products_lookup"

### This is where the actual work is done ###
# (produces output_df, or a result table if the work stayed in Snowflake)

# Set the post-write statements of the output dataset so it can be consumed
# by certain Snowflake roles
post_write_statements = [
    f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE1;",
    f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE2;"
]
dku.set_post_write_statements('myoutput_dataset', statements=post_write_statements)

# Write recipe outputs
myoutput_dataset = dataiku.Dataset("myoutput_dataset")

# Use this to write the output dataset if you brought in a dataframe
myoutput_dataset.write_with_schema(output_df)

# Use this to write the output dataset if you kept everything in Snowflake
SQLExecutor2.exec_recipe_fragment(output_dataset=myoutput_dataset,
                                  query=f"select * from {db}.{schema}.sf_result_table",
                                  overwrite_output_schema=True)
In this template we handled importing proprietary and other Python packages. We also read in our custom variables, set up the Snowflake table for the input dataset, set post-write statements on the output dataset, and set the user up to write to Snowflake either from a dataframe or with a SELECT statement.
This would save our Dataiku users a lot of copying and pasting.
thx
Comments
Turribeach
This is already available. Go to any Python recipe and look at the Code Samples button at the top right. You can add your own. It's also available in Jupyter and SQL notebooks. I don't believe these can be modified via the API, but they can be copied across environments. They are stored under DATA_DIR/config/code-snippets/
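For example, a quick sketch of copying them between two instances (the DATA_DIR paths below are just placeholders for your own environments):

import shutil
from pathlib import Path

# Placeholder DATA_DIR locations for the source and target instances
src = Path("/data/dataiku/design_data_dir/config/code-snippets")
dst = Path("/data/dataiku/prod_data_dir/config/code-snippets")

# Merge the snippet definitions into the target instance's folder
# (dirs_exist_ok requires Python 3.8+)
shutil.copytree(src, dst, dirs_exist_ok=True)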
@Turribeach
I was aware of the code snippets, but those won't dynamically insert the input and output tables, right? Ideally we could write a Python template in Mako or Jinja and Dataiku would expose variables in the template context such as 'input_datasets', etc. (Mako example below). Hmmm, I wonder if I could create a plugin to make custom Python recipes for our users. The code snippets are still a good stopgap, thx
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Commonly used imports for our Dataiku users
import polars as pl
from map import map
from dku import dku
from dataiku.core.sql import SQLExecutor2

# Read standardized custom variables
vars = dku.custom_variables
db = vars['snowflake_db']
schema = vars['snowflake_schema']

# Read recipe inputs
% for input_dataset in input_datasets:
${input_dataset} = dataiku.Dataset("${input_dataset}")
${input_dataset}_df = ${input_dataset}.get_dataframe()
% endfor

% if from_snowflake:
# Set the input dataset location in Snowflake
% for input_dataset in input_datasets:
${input_dataset}_source_table = f"{db}.{schema}.${input_dataset}"
% endfor
% endif

### This is where the actual work is done ###

# Set the post-write statements of the output dataset so it can be consumed
# by certain Snowflake roles
% for output_dataset in output_datasets:
post_write_statements = [
% for role in roles:
    f"grant select on table {db}.{schema}.${output_dataset} to role ${role};",
% endfor
]
dku.set_post_write_statements('${output_dataset}', statements=post_write_statements)

# Write recipe outputs
${output_dataset} = dataiku.Dataset("${output_dataset}")

# Use this to write the output dataset if you brought in a dataframe
${output_dataset}.write_with_schema(output_df)

# Use this to write the output dataset if you kept everything in Snowflake
SQLExecutor2.exec_recipe_fragment(output_dataset=${output_dataset},
                                  query="select * from sf_result_table",
                                  overwrite_output_schema=True)
% endfor
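To make the idea concrete, here is roughly how Dataiku (or a plugin) could render that template with Mako. The context keys and the template file name are just my guesses at what would be exposed:

from mako.template import Template

# Hypothetical context a rendering hook would receive; the key names
# (input_datasets, output_datasets, roles, from_snowflake) are my invention
context = {
    "input_datasets": ["myinput_dataset"],
    "output_datasets": ["myoutput_dataset"],
    "roles": ["ROLE1", "ROLE2"],
    "from_snowflake": True,
}

# Render the template above into the starter code for a new Python recipe
with open("python_recipe_template.mako") as f:  # hypothetical file name
    recipe_code = Template(f.read()).render(**context)

print(recipe_code)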