Allow people to create blank Python recipe templates

info-rchitect · Registered Posts: 169 ✭✭✭✭✭✭
edited July 16 in Product Ideas

Hi,

The standard Python code recipe looks like this:

[Screenshot: dataiku_python_recipe_template.png — the default Python recipe template]

What would be great is if I could create my own Python recipe template that looks the way our team typically codes. Here is an example where we add more imports and we add some custom code before writing the output dataset:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Commonly used imports for our Dataiku users
import polars as pl
from map import map
from dku import dku
from dataiku.core.sql import SQLExecutor2

# Read standardized custom variables
vars = dku.custom_variables

db = vars['snowflake_db']

schema = vars['snowflake_schema']

# Read recipe inputs
myinput_dataset = dataiku.Dataset("myinput_dataset")
myinput_dataset_df = myinput_dataset.get_dataframe()

# Set the input dataset location in Snowflake
source_table = f"{db}.{schema}.products_lookup"

### This is where the actual work is done ###

# Set the post-write statements of the output dataset so it can be consumed
# by certain Snowflake roles
post_write_statements = [
  f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE1;",
  f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE2;"
]
dku.set_post_write_statements('myoutput_dataset',
                              statements=post_write_statements)

# Write recipe outputs
myoutput_dataset = dataiku.Dataset("myoutput_dataset")

# Use this to write the output dataset if you brought in a dataframe
# (output_df is whatever dataframe the work section above produced)
myoutput_dataset.write_with_schema(output_df)

# Use this to write the output dataset if you kept everything in Snowflake
SQLExecutor2.exec_recipe_fragment(output_dataset=myoutput_dataset,
                                  query="select * from sf_result_table",
                                  overwrite_output_schema=True)

In this template we handle importing proprietary and other Python packages. We also read in our custom variables, set up the Snowflake table for the input dataset, set post-write statements for the output dataset, and let the user write to Snowflake either from a dataframe or via a SELECT statement.
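The post-write grant pattern in the template could also be factored into a small helper so each recipe only lists the consuming roles. A minimal sketch (the function name is our own, not a Dataiku API):

```python
def build_grant_statements(db, schema, table, roles):
    """Build one Snowflake GRANT SELECT statement per consuming role,
    matching the post_write_statements list in the template above."""
    return [
        f"grant select on table {db}.{schema}.{table} to role {role};"
        for role in roles
    ]

# Example: the two grants from the template
statements = build_grant_statements("MYDB", "MYSCHEMA", "myoutput_dataset",
                                    ["ROLE1", "ROLE2"])
print(statements[0])
# → grant select on table MYDB.MYSCHEMA.myoutput_dataset to role ROLE1;
```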

This would save us a lot of copying-and-pasting across our Dataiku users.

thx


Comments

  • Turribeach · Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 · Posts: 1,727 Neuron

    This is already available: go to any Python recipe and look at the Code Samples button at the top right. You can add your own. It's also available in Jupyter and SQL notebooks. I don't believe these can be modified via the API, but they can be copied across environments; they are stored under DATA_DIR/config/code-snippets/

  • info-rchitect · Registered Posts: 169 ✭✭✭✭✭✭
    edited July 17

    @Turribeach
    I was aware of the code snippets, but those won't dynamically insert the input and output tables, right? Ideally we could write a Python template in Mako or Jinja, and Dataiku would expose variables in the template context such as 'input_datasets', etc. (Mako example below).

    Hmmm, I wonder if I could create a plugin to make custom Python recipes for our users. The code snippet is still a good stopgap, thx

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    # Commonly used imports for our Dataiku users
    import polars as pl
    from map import map
    from dku import dku
    from dataiku.core.sql import SQLExecutor2
    
    # Read standardized custom variables
    vars = dku.custom_variables
    
    db = vars['snowflake_db']
    
    schema = vars['snowflake_schema']
    
    # Read recipe inputs
    % for input_dataset in input_datasets:
    ${input_dataset} = dataiku.Dataset("${input_dataset}")
    ${input_dataset}_df = ${input_dataset}.get_dataframe()
    
    % endfor
    
    % if from_snowflake:
    # Set the input dataset location in Snowflake
      % for input_dataset in input_datasets:
    ${input_dataset}_source_table = f"{db}.{schema}.${input_dataset}"
      % endfor
    % endif
    
    ### This is where the actual work is done ###
    
    # Set the post-write statements of the output dataset so it can be consumed
    # by certain Snowflake roles
    
    % for output_dataset in output_datasets:
    post_write_statements = [
      % for role in roles:
      f"grant select on table {db}.{schema}.${output_dataset} to role ${role};",
      % endfor
    ]
    dku.set_post_write_statements('${output_dataset}',
                                  statements=post_write_statements)
    
    # Write recipe outputs
    ${output_dataset} = dataiku.Dataset("${output_dataset}")
    
    # Use this to write the output dataset if you brought in a dataframe
    ${output_dataset}.write_with_schema(output_df)
    
    # Use this to write the output dataset if you kept everything in Snowflake
    SQLExecutor2.exec_recipe_fragment(output_dataset=${output_dataset},
                                      query="select * from sf_result_table",
                                      overwrite_output_schema=True)
    % endfor
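Until Dataiku exposes such a template context, a plugin could do the rendering itself. A rough, dependency-free sketch of the idea behind the Mako template above, with plain stdlib string assembly standing in for the template engine (all names hypothetical):

```python
def render_recipe_template(input_datasets, output_datasets):
    """Generate recipe boilerplate from the dataset names that a plugin
    (or Dataiku itself) would expose in the template context."""
    lines = ["# -*- coding: utf-8 -*-", "import dataiku", ""]
    lines.append("# Read recipe inputs")
    for name in input_datasets:
        lines.append(f'{name} = dataiku.Dataset("{name}")')
        lines.append(f"{name}_df = {name}.get_dataframe()")
    lines.append("")
    lines.append("# Write recipe outputs")
    for name in output_datasets:
        lines.append(f'{name} = dataiku.Dataset("{name}")')
    return "\n".join(lines)

print(render_recipe_template(["myinput_dataset"], ["myoutput_dataset"]))
```

In a real plugin, Mako's `% for` / `${...}` syntax would replace the string assembly, and the dataset lists would come from the recipe's actual inputs and outputs.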
    
    
    
