Allow people to create blank Python recipe templates

Hi,

The standard Python code recipe looks like this:

[Screenshot: dataiku_python_recipe_template.png, the standard Python recipe template]

What would be great is if I could create my own Python recipe template that looks the way our team typically codes.  Here is an example where we add more imports and some custom code before writing the output dataset:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Commonly used imports for our Dataiku users
import polars as pl
from map import map
from dku import dku
from dataiku.core.sql import SQLExecutor2

# Read standardized custom variables
vars = dku.custom_variables

db = vars['snowflake_db']

schema = vars['snowflake_schema']

# Read recipe inputs
myinput_dataset = dataiku.Dataset("myinput_dataset")
myinput_dataset_df = myinput_dataset.get_dataframe()

# Set the input dataset location in Snowflake
source_table = f"{db}.{schema}.products_lookup"

### This is where the actual work is done ###

# Set the post-write statements of the output dataset so it can be consumed
# by certain Snowflake roles
post_write_statements = [
  f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE1;",
  f"grant select on table {db}.{schema}.myoutput_dataset to role ROLE2;"
]
dku.set_post_write_statements('myoutput_dataset',
                              statements=post_write_statements)

# Write recipe outputs
myoutput_dataset = dataiku.Dataset("myoutput_dataset")

# Use this to write the output dataset if you brought in a dataframe
# (output_df is whatever dataframe your transformation produced)
myoutput_dataset.write_with_schema(output_df)

# Use this to write the output dataset if you kept everything in Snowflake
SQLExecutor2.exec_recipe_fragment(output_dataset=myoutput_dataset,
                                  query=f"select * from {source_table}",
                                  overwrite_output_schema=True)

In this template we handled importing proprietary and other Python packages.  We also read in our custom variables, set up the Snowflake table for the input dataset, set post-write statements for the output dataset, and set up the user to write to Snowflake using a dataframe or a SELECT statement.

This would save us a lot of copying-and-pasting across our Dataiku users.

 

thx

2 Comments

This is already available. Go to any Python recipe and look at the Code Samples button at the top right. You can add your own. It's also available in Jupyter and SQL Notebooks. I don't believe these can be modified via the API, but they can be copied across environments. They are stored under DATA_DIR/config/code-snippets/
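
Something like this should work for copying them from one instance to another (sketch only; the DATA_DIR paths are illustrative and both need to be reachable from the machine running it):

# Sketch only: copy custom code snippets from one Dataiku DATA_DIR to another.
# The paths below are illustrative; adjust them to your instances.
import shutil
from pathlib import Path

src = Path("/data/dataiku/design_datadir/config/code-snippets")
dst = Path("/data/dataiku/automation_datadir/config/code-snippets")

dst.mkdir(parents=True, exist_ok=True)
for snippet in src.iterdir():
    target = dst / snippet.name
    if snippet.is_dir():
        shutil.copytree(snippet, target, dirs_exist_ok=True)
    else:
        shutil.copy2(snippet, target)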


info-rchitect
Level 6

@Turribeach  I was aware of the code snippets, but that won't dynamically insert the input and output tables, right?  Ideally we could make a Python template in mako or jinja and Dataiku would expose variables in the template context such as 'input_datasets', etc. (mako example below).

Hmmm, I wonder if I could create a plugin to make custom Python recipes for our users.  The code snippet is still a good stopgap, thx

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Commonly used imports for our Dataiku users
import polars as pl
from map import map
from dku import dku
from dataiku.core.sql import SQLExecutor2

# Read standardized custom variables
vars = dku.custom_variables

db = vars['snowflake_db']

schema = vars['snowflake_schema']

# Read recipe inputs
% for input_dataset in input_datasets:
${input_dataset} = dataiku.Dataset("${input_dataset}")
${input_dataset}_df = ${input_dataset}.get_dataframe()

% endfor

% if from_snowflake:
# Set the input dataset location in Snowflake
  % for input_dataset in input_datasets:
${input_dataset}_source_table = f"{db}.{schema}.${input_dataset}"
  % endfor
% endif

### This is where the actual work is done ###

# Set the post-write statements of the output dataset so it can be consumed
# by certain Snowflake roles

% for output_dataset in output_datasets:
post_write_statements = [
  % for role in roles:
  f"grant select on table {db}.{schema}.{${output_dataset}} to role ${role};",
  % endfor
]
dku.set_post_write_statements('${output_dataset}',
                              statements=post_write_statements)

# Write recipe outputs
${output_dataset} = dataiku.Dataset("${output_dataset}")

# Use this to write the output dataset if you brought in a dataframe
${output_dataset}.write_with_schema(output_df)

# Use this to write the output dataset if you kept everything in Snowflake
SQLExecutor2.exec_recipe_fragment(output_dataset=${output_dataset},
                                  query="select * from sf_result_table",  # placeholder for your Snowflake result table
                                  overwrite_output_schema=True)
% endfor
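
For context, here's roughly how I'd picture Dataiku rendering a template like this, with the recipe's inputs and outputs injected into the template context (just a sketch; 'input_datasets', 'output_datasets', 'roles' and 'from_snowflake' are the context variables I'm imagining, not an existing Dataiku API):

# Sketch only: rendering the hypothetical mako template above.
# The file name and the context variable names are made up for illustration.
from mako.template import Template

template = Template(filename="python_recipe_template.mako")
rendered_code = template.render(
    input_datasets=["myinput_dataset"],
    output_datasets=["myoutput_dataset"],
    roles=["ROLE1", "ROLE2"],
    from_snowflake=True,
)
print(rendered_code)  # this would become the pre-filled recipe code

If I end up going the plugin route instead, I think dataiku.customrecipe.get_input_names_for_role() and get_output_names_for_role() would at least give me the input and output dataset names at runtime, so part of this could live in a custom recipe.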


 
