The Dataiku Frontrunner Awards have just launched to recognize your achievements! Submit Your Entry

Application-as-recipes and SQL scripts reusable for different connections

Solved!
pvannies
Level 2
Level 2
Application-as-recipes and SQL scripts reusable for different connections

While turning a flow of SQL script recipes into an application-as-recipe we encountered some difficulties making it generalisable for new input SQL tables that have a different connection/schema/tablename as the original input table. We are interested to learn how other users dealt with this and what best practices are according to the dataiku developers.

Specifically, here are some of our questions:

- In the application designer advanced options, there is the possibilities to add Connections for remappings. We understood this happens when the application is instantiated -> is there a (future) possibility that this remapping automatically points to the connection of the input dataset for the application-as-recipe? As we do not know the input dataset/connection we will use the app-as-recipe for when designing it, we had to find a work-around to do the remapping.

- To use the tablename and schema of the inputtable as a variable that can be used in the SQL scripts, we wrote a custom python script that finds the input to the first recipe in the instantiated project and then looks up its properties using the dataset class and writes them as project variables. To have the 'Execute SQL' steps in our scenario run with the correct connection, we looked up every step of the scenario and if it had a connection changed the connection accordingly. Is there any easier way to do this?

- Will there be documentation available on these topics? we could only find the documentation for file based recipes and scenarios.

Thanks for your input!
This is a follow up from @akshaykatre post: How do you use SQL Script in recipe as application - Dataiku Community @fchataigner2

1 Solution
pvannies
Level 2
Level 2
Author

Will post our (@akshaykatre) workaround here, hope this may be of help to someone in the community! 

Background: Upon running the application-as-recipe, a temporary project instance is created that performs input replacements and the execution of the flow defined in a scenario.
After the scenario has run, the output will be copied back from the instantiated project to the project where the app-as-recipe is called.
What this means for the use of SQL datasets, is that the connection of any dataset in the instantiated project remains the same connection as it was when defining your flow in the project that was turned into the application-as-recipe.
Because it is likely you will use the app-as-recipe on a dataset on a different SQL server, you will have to remap the connections dynamically within the scenario when designing your application. 
If you forget to do this, you will get an error that the SQL recipe can't work on multiple connections simultaneously.

Write two scenario's:

  • The first scenario is a python script that sets all relevant variables needed for the SQL scripts and applies remapping of your connections of the second scenario.
  • The second scenario deals with building the actual output dataset. The connections of this scenario are remapped before it is ran by the end of the first scenario.
Depending on your actual use case, you can reuse (parts of) the python code below for the first scenario 'SettingVariables'. 

 

# Python script for SettingVariables Scenario
 
import dataiku
from dataiku.scenario import Scenario
 
client = dataiku.api_client()
 
# the project we are in now - the temporary application-as-recipe project
project = client.get_default_project()
 
# retrieve necessary information from the input dataset via the name of the first recipe
first_recipe = project.get_recipe('name_recipe_linked_to_input_dataset')
name_input_dataset = first_recipe.get_settings().get_recipe_inputs().get('main').get('items')[0].get('ref')
 
# get parameters via this dataset which has the format ProjectKey.dataset_name
# note that ProjectKey can also be obtained via the scenario as 'scenarioTriggerParam_projectKey'
 
def datasetinfo(dataset):
    dataset =  dataiku.Dataset(dataset)
    connection = dataset.get_config().get('params').get('connection')
    schema = dataset.get_config().get('params').get('schema')
    table_name = dataset.get_config().get('params').get('table')
    projectkey = dataset.get_config()['projectKey']
    return connection, schema, table_name, projectkey
 
connection, schema, tablename, projectkey = datasetinfo(name_input_dataset)
 
# set global (or local) variables in temporary application-as-recipe project if variables are used outside this scenario
vars = project.get_variables()
vars['standard']['connection'] = connection
vars['standard']['schema'] = schema
 
 
# Change scenario connection
scenario = project.get_scenario('BuildDatasets')
settings = scenario.get_settings()
for step in settings.raw_steps:
    if 'connection' in step['params'].keys():
    step['params']['connection'] = connection
    settings.save()
 
 
# Change connection and schema of each dataset
for datasets in project.list_datasets():
    data_table = project.get_dataset(datasets['name'])
    data_settings = data_table.get_settings()
    data_settings.get_raw_params()['connection'] = connection
    data_settings.get_raw_params()['schema'] = schema
    data_settings.save()
 
 
# Adjust for names that have variables in them
if "$" in tablename:
    tablename = tablename.replace("$", "")
    localvars = dataiku.Project(project_key=projectkey).get_variables().get('local')
    standardvars = dataiku.Project(project_key=projectkey).get_variables().get('standard')
    standardvars.update(localvars)
    tablename = tablename.format(**standardvars)
 
# example of a local variable that  can be used by SQL scripts
vars["local"]["input_table"] = schema + "." + tablename
 
# save the new variables
project.set_variables(vars)
 
# Call scenario that builds the actual datasets which now has remapped connections and can make use of variables transferred from the callingProject
Scenario().run_scenario('BuildDatasets')

 

 

 

View solution in original post

2 Replies
pvannies
Level 2
Level 2
Author

Will post our (@akshaykatre) workaround here, hope this may be of help to someone in the community! 

Background: Upon running the application-as-recipe, a temporary project instance is created that performs input replacements and the execution of the flow defined in a scenario.
After the scenario has run, the output will be copied back from the instantiated project to the project where the app-as-recipe is called.
What this means for the use of SQL datasets, is that the connection of any dataset in the instantiated project remains the same connection as it was when defining your flow in the project that was turned into the application-as-recipe.
Because it is likely you will use the app-as-recipe on a dataset on a different SQL server, you will have to remap the connections dynamically within the scenario when designing your application. 
If you forget to do this, you will get an error that the SQL recipe can't work on multiple connections simultaneously.

Write two scenario's:

  • The first scenario is a python script that sets all relevant variables needed for the SQL scripts and applies remapping of your connections of the second scenario.
  • The second scenario deals with building the actual output dataset. The connections of this scenario are remapped before it is ran by the end of the first scenario.
Depending on your actual use case, you can reuse (parts of) the python code below for the first scenario 'SettingVariables'. 

 

# Python script for SettingVariables Scenario
 
import dataiku
from dataiku.scenario import Scenario
 
client = dataiku.api_client()
 
# the project we are in now - the temporary application-as-recipe project
project = client.get_default_project()
 
# retrieve necessary information from the input dataset via the name of the first recipe
first_recipe = project.get_recipe('name_recipe_linked_to_input_dataset')
name_input_dataset = first_recipe.get_settings().get_recipe_inputs().get('main').get('items')[0].get('ref')
 
# get parameters via this dataset which has the format ProjectKey.dataset_name
# note that ProjectKey can also be obtained via the scenario as 'scenarioTriggerParam_projectKey'
 
def datasetinfo(dataset):
    dataset =  dataiku.Dataset(dataset)
    connection = dataset.get_config().get('params').get('connection')
    schema = dataset.get_config().get('params').get('schema')
    table_name = dataset.get_config().get('params').get('table')
    projectkey = dataset.get_config()['projectKey']
    return connection, schema, table_name, projectkey
 
connection, schema, tablename, projectkey = datasetinfo(name_input_dataset)
 
# set global (or local) variables in temporary application-as-recipe project if variables are used outside this scenario
vars = project.get_variables()
vars['standard']['connection'] = connection
vars['standard']['schema'] = schema
 
 
# Change scenario connection
scenario = project.get_scenario('BuildDatasets')
settings = scenario.get_settings()
for step in settings.raw_steps:
    if 'connection' in step['params'].keys():
    step['params']['connection'] = connection
    settings.save()
 
 
# Change connection and schema of each dataset
for datasets in project.list_datasets():
    data_table = project.get_dataset(datasets['name'])
    data_settings = data_table.get_settings()
    data_settings.get_raw_params()['connection'] = connection
    data_settings.get_raw_params()['schema'] = schema
    data_settings.save()
 
 
# Adjust for names that have variables in them
if "$" in tablename:
    tablename = tablename.replace("$", "")
    localvars = dataiku.Project(project_key=projectkey).get_variables().get('local')
    standardvars = dataiku.Project(project_key=projectkey).get_variables().get('standard')
    standardvars.update(localvars)
    tablename = tablename.format(**standardvars)
 
# example of a local variable that  can be used by SQL scripts
vars["local"]["input_table"] = schema + "." + tablename
 
# save the new variables
project.set_variables(vars)
 
# Call scenario that builds the actual datasets which now has remapped connections and can make use of variables transferred from the callingProject
Scenario().run_scenario('BuildDatasets')

 

 

 

View solution in original post

CoreyS
Community Manager
Community Manager

Thank you for sharing your workaround @pvannies @akshaykatre !

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
A banner prompting to get Dataiku DSS