Added on March 25, 2021 12:38PM
While turning a flow of SQL script recipes into an application-as-recipe, we encountered some difficulties making it generalisable to new input SQL tables that have a different connection/schema/table name from the original input table. We are interested in learning how other users have dealt with this and what the best practices are according to the Dataiku developers.
Specifically, here are some of our questions:
- In the application designer's advanced options, there is the possibility to add connections for remapping. We understand this remapping happens when the application is instantiated -> is there a (future) possibility for this remapping to automatically point to the connection of the input dataset of the application-as-recipe? Since we do not know which input dataset/connection the app-as-recipe will be used with while designing it, we had to find a work-around to do the remapping.
- To use the table name and schema of the input table as variables in the SQL scripts, we wrote a custom Python script that finds the input to the first recipe in the instantiated project, looks up its properties using the Dataset class, and writes them as project variables (see the sketch after this list). To have the 'Execute SQL' steps in our scenario run with the correct connection, we iterated over every step of the scenario and, if a step had a connection, changed it accordingly. Is there an easier way to do this?
- Will there be documentation available on these topics? We could only find the documentation for file-based recipes and scenarios.
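
For illustration, here is a minimal sketch of that lookup (the dataset name is hypothetical; the full script is in the workaround posted below). It reads the connection, schema and table of the input dataset and stores two of them as project variables:

import dataiku

# physical settings of the input dataset (the name here is hypothetical)
params = dataiku.Dataset("input_dataset").get_config().get("params")
connection = params.get("connection")
schema = params.get("schema")
table_name = params.get("table")

# expose them as project variables, usable in SQL scripts as e.g. ${schema}
client = dataiku.api_client()
project = client.get_default_project()
variables = project.get_variables()
variables["standard"]["connection"] = connection
variables["standard"]["schema"] = schema
project.set_variables(variables)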
Thanks for your input!
This is a follow-up to @akshaykatre's post: How do you use SQL Script in recipe as application - Dataiku Community (cc @fchataigner2).
Will post our (@akshaykatre) workaround here, hope this may be of help to someone in the community!
Background: Upon running the application-as-recipe, a temporary project instance is created that performs the input replacements and executes the flow defined in a scenario.
After the scenario has run, the output will be copied back from the instantiated project to the project where the app-as-recipe is called.
What this means for SQL datasets is that the connection of any dataset in the instantiated project remains the same as it was when you defined your flow in the project that was turned into the application-as-recipe.
Because you will likely use the app-as-recipe on a dataset on a different SQL server, you will have to remap the connections dynamically within the scenario when designing your application.
If you forget to do this, you will get an error that the SQL recipe can't work on multiple connections simultaneously.
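
To make "remap dynamically" concrete, here is a minimal sketch extracted from the full script below; "NEW_CONNECTION" is a placeholder for the connection of the actual input dataset:

import dataiku

client = dataiku.api_client()
project = client.get_default_project()  # the instantiated project

# point every dataset of the instantiated project at the new connection
for ds in project.list_datasets():
    settings = project.get_dataset(ds["name"]).get_settings()
    settings.get_raw_params()["connection"] = "NEW_CONNECTION"
    settings.save()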
Write two scenarios: a 'SettingVariables' scenario containing the Python script below, and a 'BuildDatasets' scenario that builds the datasets of the flow. The script remaps the connections, sets the variables, and triggers 'BuildDatasets' as its final step.
# Python script for SettingVariables Scenario
import dataiku
from dataiku.scenario import Scenario

client = dataiku.api_client()

# the project we are in now - the temporary application-as-recipe project
project = client.get_default_project()

# retrieve necessary information from the input dataset via the name of the first recipe
first_recipe = project.get_recipe('name_recipe_linked_to_input_dataset')
name_input_dataset = first_recipe.get_settings().get_recipe_inputs().get('main').get('items')[0].get('ref')

# get parameters via this dataset, which has the format ProjectKey.dataset_name
# note that ProjectKey can also be obtained via the scenario as 'scenarioTriggerParam_projectKey'
def datasetinfo(dataset):
    dataset = dataiku.Dataset(dataset)
    connection = dataset.get_config().get('params').get('connection')
    schema = dataset.get_config().get('params').get('schema')
    table_name = dataset.get_config().get('params').get('table')
    projectkey = dataset.get_config()['projectKey']
    return connection, schema, table_name, projectkey

connection, schema, tablename, projectkey = datasetinfo(name_input_dataset)

# set global (or local) variables in the temporary application-as-recipe project
# if variables are used outside this scenario
vars = project.get_variables()
vars['standard']['connection'] = connection
vars['standard']['schema'] = schema

# change the connection of every scenario step that has one
scenario = project.get_scenario('BuildDatasets')
settings = scenario.get_settings()
for step in settings.raw_steps:
    if 'connection' in step['params'].keys():
        step['params']['connection'] = connection
settings.save()

# change the connection and schema of each dataset
for datasets in project.list_datasets():
    data_table = project.get_dataset(datasets['name'])
    data_settings = data_table.get_settings()
    data_settings.get_raw_params()['connection'] = connection
    data_settings.get_raw_params()['schema'] = schema
    data_settings.save()

# adjust for table names that have variables in them, e.g. "${myvar}"
if "$" in tablename:
    tablename = tablename.replace("$", "")
    localvars = dataiku.Project(project_key=projectkey).get_variables().get('local')
    standardvars = dataiku.Project(project_key=projectkey).get_variables().get('standard')
    standardvars.update(localvars)
    tablename = tablename.format(**standardvars)

# example of a local variable that can be used by SQL scripts
vars["local"]["input_table"] = schema + "." + tablename

# save the new variables
project.set_variables(vars)

# call the scenario that builds the actual datasets, which now has remapped connections
# and can make use of the variables transferred from the calling project
Scenario().run_scenario('BuildDatasets')
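
With the variables in place, the SQL script recipes and 'Execute SQL' steps run by the 'BuildDatasets' scenario can reference the input table through variable expansion, for example with a query like SELECT * FROM ${input_table} (the query itself is only an illustration). Also note that the 'SettingVariables' script triggers 'BuildDatasets' as its very last action, so all connections and variables are remapped before any SQL recipe runs.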
Thank you for sharing your workaround, @pvannies and @akshaykatre!