Changing Dataset connection in Python recipe has to be run twice
Hi All,
I have a Python recipe that outputs a SQL Dataset, and I want to control which SQL database the output Dataset is written to. I have been able to do it with the following code:
import dataiku
import pandas as pd

# Change output database for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()

# Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

# Create test Dataframe
test_df = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': ['A', 'B', 'C']})

# Set schema
MyDataset.write_schema([
    {'name': 'Col1', 'type': 'bigint'},
    {'name': 'Col2', 'type': 'string', 'maxLength': 1}
])

# Write Output
MyDataset.write_dataframe(test_df)
The code above works fine in the notebook, but when I run the recipe, it writes the table to the wrong database (the original connection, from before the settings change). If I run the recipe a second time, it then writes the table to the correct database/connection.
More generally, every time I change the database connection setting within the code, the recipe has to be run twice before it writes to the correct database. Has anyone encountered this issue before?
Thanks
Best Answer
Hi @LucasT,
Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.
If you need to change the connection, as a workaround, you could create a scenario with two separate steps: one that changes the connection, then one that builds the dataset.
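As a rough sketch of that workaround, the first scenario step could be a custom Python step that only changes the connection, using the same API calls as in the question. The helper names `set_dataset_connection` and `run_connection_step` are hypothetical, and 'MyDataset' and 'MyDB2' are taken from the question. The second step would then be an ordinary build step on the dataset.

```python
def set_dataset_connection(project, dataset_name, connection_name):
    """Point an existing dataset at another connection and save the change.

    `project` is expected to be the object returned by
    dataiku.api_client().get_default_project().
    """
    settings = project.get_dataset(dataset_name).get_settings()
    settings.get_raw_params()["connection"] = connection_name
    settings.save()


def run_connection_step(dataset_name="MyDataset", connection_name="MyDB2"):
    """Body of scenario step 1 (hypothetical helper): change the connection only.

    The dataset itself is built by a separate, later scenario step.
    """
    # Imported here so the helper above can also be exercised outside DSS.
    import dataiku

    project = dataiku.api_client().get_default_project()
    set_dataset_connection(project, dataset_name, connection_name)
```

In the custom Python step you would call `run_connection_step()`. The recipe itself then no longer needs to touch the dataset settings.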
Thanks,
Zach
Answers
Thanks @ZachM,
Adding the connection change as a separate step of a scenario works.