Added on January 17, 2023 5:05PM
Hi All,
I have a Python recipe that outputs a SQL Dataset, and I want to control which SQL database the output Dataset is written to. I have been able to do it with the following code:
```python
import dataiku
import pandas as pd

#Change output DataBase for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()

#Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

#Create test Dataframe
test_df = pd.DataFrame({'Col1':[1,2,3],'Col2':['A','B','C']})

#Set schema
MyDataset.write_schema([
    {'name':'Col1', 'type':'bigint'},
    {'name':'Col2', 'type':'string', 'maxLength':1}
])

#Write Output
MyDataset.write_dataframe(test_df)
```
The code above works fine in a notebook, but when I run the recipe, it writes the table to the wrong database (the original connection, from before the settings change). If I run the recipe a second time, it then writes the table to the correct database/connection.
More generally, every time I change the connection setting in the code, the recipe has to be run twice before it writes to the correct database. Has anyone encountered this issue before?
Thanks
Hi @LucasT,
Changing the connection of a dataset from within the recipe that writes it isn't recommended. Instead, I recommend creating a separate recipe and output dataset for each connection that you want to use.
If you do need to change the connection, a workaround is to create a scenario with 2 separate steps: the first changes the connection, and the second builds the dataset.
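A minimal sketch of the first step of that scenario, assuming it is a custom Python scenario step and that the second step is an ordinary build step on `MyDataset` configured in the scenario editor. The helper names `set_dataset_connection` and `run_scenario_step` are hypothetical; the API calls themselves (`get_dataset`, `get_settings`, `get_raw_params`, `save`) are the same ones used in the recipe code above. Because the connection is saved before the build step's job starts, that job should pick up the new connection on its first run.

```python
"""Hypothetical first step of the scenario workaround: a custom Python
scenario step that repoints a dataset at another SQL connection.

Assumption: the second scenario step is a regular build step on the same
dataset, so the build job only starts after the new connection is saved.
"""


def set_dataset_connection(project, dataset_name, connection):
    """Point `dataset_name` at `connection` and save the settings.

    `project` is expected to behave like the project handle returned by
    dataiku.api_client().get_default_project(). Returns the previous
    connection name so the caller can log or restore it.
    """
    settings = project.get_dataset(dataset_name).get_settings()
    params = settings.get_raw_params()  # mutable dict of dataset params
    previous = params.get("connection")
    params["connection"] = connection
    settings.save()  # persist before the build step runs
    return previous


def run_scenario_step(dataset_name="MyDataset", connection="MyDB2"):
    """Entry point to call from the scenario's Python step inside DSS."""
    import dataiku  # only importable inside Dataiku DSS

    project = dataiku.api_client().get_default_project()
    previous = set_dataset_connection(project, dataset_name, connection)
    print(f"{dataset_name}: connection {previous!r} -> {connection!r}")
```

In the Python step, call `run_scenario_step()` (or pass a different dataset or connection name). The project handle is passed in as a parameter so the connection-switching logic can be checked with a stub outside DSS.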
Thanks,
Zach
Thanks @ZachM,
Adding the connection change as a separate step of a scenario works.