Changing Dataset connection in Python recipe has to be run twice

LucasT
Level 1
Hi All,

I have a Python recipe that outputs a SQL dataset, and I want to control which SQL database the output dataset is written to. I have been able to do this with the following code:

import dataiku
import pandas as pd

#Change output DataBase for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()

#Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

#Create test Dataframe
test_df = pd.DataFrame({'Col1':[1,2,3],'Col2':['A','B','C']})

#Set schema
MyDataset.write_schema([
    {'name':'Col1', 'type':'bigint'},
    {'name':'Col2', 'type':'string', 'maxLength':1}
])

#Write Output
MyDataset.write_dataframe(test_df)

 

The code above works fine in a notebook, but when I run it as a recipe, the table is written to the wrong database (the original connection, from before the settings change). If I run the recipe a second time, the table is written to the correct database/connection.

More generally, every time I change the database connection setting within the code, the recipe has to be run twice before it writes to the correct database. Has anyone encountered this issue before?

 

Thanks

1 Reply
ZachM
Dataiker

Hi @LucasT,

Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.

If you do need to change the connection, a workaround is to create a scenario with two separate steps: one that changes the connection, and a second that builds the dataset.
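As a rough sketch of the first step, something like the following could go in a custom Python step of the scenario. It reuses the same settings calls as the code in the question; the helper names `set_dataset_connection` and `run_step` are made up for illustration, and 'MyDataset' / 'MyDB2' are the names from the question. I'm assuming the step runs inside DSS, where the `dataiku` package is available.

```python
def set_dataset_connection(project, dataset_name, connection):
    """Point a dataset at another connection (hypothetical helper).

    `project` is a project handle such as the one returned by
    dataiku.api_client().get_default_project().
    Returns True if the settings were changed and saved, False if the
    dataset already used the requested connection.
    """
    settings = project.get_dataset(dataset_name).get_settings()
    params = settings.get_raw_params()
    if params.get('connection') == connection:
        return False  # nothing to change, so skip the save
    params['connection'] = connection
    settings.save()
    return True


def run_step():
    """Body of the scenario's first step (hypothetical name)."""
    import dataiku  # imported here because it only exists inside DSS

    project = dataiku.api_client().get_default_project()
    set_dataset_connection(project, 'MyDataset', 'MyDB2')
```

Call `run_step()` at the end of the custom Python step, then add a separate build step for 'MyDataset' after it. That way the connection change is already saved before the build job starts, and the recipe itself no longer needs to touch the dataset settings.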

Thanks,

Zach
