Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku! CHECK IT OUT

Changing Dataset connection in Python recipe has to be run twice

Solved!
LucasT
Level 1
Changing Dataset connection in Python recipe has to be run twice

Hi All,

I have a Python recipe that outputs a SQL Dataset. I wish to control to which SQL Database the output Dataset is written to. I have been able to do it with the following code:

import dataiku
import pandas as pd

#Change output DataBase for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()

#Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

#Create test Dataframe
test_df = pd.DataFrame({'Col1':[1,2,3],'Col2':['A','B','C']})

#Set schema
MyDataset.write_schema([
    {'name':'Col1', 'type':'bigint'},
    {'name':'Col2', 'type':'string', 'maxLength':1}
])

#Write Output
MyDataset.write_dataframe(test_df)

 

The code above works fine in the notebook but when I run the recipe, it writes the table on the wrong database (writes to original connection before the settings change). If I run the recipe twice, it then writes the table on the correct database/connection.

More generally, every time I change the database connection setting within the code it has to be run twice for the recipe to write to the correct database. Has anyone encountered this issue before?

 

Thanks

1 Solution
ZachM
Dataiker

Hi @LucasT,

Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.

If you need to change the connection, as a workaround, you could create a scenario that changes the connection, then builds the dataset as 2 separate steps.

Thanks,

Zach

View solution in original post

2 Replies
ZachM
Dataiker

Hi @LucasT,

Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.

If you need to change the connection, as a workaround, you could create a scenario that changes the connection, then builds the dataset as 2 separate steps.

Thanks,

Zach

LucasT
Level 1
Author

Thanks @ZachM,

Adding the connection change as a separate step of a scenario works