Changing Dataset connection in Python recipe has to be run twice

LucasT Registered Posts: 2
edited July 2024 in Using Dataiku

Hi All,

I have a Python recipe that outputs a SQL Dataset, and I want to control which SQL database the output Dataset is written to. I have been able to do this with the following code:

import dataiku
import pandas as pd

#Change output DataBase for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()

#Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

#Create test Dataframe
test_df = pd.DataFrame({'Col1':[1,2,3],'Col2':['A','B','C']})

#Set schema
MyDataset.write_schema([
    {'name':'Col1', 'type':'bigint'},
    {'name':'Col2', 'type':'string', 'maxLength':1}
])

#Write Output
MyDataset.write_dataframe(test_df)

The code above works fine in a notebook, but when I run the recipe, it writes the table to the wrong database (the original connection, from before the settings change). If I run the recipe a second time, it then writes the table to the correct database/connection.

More generally, every time I change the database connection setting within the code, the recipe has to be run twice before it writes to the correct database. Has anyone encountered this issue before?

Thanks

Best Answer

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    Answer ✓

    Hi @LucasT,

    Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.

    If you need to change the connection, a workaround is to create a scenario with 2 separate steps: one that changes the connection, then one that builds the dataset.

    Thanks,

    Zach
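
For anyone landing here later, below is a rough sketch of what the two scenario steps in the workaround above might look like if both are written as custom Python steps. It is only an illustration, not an official recipe: the helper names `switch_connection` and `build_dataset` are made up for this example, 'MyDataset' and 'MyDB2' are the placeholder names from the question, and it assumes `DSSDataset.build()` from the public Python API is called with its default arguments. The second step could just as well be a regular Build step configured in the scenario UI.

```python
def switch_connection(project, dataset_name, connection_name):
    """Step 1 (hypothetical helper): point a dataset at another connection.

    `project` is expected to be a project handle such as the one returned by
    dataiku.api_client().get_default_project(). Returns the previous
    connection name so the caller can log it or restore it later.
    """
    settings = project.get_dataset(dataset_name).get_settings()
    params = settings.get_raw_params()
    previous = params.get('connection')
    params['connection'] = connection_name
    settings.save()  # persist the change before any build is started
    return previous


def build_dataset(project, dataset_name):
    """Step 2 (hypothetical helper): build the dataset in a separate step.

    Assumes DSSDataset.build() with default arguments, which starts a job
    and waits for it to finish.
    """
    return project.get_dataset(dataset_name).build()


# Usage inside DSS (not runnable outside of it), one call per scenario step:
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   switch_connection(project, 'MyDataset', 'MyDB2')   # scenario step 1
#   build_dataset(project, 'MyDataset')                # scenario step 2
```

The point of splitting it this way is that the settings change is saved before the build job is started, rather than being made from inside the recipe that is already running.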
