Changing Dataset connection in Python recipe has to be run twice

LucasT Registered Posts: 2
edited July 16 in Using Dataiku

Hi All,

I have a Python recipe that outputs a SQL Dataset, and I want to control which SQL database the output Dataset is written to. I have been able to do this with the following code:

import dataiku
import pandas as pd

#Change output database for "MyDataset"
client = dataiku.api_client()
project = client.get_default_project()
settings = project.get_dataset('MyDataset').get_settings()
settings.get_raw_params()['connection'] = 'MyDB2'
settings.save()  #Persist the change; without save() the edit stays in memory only

#Define Dataset
MyDataset = dataiku.Dataset('MyDataset')

#Create test Dataframe
test_df = pd.DataFrame({'Col1':[1,2,3],'Col2':['A','B','C']})

#Set schema
MyDataset.write_schema([
    {'name':'Col1', 'type':'bigint'},
    {'name':'Col2', 'type':'string', 'maxLength':1}
])

#Write Output
MyDataset.write_dataframe(test_df)

The code above works fine in a notebook, but when I run it as a recipe, it writes the table to the wrong database (the original connection, as it was before the settings change). If I run the recipe a second time, it writes the table to the correct database/connection.

More generally, every time I change the database connection setting within the code, the recipe has to be run twice before it writes to the correct database. Has anyone encountered this issue before?



Best Answer

  • Zach
    Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153

    Hi @LucasT

    Changing the connection of a dataset within a recipe isn't recommended. Instead, I recommend creating a separate recipe and dataset for each connection that you want to use.

    If you need to change the connection, as a workaround, you could create a scenario with two separate steps: one that changes the connection, followed by one that builds the dataset.
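As a rough illustration of that workaround, the first scenario step could be a Python step along these lines. This is only a sketch: the helper names `set_dataset_connection` and `change_connection_step` are made up for this example, the dataset and connection names are taken from the question, and it assumes the step runs inside DSS where the `dataiku` package is available.

```python
def set_dataset_connection(project, dataset_name, connection_name):
    """Point a dataset at another connection and persist the change.

    `project` is expected to be a project handle such as the one returned by
    dataiku.api_client().get_default_project().
    Returns the connection name that was configured before the change.
    """
    settings = project.get_dataset(dataset_name).get_settings()
    params = settings.get_raw_params()
    previous = params.get('connection')
    params['connection'] = connection_name
    settings.save()  # without save() the change is never written back to DSS
    return previous


def change_connection_step():
    """Hypothetical body for the first scenario step (run inside DSS)."""
    import dataiku  # imported here because it only exists inside DSS

    project = dataiku.api_client().get_default_project()
    return set_dataset_connection(project, 'MyDataset', 'MyDB2')
```

The second scenario step would then be an ordinary build step on `MyDataset`. Because the connection is changed and saved before that build job starts, the recipe itself no longer needs to modify its own output dataset.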



