How to quickly mass-change the database type of datasets for a project?

Haoran
Haoran Registered Posts: 1
edited 8:54AM in Using Dataiku

First of all, thanks for your time.

I am currently changing the database type from 'Snowflake' to 'Redshift' for every dataset except the source nodes. I could use the 'Change connection' function at the bottom-right of DSS, but doing that dataset by dataset is tedious. Is there a more efficient approach, or a Python API, that could change the connection quickly?

def mass_change_connection(project, orig_conn, dest_conn):
    """Mass change dataset connections in a project (Snowflake datasets only)"""
    for dataset in project.list_datasets(as_type='objects'):
        ds_settings = dataset.get_settings()
        # Only touch datasets whose type is Snowflake
        if ds_settings.type == 'Snowflake':
            params = ds_settings.get_raw().get('params')
            print(params)
            current_connection = params.get('connection')
            # Repoint the dataset only if it uses the original connection
            if current_connection == orig_conn:
                params['connection'] = dest_conn
                ds_settings.save()

I found the above function in a previous Q&A. However, it only changes the name of the connection, not the type of the database.

Thanks for your help.

Operating system used: Windows 11

Answers

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 166 Dataiker

    Hi @Haoran ,

    You might be able to use the 'Connections' Flow view to do this. Here's how:

    • On your Flow, click 'Apply a view' in the top-left corner.
    • Select 'Connections': this colors each dataset in your Flow according to the connection it uses.
      • You can filter further by clicking on any of the connection facets.
    • Click 'Select items': this selects all the datasets on the connections highlighted in the view.
    • Right-click > Change Connection and move the datasets to a different connection. You'll need to rebuild them.

    Cheers,

    Ashley

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,595 Neuron

    Changing the type can also be done, although it requires more work since different connection types may have different parameters. The type is stored here:

    ds_settings.get_raw()['type']
    

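As a rough sketch of what setting that field could look like, the helper below works on a plain raw settings dict such as the one returned by ds_settings.get_raw(). The function name retarget_raw and its arguments are hypothetical, and it assumes the only fields to change are 'type' and params['connection'], which may well not be enough for a real Snowflake to Redshift move.

```python
def retarget_raw(raw, orig_conn, dest_conn,
                 orig_type="Snowflake", dest_type="Redshift"):
    """Point a raw dataset settings dict at a new connection and type.

    Hypothetical helper: it mutates `raw` in place and returns True if it
    changed anything. Connection-specific params are NOT translated here.
    """
    if raw.get("type") != orig_type:
        return False
    params = raw.get("params") or {}
    if params.get("connection") != orig_conn:
        return False
    raw["type"] = dest_type
    params["connection"] = dest_conn
    return True


# In DSS this would presumably be used as:
#   if retarget_raw(ds_settings.get_raw(), "old_conn", "new_conn"):
#       ds_settings.save()
# Here a made-up dict stands in for the real settings.
sample = {"type": "Snowflake", "params": {"connection": "old_conn", "table": "t"}}
changed = retarget_raw(sample, "old_conn", "new_conn")
print(changed, sample)
```

Datasets of any other type, or on any other connection, are left untouched, so the source datasets can be skipped by keeping them on a different connection name.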
    The best way to do this is to look at two datasets created by Dataiku, one in the source connection and one in the destination connection. Compare the get_raw() output of each and work out everything you need to change, then code it and test it. Note that each connection type has different properties, so the code will be specific to the source/destination pair.
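To make that comparison less manual, something like the following could list every field that differs between two get_raw() outputs. It is only a sketch in plain Python: diff_raw is a hypothetical helper, and the two sample dicts use made-up keys rather than real Snowflake or Redshift parameters.

```python
MISSING = "<missing>"  # placeholder for a key present on one side only


def diff_raw(source, dest, path=""):
    """Return (path, source_value, dest_value) for every differing field.

    Nested dicts are walked recursively; anything else is compared as a whole.
    """
    if isinstance(source, dict) and isinstance(dest, dict):
        differences = []
        for key in sorted(set(source) | set(dest)):
            child = f"{path}.{key}" if path else str(key)
            differences += diff_raw(source.get(key, MISSING),
                                    dest.get(key, MISSING), child)
        return differences
    if source != dest:
        return [(path, source, dest)]
    return []


# Made-up stand-ins for the get_raw() output of two reference datasets.
source_raw = {"type": "Snowflake",
              "params": {"connection": "sf", "schema": "PUBLIC", "onlyInSource": 1}}
dest_raw = {"type": "Redshift",
            "params": {"connection": "rs", "schema": "PUBLIC", "onlyInDest": 2}}

for field, src_value, dst_value in diff_raw(source_raw, dest_raw):
    print(field, src_value, dst_value)
```

Each reported path is a candidate for something the migration code has to set, rename, or drop.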
