How to quickly mass-change the database type of datasets for a project?

Haoran
Haoran Registered Posts: 8 ✭✭✭
edited October 30 in Using Dataiku
image.png

Thanks for your time firstly.

I am currently changing the type of database from 'Snowflake' to 'Redshift' (except for source node). While I could use 'Change connection' function at the bottom-right border of DSS, but it's quite robotic. Is there any efficient approach or python API that could modify the connection in a short time?

def mass_change_connection(project, orig_conn, dest_conn):
"""Mass change dataset connections in a project (filesystem connections only)"""
for dataset in project.list_datasets(as_type='objects'):
ds_settings = dataset.get_settings()
if ds_settings.type == 'Snowflake':
ds_settings
params = ds_settings.get_raw().get('params')
print(params)
current_connection = params.get('connection')
if current_connection == orig_conn:
params['connection'] = dest_conn
ds_settings.save()

I found the above function from previous Q&A. However it only change the name of the connection but not the type

of the database.

Thanks for your help.

Operating system used: Windows11

Best Answers

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 166 Dataiker
    Answer ✓

    Hi @Haoran ,

    You might be able to use the 'Connections' Flow view to do this. Here's how:

    • On your Flow, click on 'Apply a view'. It's on the top left corner.
    • Select Connections: this is going to color each dataset in your Flow according to the connection that it uses
      • You can further filter the connections by clicking on any of the connection facets
    • 'Select items': this is going to select all the datasets based on the connections highlighted in the view
    • Right-click > Change Connection and move the datasets to a different connection. You'll need to rebuild them.
    image.png

    image.png

    Cheers,

    Ashley

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,617 Neuron
    Answer ✓

    Changing the type can also be done although it requires more work since different connection types may have different parameters. Type can be changed here:

    ds_settings.get_raw()['type']
    

    The best way to do this is to look at two datasets created by Dataiku in the source and destination connections. Then compare the get_raw() output of each of them and figure out all the things you need to change. Then code and test and you should be good. Note each connection type will have different properties so the code will be source/destination specific.

  • Haoran
    Haoran Registered Posts: 8 ✭✭✭
    Answer ✓
  • Haoran
    Haoran Registered Posts: 8 ✭✭✭
    Answer ✓

    Here is my final solution code:

    import dataiku
    import urllib3
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    def get_project(client, project_key):
    return client.get_project(project_key)

    def get_datasets(client, project_key):
    project = get_project(client, project_key)
    return project.list_datasets()

    def change_connection(project, orig_conn, dest_conn):
    i = 0
    for dataset in project.list_datasets(as_type='objects'):
    ds_settings = dataset.get_settings()
    raw = ds_settings.get_raw()
    if raw.get("type") == "Snowflake" and raw.get("managed") == True:
    params = raw.get('params')
    current_connection = params.get('connection')
    if current_connection == orig_conn:
    raw['type'] = 'Redshift'
    params['connection'] = dest_conn
    ds_settings.save()
    i += 1
    print(f"Changed connections: {i}")

    if __name__ == "__main__":

    # DSS
    url = "xxx"
    api_key = "xxx"

    # connections
    orig_conn = "CONNECTION_ORIG"
    dest_conn = "CONNECTION_DEST"

    dataiku.set_remote_dss(url, api_key, no_check_certificate=True)
    client = dataiku.api_client()


    # project key
    project_key = "HELLOWORLD"
    print("project_key: " + project_key)

    # project
    project = client.get_project(project_key) # function call


    change_connection(project, orig_conn, dest_conn)

Answers

Setup Info
    Tags
      Help me…