Script to list all connections used in a project

sylvyr3

Anyone have a script that lists all the connections used within a project?  

sergeyd
Dataiker

Hi @sylvyr3 

You will need to go through all the datasets and retrieve their connections: 

 

import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Each list item carries the dataset name and its params
all_datasets = project.list_datasets()
for dataset in all_datasets:
    try:
        print('Dataset', dataset["name"], 'is using the', dataset["params"]["connection"], 'connection')
    except KeyError:
        # No "connection" key in params, e.g. an uploaded dataset
        print('Dataset', dataset["name"], 'has no listed connection (possibly an uploaded dataset)')
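
Since the question asks for the connections used in the project rather than a line per dataset, here is a minimal sketch that groups dataset names by connection. It assumes the items from project.list_datasets() behave like dicts, as the indexing above suggests. The helper name group_by_connection and the sample names (pg_main, orders, and so on) are made up for illustration.

```python
from collections import defaultdict


def group_by_connection(dataset_items):
    """Map each connection name to the sorted names of the datasets using it.

    Datasets whose params have no "connection" key are grouped under None.
    """
    groups = defaultdict(list)
    for item in dataset_items:
        connection = item.get("params", {}).get("connection")
        groups[connection].append(item["name"])
    return {conn: sorted(names) for conn, names in groups.items()}


# Hypothetical stand-in data so the sketch runs outside DSS.
# Inside DSS, pass project.list_datasets() instead.
sample = [
    {"name": "orders", "params": {"connection": "pg_main"}},
    {"name": "customers", "params": {"connection": "pg_main"}},
    {"name": "uploaded_csv", "params": {}},
]

for connection, names in group_by_connection(sample).items():
    print(connection or "(no connection)", "->", ", ".join(names))
```

The keys of the returned dict are the distinct connections, which is probably the list you are after.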

 

sylvyr3

Thank you, this was helpful.

I should have asked in the original post, but would it be possible to programmatically change references from connection A to connection B?  

I need to create a new connection profile and switch some of the connections to this new profile.  I'd rather write a script to do this so I don't miss any references.

sergeyd
Dataiker

Hi @sylvyr3 

You should be able to do this in the same loop: 

import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
all_datasets = project.list_datasets()

for d in all_datasets:
    dataset = project.get_dataset(d["name"])
    definition = dataset.get_definition()

    # Read the connection; datasets without one (e.g. uploaded datasets) get a placeholder
    try:
        connection = definition["params"]["connection"]
    except KeyError:
        connection = "unlisted connection"

    # Check the connection to see if it's the one we need to update
    # If it is, modify the dataset connection to map to the new one
    if connection == "OLD_CONNECTION":
        definition["params"]["connection"] = "NEW_CONNECTION"
        dataset.set_definition(definition)
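
If you want to see what would change before saving anything, a dry run may help. This is only a sketch: remap_connection and the DRY_RUN flag are hypothetical names, and the sample definitions are stand-ins for what dataset.get_definition() returns in the loop above.

```python
import copy


def remap_connection(definition, old, new):
    """Return a remapped copy of `definition`, or None if it does not use `old`."""
    if definition.get("params", {}).get("connection") != old:
        return None
    updated = copy.deepcopy(definition)  # leave the original untouched
    updated["params"]["connection"] = new
    return updated


# Hypothetical stand-in definitions so the sketch runs outside DSS.
definitions = {
    "orders": {"params": {"connection": "OLD_CONNECTION"}},
    "uploaded_csv": {"params": {}},
}

DRY_RUN = True  # flip to False once the printed list looks right

for name, definition in definitions.items():
    updated = remap_connection(definition, "OLD_CONNECTION", "NEW_CONNECTION")
    if updated is None:
        continue
    print("Would update" if DRY_RUN else "Updating", name)
    # Inside DSS, when DRY_RUN is False, save with:
    # project.get_dataset(name).set_definition(updated)
```

Running it with DRY_RUN set to True first gives you the list of affected datasets to review before anything is modified.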