List all connection calls

Camila
Camila Registered Posts: 1

Hi guys,

Is there any way to list all calls made through a Dataiku connection?

For example, I have a connection pg-teste-op and I need to know how many times it was used, that is, how many datasets use this connection, at the highest possible level of detail, such as which job used the connection.

Answers

  • Islam
    Islam Dataiku DSS Core Designer, Registered Posts: 8 ✭✭✭✭

    Hello!
    As I see it, there are two possible ways to trace connection usage.

    1. You can enable logging for connections and queries on the PostgreSQL side by setting parameters such as log_connections and log_statement in postgresql.conf.
    2. Using Dataiku's Python API as an admin, you can scan projects, datasets, and jobs to find where a specific connection is being used. The snippet below covers the datasets of the current project:
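    import dataiku

    # Use the public API client; the project handle it returns exposes list_datasets()
    client = dataiku.api_client()
    project = client.get_default_project()

    target_connection = "pg-teste-op"

    connection_usage_data = []

    for dataset in project.list_datasets():
        dataset_name = dataset["name"]
        # Datasets without a connection (e.g. uploaded files) have no "connection" param
        connection_name = dataset.get("params", {}).get("connection", "")

        if connection_name == target_connection:
            connection_usage_data.append({
                "dataset_name": dataset_name,
                "connection_name": connection_name
            })

    print(connection_usage_data)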
    

    Maybe this can help.

    good night

    Islam

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,248 Neuron
    edited January 16

    This bit of Python API code will give you all datasets for a project:

    import dataiku
    client = dataiku.api_client()
    project = client.get_default_project()
    datasets = project.list_datasets()
    # Returns a list of DSSDatasetListItem
    
    for dataset in datasets:
        # Quick access to main information in the dataset list item
        print("Name: %s" % dataset.name)
        print("Type: %s" % dataset.type)
        print("Connection: %s" % dataset.connection)
        print("Tags: %s" % dataset.tags)  # Returns a list of strings

        # You can also use the list item as a dict of all available dataset information
        print("Raw: %s" % dataset)
    

    You can modify it to loop through all projects and get all datasets and their connections.

    With regard to jobs, it's not as straightforward. You can use project.list_jobs(), but you will need to parse the results to determine which datasets were built by each job. Even with this logic, you may not capture every use of the connection, as people can also reach it through dashboards, SQL notebooks, etc. So for a full audit of the use of a connection, I think you will need to look at the backend logs and capture the events you think are relevant.
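A rough sketch of the two steps described above (looping over all projects, then matching jobs to the datasets found) could look like the following. The helper names find_datasets_using_connection and find_jobs_building_datasets are made up for illustration. The layout of the job dicts (job["def"]["id"], job["def"]["outputs"][i]["targetDataset"]) is an assumption, so print one raw item from project.list_jobs() on your instance before relying on it. It also assumes an API key or user that can see every project.

```python
def find_datasets_using_connection(client, target_connection):
    """Return one record per dataset, across all projects, stored on target_connection."""
    usage = []
    for project_key in client.list_project_keys():
        project = client.get_project(project_key)
        for dataset in project.list_datasets():
            # Dataset list items can be read like dicts; datasets without a
            # connection (e.g. uploaded files) have no "connection" param.
            params = dataset.get("params") or {}
            if params.get("connection") == target_connection:
                usage.append({
                    "project_key": project_key,
                    "dataset_name": dataset["name"],
                    "dataset_type": dataset.get("type"),
                })
    return usage


def find_jobs_building_datasets(client, usage):
    """Return jobs whose declared outputs include one of the datasets in `usage`.

    ASSUMPTION: each job dict exposes job["def"]["id"] and
    job["def"]["outputs"][i]["targetDataset"].
    """
    # Group the dataset names by project so each project is scanned once.
    names_by_project = {}
    for record in usage:
        names_by_project.setdefault(record["project_key"], set()).add(record["dataset_name"])

    matches = []
    for project_key, dataset_names in names_by_project.items():
        project = client.get_project(project_key)
        for job in project.list_jobs():
            job_def = job.get("def") or {}
            for output in job_def.get("outputs") or []:
                if output.get("targetDataset") in dataset_names:
                    matches.append({
                        "project_key": project_key,
                        "job_id": job_def.get("id"),
                        "dataset_name": output["targetDataset"],
                        "state": job.get("state"),
                    })
    return matches


# Usage inside DSS (not executed here):
#   import dataiku
#   client = dataiku.api_client()
#   datasets = find_datasets_using_connection(client, "pg-teste-op")
#   jobs = find_jobs_building_datasets(client, datasets)
```

Note that this only finds jobs that build a dataset stored on the connection. Jobs that merely read from it, and access through notebooks or dashboards, are not covered, which is why the logs remain the reference for a full audit.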
