List all connection calls

Hi guys,
Is there any way to list all calls that use a Dataiku connection?
For example, I have a connection pg-teste-op and I need to know how many times it has been used, that is, how many datasets use this connection, and at the highest possible level of detail, such as which job used the connection.
Answers
-
Hello !
For me, there are two possible ways to trace connection usage:
- You can enable logging for connections and queries in PostgreSQL by setting parameters in postgresql.conf (a minimal example of the relevant settings appears after the Python snippet below).
- Using Dataiku's Python API as an admin, you can scan all projects, datasets, and jobs to find where a specific connection is being used.
import dataiku

# Use the public API client; its project handle exposes list_datasets()
client = dataiku.api_client()
project = client.get_default_project()

target_connection = "pg-teste-op"
connection_usage_data = []

# Each item is a DSSDatasetListItem with name / connection attributes
for dataset in project.list_datasets():
    if dataset.connection == target_connection:
        connection_usage_data.append({
            "dataset_name": dataset.name,
            "connection_name": dataset.connection,
        })

print(connection_usage_data)
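For the first option, this is a minimal sketch of the postgresql.conf settings that make connection and query activity visible in the PostgreSQL server log; the parameter names are standard PostgreSQL, but tune the values to your own logging policy:

# postgresql.conf -- log every session open/close and every statement
log_connections = on
log_disconnections = on
log_statement = 'all'               # or 'ddl' / 'mod' to reduce volume
log_line_prefix = '%m [%p] %u@%d '  # timestamp, pid, user and database

If the pg-teste-op connection authenticates with a dedicated database user, filtering the PostgreSQL log on that user shows every statement DSS sent through the connection.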
Maybe this can help.
Good night,
Islam
-
Turribeach
This bit of Python API code will give you all datasets for a project:
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

datasets = project.list_datasets()  # Returns a list of DSSDatasetListItem

for dataset in datasets:
    # Quick access to main information in the dataset list item
    print("Name: %s" % dataset.name)
    print("Type: %s" % dataset.type)
    print("Connection: %s" % dataset.connection)
    print("Tags: %s" % dataset.tags)  # Returns a list of strings

    # You can also use the list item as a dict of all available dataset information
    print("Raw: %s" % dataset)
You can modify it to loop through all projects and get all datasets and their connections; a rough sketch of that loop is below. With regards to jobs it's not as straightforward: you can use project.list_jobs(), but you will need to parse the results to determine which datasets were built by each job. Even then this may not capture every use of the connection, since people can also hit it from dashboards, SQL notebooks, etc. So for a full audit of the use of a connection I think you will need to look at the backend logs and capture the events you think are relevant.
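A minimal sketch of that loop, assuming it runs with admin credentials. The dataset part relies on the documented DSSDatasetListItem attributes; the job part deliberately avoids assuming specific keys in the dicts returned by list_jobs() (their exact layout can vary between DSS versions), and instead does a crude text match against the dataset names found on the connection. Inspect one raw job dict if you want to parse it more precisely.

import dataiku

client = dataiku.api_client()
target_connection = "pg-teste-op"

for project_key in client.list_project_keys():
    project = client.get_project(project_key)

    # Datasets on the target connection (connection is exposed on the list item)
    datasets_on_connection = [
        d.name for d in project.list_datasets() if d.connection == target_connection
    ]
    for name in datasets_on_connection:
        print("Dataset %s.%s uses %s" % (project_key, name, target_connection))

    if not datasets_on_connection:
        continue

    # Jobs: list_jobs() returns plain dicts; rather than assuming field names,
    # look for the affected dataset names anywhere in each job's definition.
    for job in project.list_jobs():
        job_as_text = str(job)
        if any(name in job_as_text for name in datasets_on_connection):
            print("Job in %s may have used %s: %s"
                  % (project_key, target_connection, job.get("def", {}).get("id", "<unknown id>")))

Note this is only a best-effort match: a job whose definition merely mentions one of those dataset names will be flagged even if it never actually read from or wrote to the connection.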