API to get dataset size in DSS
joshi123
I am trying to get the size of all the datasets in my project using Python code in DSS, but I am unable to extract this information. Can anyone help me resolve this?
Answers
Hi joshi123,
If by "size" you mean the number of rows and columns of your datasets, you can retrieve them as metrics through the Dataset API. Here is an example that builds a list of dictionaries, one per dataset, each holding the dataset name and a (number_of_rows, number_of_columns) tuple:
import dataikuapi

# Replace these placeholders with your instance URL, API key and project key
client = dataikuapi.DSSClient(host="YOUR_HOST", api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

def last_val(metric):
    # Most recent value of a metric, or 0 if it is missing or was never computed
    if metric and metric["lastValues"]:
        return metric["lastValues"][0]["value"]
    return 0

dataset_sizes = []
for d in project.list_datasets():
    dataset_handle = project.get_dataset(d.name)
    dataset_handle.compute_metrics()  # (!) Can be costly for large datasets
    metrics = dataset_handle.get_last_metric_values()
    dataset_sizes.append({
        "name": d.name,
        "size": (last_val(metrics.get_metric_by_id("records:COUNT_RECORDS")),
                 last_val(metrics.get_metric_by_id("basic:COUNT_COLUMNS"))),
    })
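The metric values may come back as strings rather than numbers, so a small post-processing step can make the list easier to read. The sketch below assumes the `dataset_sizes` shape built by the loop above (a list of dicts with a `name` and a `(rows, columns)` tuple); the helper `sort_by_row_count` and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple


def sort_by_row_count(dataset_sizes: List[Dict]) -> List[Tuple[str, int, int]]:
    """Convert [{"name": ..., "size": (rows, cols)}, ...] into
    (name, rows, cols) tuples with integer counts, largest datasets first."""
    # int() handles both string values ("1200") and the 0 fallback
    rows_cols = [
        (item["name"], int(item["size"][0]), int(item["size"][1]))
        for item in dataset_sizes
    ]
    # Sort by row count, descending
    return sorted(rows_cols, key=lambda t: t[1], reverse=True)


# Hypothetical sample data in the same shape as dataset_sizes
example = [
    {"name": "customers", "size": ("350", "5")},
    {"name": "orders", "size": ("1200", "8")},
    {"name": "empty_ds", "size": (0, 0)},
]

for name, rows, cols in sort_by_row_count(example):
    print(f"{name}: {rows} rows x {cols} columns")
```

In a real project you would pass the `dataset_sizes` list computed against your DSS instance instead of the sample data.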
Best,
Harizo