I am trying to get the size of all the datasets in my project using Python code in DSS, but I have not been able to extract this information. Can anyone help me resolve this?
Hi joshi123,
If by "size" you mean the number of rows and columns of your datasets, you can retrieve them as metrics through the Dataset API. Here is an example that builds a list of dictionaries, one per dataset, each holding the dataset name and a (number_of_rows, number_of_columns) tuple:
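```python
import dataikuapi

# Replace the placeholders with your DSS URL, API key and project key.
# (Inside DSS, e.g. in a notebook, dataiku.api_client() also returns a client.)
client = dataikuapi.DSSClient(host="YOUR_HOST", api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

def last_val(metric):
    """Return the most recent value of a metric, or 0 if it was never computed."""
    return metric["lastValues"][0]["value"] if metric["lastValues"] else 0

dataset_sizes = []
for d in project.list_datasets():
    dataset_handle = project.get_dataset(d.name)
    dataset_handle.compute_metrics()  # (!) Can be costly for large datasets
    metrics = dataset_handle.get_last_metric_values()
    dataset_sizes.append({
        "name": d.name,
        # (number_of_rows, number_of_columns)
        "size": (last_val(metrics.get_metric_by_id("records:COUNT_RECORDS")),
                 last_val(metrics.get_metric_by_id("basic:COUNT_COLUMNS"))),
    })
```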
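If you then want a readable overview, the resulting list can be sorted and printed. This is only a minimal sketch: `format_report` is a hypothetical helper (not part of the Dataiku API), the sample data is made up, and it assumes the metric values may come back as strings, hence the `int()` conversion.

```python
def format_report(dataset_sizes):
    """Return report lines (header first), sorted by row count, largest first."""
    # Metric values may be strings (assumption), so convert them to int
    rows = [(d["name"], int(d["size"][0]), int(d["size"][1])) for d in dataset_sizes]
    rows.sort(key=lambda r: r[1], reverse=True)
    # Column width: longest dataset name, but at least as wide as the header
    width = max([len("name")] + [len(name) for name, _, _ in rows])
    lines = [f"{'name':<{width}}  {'rows':>10}  {'columns':>7}"]
    for name, n_rows, n_cols in rows:
        lines.append(f"{name:<{width}}  {n_rows:>10}  {n_cols:>7}")
    return lines

# Hypothetical sample shaped like the dataset_sizes list built above
sample = [
    {"name": "customers", "size": ("1200", "8")},
    {"name": "orders", "size": ("54000", "12")},
    {"name": "empty_dataset", "size": (0, 0)},
]
for line in format_report(sample):
    print(line)
```

Datasets whose metrics were never computed fall back to 0 through `last_val`, so they end up at the bottom of the report.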
Best,
Harizo