
API to get dataset size in DSS

joshi123
Level 1

I am trying to get the size of all the datasets in my project with Python code in DSS, but I am unable to extract this information. Can anyone help?

HarizoR
Developer Advocate

Hi joshi123,

If by "size" you mean the number of rows and columns of your datasets, you can retrieve them as metrics through the Dataset API. Here is an example that builds a list of dictionaries, one per dataset, each holding the dataset name and a (number_of_rows, number_of_columns) tuple:

import dataikuapi

# Replace the placeholders with your DSS URL, API key and project key.
# (When running inside DSS, e.g. in a notebook, dataiku.api_client() returns an authenticated client.)
client = dataikuapi.DSSClient(host="YOUR_HOST", api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

def last_val(metric):
    # First stored value of the metric, or 0 if it has never been computed
    return metric["lastValues"][0]["value"] if metric["lastValues"] else 0

dataset_sizes = []
for d in project.list_datasets():
    dataset_handle = project.get_dataset(d.name)
    dataset_handle.compute_metrics()     # (!) Can be costly for large datasets
    metrics = dataset_handle.get_last_metric_values()
    dataset_sizes.append({"name": d.name,
                          "size": (last_val(metrics.get_metric_by_id("records:COUNT_RECORDS")),
                                   last_val(metrics.get_metric_by_id("basic:COUNT_COLUMNS")))})
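Once `dataset_sizes` is built, you may want to rank the datasets by row count. The sketch below is plain Python, so it can be tried without a DSS instance. It assumes each metric dict has the `{"lastValues": [{"value": ...}]}` shape read by `last_val` above, and it casts values with `int()` on the assumption that they may come back as strings. The helper names `last_value` and `largest_datasets` and the sample payloads are hypothetical, not part of the Dataiku API.

```python
from typing import Any, Dict, List, Optional, Tuple


def last_value(metric: Optional[Dict[str, Any]]) -> int:
    """First stored value of a metric dict, or 0 if it is missing or never computed.

    Assumes the {"lastValues": [{"value": ...}]} shape used by last_val;
    int() is applied because values may be returned as strings (assumption).
    """
    if not metric or not metric.get("lastValues"):
        return 0
    return int(metric["lastValues"][0]["value"])


def largest_datasets(sizes: List[Dict[str, Any]], top: int = 5) -> List[Tuple[str, int, int]]:
    """Sort {"name": ..., "size": (rows, cols)} entries by row count, largest first."""
    ranked = sorted(sizes, key=lambda s: s["size"][0], reverse=True)
    return [(s["name"], s["size"][0], s["size"][1]) for s in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical metric payloads for illustration only, not real DSS output
    rows = {"lastValues": [{"value": "1200"}]}
    cols = {"lastValues": [{"value": "8"}]}
    never_computed = {"lastValues": []}
    sizes = [
        {"name": "orders_raw", "size": (last_value(never_computed), last_value(cols))},
        {"name": "customers", "size": (last_value(rows), last_value(cols))},
    ]
    for name, n_rows, n_cols in largest_datasets(sizes):
        print(f"{name}: {n_rows} rows x {n_cols} columns")
```

Keeping the ranking separate from the API calls lets you run the costly `compute_metrics()` loop once and then reuse the collected list as often as needed.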


Best,

Harizo
