I am looking to retrieve the last time a dataset was built using the API.
This information is readily available in the web app.
I can even click on the link of the last build to get the exact datetime.
For my most recent datasets it is relatively straightforward: I can look into the latest metric values:
from dataiku import api_client

# api_client is a function: call it to get the client handle
dataset = api_client().get_project("myproject").get_dataset("mydataset")
last_metrics = dataset.get_last_metric_values()
last_build_datetime = last_metrics.get_metric_by_id("reporting:BUILD_START_DATE")
# >>> get a string that has the last build date in UTC
However, on older datasets, this metric is not present, meaning that I will get an:
Exception: Metric reporting:BUILD_START_DATE not found among: ['basic:COUNT_COLUMNS', 'records:COUNT_RECORDS']
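To avoid the exception on older datasets, the lookup can be guarded by first checking which metric ids exist. This is only a minimal sketch: `get_metric_or_none` and `FakeMetrics` are hypothetical names introduced here for illustration, and `FakeMetrics` merely stands in for the object returned by `dataset.get_last_metric_values()` so the example runs outside DSS.

```python
BUILD_START = "reporting:BUILD_START_DATE"


def get_metric_or_none(last_metrics, metric_id=BUILD_START):
    """Return the metric if the dataset has it, else None."""
    if metric_id not in last_metrics.get_all_ids():
        return None
    return last_metrics.get_metric_by_id(metric_id)


class FakeMetrics:
    """Hypothetical stand-in for dataset.get_last_metric_values()."""

    def __init__(self, metrics):
        self._metrics = metrics

    def get_all_ids(self):
        return list(self._metrics)

    def get_metric_by_id(self, metric_id):
        if metric_id not in self._metrics:
            raise Exception(
                "Metric %s not found among: %s" % (metric_id, list(self._metrics))
            )
        return self._metrics[metric_id]


# An "old" dataset that only carries the two basic metrics
old_dataset = FakeMetrics({"basic:COUNT_COLUMNS": 12, "records:COUNT_RECORDS": 3400})
print(get_metric_or_none(old_dataset))  # prints None
```

This only tells me which datasets lack the metric; it does not recover the missing build date.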
As the info is present in the web app for ALL datasets, I assume it is stored somewhere. I am however at a loss on how to get that info from the API for older tables.
We moved from DSS 5.0 to 7.0 about a year ago. It seems, though I am not 100% certain, that the tables built under DSS 5.0 are the ones whose build date cannot be retrieved.
Indeed @Liev, in most cases these metrics have not been calculated since the latest upgrade.
Following your question, I tried recomputing them, first with the default:
dataset.compute_metrics() # Will compute metrics setup on the dataset
Which only recomputes the metrics that are already present.
I then tried to specify the metric that I wanted with the argument metric_ids:
dataset.compute_metrics(metric_ids=["reporting:BUILD_START_DATE"])

# Also tried in addition with all other metrics already present
default_metrics = dataset.get_last_metric_values().get_all_ids()
dataset.compute_metrics(metric_ids=default_metrics + ["reporting:BUILD_START_DATE"])
In both cases, the build start date was not made available, even though the computation raised no error.
When computing only BUILD_START_DATE, the only metric updated was "reporting:METRICS_COMPUTATION_DURATION"; when the default metrics were included, those were updated as well.
From what I see in my workspace, rebuilding the dataset may solve the problem of missing metrics, but since I need this metric to know whether it makes sense to rebuild the datasets in the first place, this is a bit of a chicken-and-egg problem.
BUILD_START_DATE is a magic metric that cannot be "computed" since it is only ever "set" by actually building a dataset.
You can otherwise obtain the information about the last builds of datasets by using the "Internal stats" dataset, in the "Objects state" view. This dataset will then contain a line per dataset partition with the last build time. You can then load the dataframe corresponding to this Internal stats dataset in your own Python code, and look up the build time there.
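As a rough sketch of that lookup, the snippet below reduces one row per partition to the latest build time per dataset. The column names and the dataset name `internal_stats_objects_state` are assumptions made for illustration, not the confirmed schema; check the columns of your own Internal stats dataset and adjust the constants. The sample rows stand in for the real data so that the snippet runs outside DSS.

```python
# Hypothetical column names: adjust them to the actual schema of your
# "Internal stats" (Objects state) dataset.
PROJECT_COL = "project_key"
DATASET_COL = "object_id"
BUILD_TIME_COL = "last_build_time"


def last_build_times(rows, project_key):
    """Return {dataset_name: latest build time} for one project.

    `rows` is an iterable of dicts, one per dataset partition. Build times
    must be mutually comparable (datetimes, or ISO 8601 strings that all
    share the same format and timezone).
    """
    latest = {}
    for row in rows:
        if row.get(PROJECT_COL) != project_key:
            continue
        built = row.get(BUILD_TIME_COL)
        if built is None:
            continue  # partition never built, or build time unknown
        name = row[DATASET_COL]
        if name not in latest or built > latest[name]:
            latest[name] = built
    return latest


# Inside DSS, the rows could come from the Internal stats dataset, e.g.:
#   import dataiku
#   df = dataiku.Dataset("internal_stats_objects_state").get_dataframe()
#   rows = df.dropna(subset=[BUILD_TIME_COL]).to_dict("records")
# Sample rows standing in for that dataset:
sample_rows = [
    {PROJECT_COL: "MYPROJECT", DATASET_COL: "mydataset",
     BUILD_TIME_COL: "2020-01-01T00:00:00Z"},
    {PROJECT_COL: "MYPROJECT", DATASET_COL: "mydataset",
     BUILD_TIME_COL: "2020-03-01T00:00:00Z"},
    {PROJECT_COL: "MYPROJECT", DATASET_COL: "neverbuilt",
     BUILD_TIME_COL: None},
    {PROJECT_COL: "OTHER", DATASET_COL: "mydataset",
     BUILD_TIME_COL: "2021-01-01T00:00:00Z"},
]

builds = last_build_times(sample_rows, "MYPROJECT")
print(builds.get("mydataset"))
```

Because the whole table is read once, the same dictionary can answer the question for every dataset in the project without a separate metrics call per dataset.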
Thanks for the advice, this seems to work for me!
Having to pull the whole dataset may be a bit overkill, but I'll think about refactoring my code to get the info on all the datasets from this Internal stats dataset.