Retrieve last build date via API

Level 2
Hello,

I am looking to retrieve the last time a dataset was built using the API.

This information is readily available in the web app (Screenshot 2020-08-24 at 15.01.03.png).

I can even click on the link of the last build to get the exact datetime

Screenshot 2020-08-24 at 15.01.19.png

 

For my most recent datasets it is relatively straightforward: I can look at the latest metric values:

 

import dataiku

client = dataiku.api_client()  # api_client is a function that returns the client
dataset = client.get_project("myproject").get_dataset("mydataset")
last_metrics = dataset.get_last_metric_values()
last_build_datetime = last_metrics.get_metric_by_id("reporting:BUILD_START_DATE")
# the result carries the last build date in UTC, as a string

 

However, on older datasets this metric is not present, so I get an exception:

Exception:  Metric reporting:BUILD_START_DATE not found among: ['basic:COUNT_COLUMNS', 'records:COUNT_RECORDS']
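To avoid the exception when scanning many datasets, one option is to check the available metric ids first. The sketch below only relies on get_all_ids() and get_metric_by_id(), the two calls used in this thread; the helper name is hypothetical, and last_metrics is assumed to be the object returned by dataset.get_last_metric_values().

```python
BUILD_START = "reporting:BUILD_START_DATE"


def build_start_metric_or_none(last_metrics, metric_id=BUILD_START):
    """Return the metric entry, or None if the dataset never recorded it.

    `last_metrics` is assumed to be the result of
    dataset.get_last_metric_values(); this helper is illustrative only.
    """
    # Check the id list first instead of catching the generic Exception.
    if metric_id not in last_metrics.get_all_ids():
        return None
    return last_metrics.get_metric_by_id(metric_id)
```

A None result then marks the datasets whose build date has to come from somewhere else.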

 

As the info is shown in the web app for ALL datasets, I assume it is stored somewhere. However, I am at a loss as to how to get that info from the API for older tables.

 

We moved from DSS 5.0 to 7.0 about a year ago. It seems, though I am not 100% certain, that the tables built with DSS 5.0 are the ones whose build date is not retrievable.

 

Dataiker

Hi @LoicM 

This is indeed an interesting question. I imagine what's happening is that those old datasets have not had their metrics recalculated since the upgrade. 

Can you please confirm?

Thank you

Level 2
Author

Indeed @Liev, in most cases these metrics have not been calculated since the latest upgrade.

Following your question, I tried recomputing them, first with the default:

dataset.compute_metrics()  # computes the metrics configured on the dataset

This only recomputes the metrics that are already present.

I then tried to specify the metric I wanted with the metric_ids argument:

dataset.compute_metrics(metric_ids=["reporting:BUILD_START_DATE"])
# Also tried in addition with all other metrics already present
default_metrics = dataset.get_last_metric_values().get_all_ids()
dataset.compute_metrics(
    metric_ids=default_metrics+["reporting:BUILD_START_DATE"])

 

In both cases, the build start date was not made available, even though the computation raised no error.

When computing only BUILD_START_DATE, the only metric updated was "reporting:METRICS_COMPUTATION_DURATION". When the default metrics were included, they were updated as well.

 

From what I see in my workspace, rebuilding the datasets may solve the problem of the missing metric. But since I need this metric to know whether it makes sense to rebuild them in the first place, this is a bit of a chicken-and-egg problem.

Dataiker

Hi,

BUILD_START_DATE is a magic metric that cannot be "computed" since it is only ever "set" by actually building a dataset.

You can otherwise obtain information about the last builds of datasets by using the "Internal stats" dataset with the "Objects state" view. This dataset contains one row per dataset partition, with its last build time. You can then load the corresponding dataframe in your own Python code and look up the datasets you need.
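As a rough sketch of that lookup: since there is one row per partition, the rows can be reduced to the most recent build per dataset in a single pass. The column names below (project_key, dataset_name, last_build) and the dataset name in the comments are placeholders, not the actual Internal stats schema, so check them against your own "Objects state" dataset. The build values are also assumed to compare chronologically (for example datetimes or ISO 8601 strings).

```python
def latest_build_by_dataset(rows,
                            project_col="project_key",   # placeholder column name
                            dataset_col="dataset_name",  # placeholder column name
                            build_col="last_build"):     # placeholder column name
    """Map (project, dataset) to its most recent build time across partitions.

    `rows` is any iterable of dicts, one per dataset partition.
    """
    latest = {}
    for row in rows:
        built = row.get(build_col)
        if built is None:
            continue  # partition that was never built
        key = (row[project_col], row[dataset_col])
        # Keep the newest value seen so far for this dataset.
        if key not in latest or built > latest[key]:
            latest[key] = built
    return latest


# Inside DSS the rows could come from the Internal stats dataframe, e.g.:
#   import dataiku
#   df = dataiku.Dataset("my_internal_stats").get_dataframe()  # hypothetical name
#   rows = df.dropna(subset=["last_build"]).to_dict("records")
#   builds = latest_build_by_dataset(rows)
#   builds.get(("MYPROJECT", "mydataset"))
```

Building the mapping once and querying it for every dataset avoids reloading the Internal stats dataset per lookup.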

Level 2
Author

Hey @Clément_Stenac

Thanks for the advice, this seems to work for me!

Having to pull the whole dataset may be a bit overkill, but I'll think about refactoring my code to get the info for all my datasets from this Internal stats dataset.
