Retrieve last build date via API

LoicM Registered Posts: 5 ✭✭✭✭


I am looking to retrieve the last time a dataset was built using the API.

This information is readily available in the web app:

Screenshot 2020-08-24 at 15.01.03.png

I can even click on the link of the last build to get the exact datetime

Screenshot 2020-08-24 at 15.01.19.png

For my most recent datasets this is relatively straightforward, since I can look at the latest metric values:

from dataiku import api_client

client = api_client()  # api_client is a function that returns the client
dataset = client.get_project("myproject").get_dataset("mydataset")
last_metrics = dataset.get_last_metric_values()
last_build_datetime = last_metrics.get_metric_by_id("reporting:BUILD_START_DATE")
# >>> get a string that has the last build date in UTC

However, on older datasets this metric is not present, so I get an exception:

Exception: Metric reporting:BUILD_START_DATE not found among: ['basic:COUNT_COLUMNS', 'records:COUNT_RECORDS']

Since the information is shown in the web interface for ALL datasets, I assume it is stored somewhere. However, I am at a loss as to how to get it from the API for older tables.

We moved from DSS 5.0 to 7.0 about a year ago. It seems, though I am not 100% certain, that the tables built with DSS 5.0 are the ones whose build date cannot be retrieved.

Best Answer

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer Posts: 753 Dataiker
    Answer ✓


    BUILD_START_DATE is a magic metric that cannot be "computed" since it is only ever "set" by actually building a dataset.

    You can otherwise obtain information about the last builds of datasets by using the "Internal stats" dataset, in the "Objects state" view. This dataset contains one line per dataset partition with the last build time. You can then load the dataframe corresponding to this Internal stats dataset in your own Python code and look up the datasets you need in it.
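The lookup described above could be sketched as follows. Since the dataset holds one line per partition, the sketch keeps the most recent build time per dataset. The column names `project_key`, `dataset_name` and `last_build_time` are assumptions, not the confirmed schema of the "Objects state" view, so check them against your own Internal stats dataset. The sample rows are made up for illustration.

```python
from datetime import datetime


def latest_build_by_dataset(rows, key_cols=("project_key", "dataset_name"),
                            time_col="last_build_time"):
    """Map each dataset key to its most recent build time across partitions."""
    latest = {}
    for row in rows:
        ts = row.get(time_col)
        # Skip missing values; `ts != ts` also catches NaN/NaT coming from pandas.
        if ts is None or ts != ts:
            continue
        key = tuple(row[col] for col in key_cols)
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return latest


# Hypothetical rows. Inside DSS they could come from something like
# dataiku.Dataset("internal_stats").get_dataframe().to_dict("records"),
# where "internal_stats" is a placeholder name for your Internal stats dataset.
sample_rows = [
    {"project_key": "myproject", "dataset_name": "mydataset",
     "partition": "2020-08-23", "last_build_time": datetime(2020, 8, 23, 9, 0, 0)},
    {"project_key": "myproject", "dataset_name": "mydataset",
     "partition": "2020-08-24", "last_build_time": datetime(2020, 8, 24, 13, 1, 19)},
    {"project_key": "myproject", "dataset_name": "never_built",
     "partition": "NP", "last_build_time": None},
]

builds = latest_build_by_dataset(sample_rows)
print(builds.get(("myproject", "mydataset")))
```

Building the whole mapping once and reusing it for every dataset avoids reloading the Internal stats dataset for each lookup.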


  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭

    Hi @LoicM

    This is indeed an interesting question. I imagine what's happening is that those old datasets have not had their metrics recalculated since the upgrade.

    Can you please confirm?

    Thank you

  • LoicM
    LoicM Registered Posts: 5 ✭✭✭✭

    Indeed @Liev, in most cases these metrics had not been computed since the latest upgrade.

    Following your question, I tried recomputing them, first with the default:

    dataset.compute_metrics() # Will compute metrics setup on the dataset

    Which only recomputes the metrics that are already present.

    I then tried to specify the metric I wanted with the argument metric_ids:

    dataset.compute_metrics(metric_ids=["reporting:BUILD_START_DATE"])

    # Also tried in addition with all other metrics already present
    default_metrics = dataset.get_last_metric_values().get_all_ids()
    dataset.compute_metrics(metric_ids=default_metrics + ["reporting:BUILD_START_DATE"])

    In both cases, the build start date was not made available, even though the computation raised no error.

    When computing only BUILD_START_DATE, only the metric "reporting:METRICS_COMPUTATION_DURATION" was updated, whereas when including the default metrics, these default metrics were also updated.

    From what I see in my workspace, rebuilding the datasets may solve the problem of the missing metric, but since I need this metric to know whether it makes sense to rebuild them in the first place, this is a bit of a chicken-and-egg problem.

  • LoicM
    LoicM Registered Posts: 5 ✭✭✭✭

    Hey @Clément_Stenac

    Thanks for the advice, this seems to work for me!

    Having to pull the whole dataset may be a bit overkill, but I'll think about refactoring my code to get the info for all my datasets from this internal stats dataset.
