Last build timestamps for managed folders with the Python API

Jus Registered Posts: 7

Hello,

Recently I've been working on a way to notify users on the Dataiku platform that some of their datasets haven't been built for a while. The general idea is to give each user / data steward insight into which of their datasets are 'old', so that they can clean up and save space on our database and the Dataiku filesystem.

I am using the last build info to determine when the data was last created. This info is available for both datasets and managed folders through the GUI (Details --> Status tab). However, as far as I know, the Python API exposes the last build info only for datasets (see the properties of the DSSDatasetInfo object). I would like similar info for managed folders to be accessible through the Python API, so that we can add them to our analysis.
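
For reference, the dataset side of this check could look roughly like the sketch below. It assumes `project` is a `DSSProject` handle from the `dataikuapi` package and relies on `list_datasets(as_type="objects")`, `get_info()` and the `last_build_end_time` property of `DSSDatasetInfo`. The helper names `is_stale` and `stale_datasets` and the 90-day threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_build_end, now, max_age_days=90):
    """Return True if the last build ended more than max_age_days before now.

    last_build_end is a timezone-aware datetime, or None when no build is
    recorded. None is reported as not stale here so that never-built
    datasets (e.g. uploaded inputs) can be handled separately.
    """
    if last_build_end is None:
        return False
    return now - last_build_end > timedelta(days=max_age_days)


def stale_datasets(project, max_age_days=90):
    """List the names of datasets in a project whose last build is old.

    Assumption: project is a dataikuapi DSSProject handle.
    """
    now = datetime.now(timezone.utc)
    stale = []
    for dataset in project.list_datasets(as_type="objects"):
        info = dataset.get_info()  # DSSDatasetInfo
        if is_stale(info.last_build_end_time, now, max_age_days):
            stale.append(dataset.dataset_name)
    return stale
```

Nothing equivalent is shown for managed folders, which is the gap this question is about.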

BR,

Justin


Comments

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,727 Neuron

    Indeed, this doesn't seem to be available via the official Python API. I agree it should be included, so I have added my vote. If you really want it and are willing to get your hands dirty, it is available through the private API that the GUI uses (see first screenshot). Another way to get it is to process all jobs for the project and see which ones write to the folder (see second screenshot).

    Screenshot 2024-05-17 at 22.03.10.png

    Screenshot 2024-05-17 at 22.08.56.png
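
The job-based approach could be sketched as follows. This is a minimal illustration, not the code from the screenshots: it takes the list of job dictionaries returned by `project.list_jobs()` and keeps the latest end time per managed folder. The field names used here (`state`, `endTime` in epoch milliseconds, `def.outputs`, `type`, `targetDataset`) and the `"MANAGED_FOLDER"` type value are assumptions about the job dictionary layout and should be checked against a real job on your instance. The function name `last_folder_builds` is hypothetical.

```python
from datetime import datetime, timezone


def last_folder_builds(jobs, output_type="MANAGED_FOLDER"):
    """Map each managed folder id to the end time of the latest finished job
    that lists the folder as an output.

    jobs: list of job dicts, e.g. from project.list_jobs().
    All field names below are assumptions about the job dict layout.
    """
    latest_ms = {}
    for job in jobs:
        # Only count jobs that completed successfully.
        if job.get("state") != "DONE":
            continue
        end_ms = job.get("endTime")
        if not end_ms:
            continue
        for output in job.get("def", {}).get("outputs", []):
            if output.get("type") != output_type:
                continue
            folder_id = output.get("targetDataset")
            if folder_id is None:
                continue
            # Keep the most recent end time seen for this folder.
            if end_ms > latest_ms.get(folder_id, 0):
                latest_ms[folder_id] = end_ms
    return {
        folder_id: datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
        for folder_id, ms in latest_ms.items()
    }
```

With a project handle this would be called as `last_folder_builds(project.list_jobs())`. Only jobs still present in the project's job history are seen, so a folder whose last build has already been purged from that history will not appear in the result.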

  • Jus
    Jus Registered Posts: 7

    Thanks for the suggestions. I think I'm going to try using the job object to extract the build date of managed folders. Hopefully Dataiku will still add a function that returns the build date for managed folders, though, since it makes more sense to loop over all managed folders directly.
