Last build timestamps for managed folders with the Python API
Hello,
Recently I've been working on a way to notify users on the Dataiku platform when some of their datasets haven't been built for a while. The general idea is to give each user / data steward insight into which of their datasets are 'old', so that they can clean up and free space on our database and the Dataiku filesystem.
I am using the last build info to determine when the data was last created. This info is available for both datasets and managed folders through the GUI (Details --> Status tab). However, as far as I know, the Python API only exposes the last build info for datasets (see the properties of the DSSDatasetInfo object). I would like the same info to be accessible for managed folders through the Python API, so that we can include them in our analysis.
BR,
Justin
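For datasets, the check described above could be sketched roughly as follows. This is a minimal sketch, not an official recipe: it assumes a dataikuapi version in which `DSSDatasetInfo` (returned by `DSSDataset.get_info()`) exposes the `last_build_end_time` property, and the helper names `collect_dataset_build_times` and `find_stale` are hypothetical. The 90-day threshold in the usage comment is only an example.

```python
from datetime import datetime, timedelta, timezone


def collect_dataset_build_times(project):
    """Map each dataset name of a dataikuapi DSSProject to its last build end time.

    ASSUMPTION: DSSDatasetInfo exposes `last_build_end_time`
    (a datetime, or None when the dataset has never been built).
    """
    build_times = {}
    for item in project.list_datasets():
        name = item["name"]  # list items can be indexed like dicts
        info = project.get_dataset(name).get_info()
        build_times[name] = info.last_build_end_time
    return build_times


def find_stale(build_times, max_age, now=None):
    """Return the names (sorted) never built or last built more than max_age ago."""
    if now is None:
        now = datetime.now(timezone.utc)
    stale = []
    for name, built_at in sorted(build_times.items()):
        # astimezone() treats a naive datetime as local time, so both
        # naive and aware timestamps can be compared with an aware `now`
        if built_at is None or now - built_at.astimezone(timezone.utc) > max_age:
            stale.append(name)
    return stale


# Example usage inside DSS (not executed here):
# import dataiku
# project = dataiku.api_client().get_default_project()
# print(find_stale(collect_dataset_build_times(project), timedelta(days=90)))
```

Keeping the API calls separate from the age check makes it easy to feed in build times from another source, such as managed folders, once those are available.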
Comments
Turribeach
Indeed, this doesn't seem to be available via the official Python API. I agree it should be included, so I have added my vote. But if you really want it and are willing to get your hands dirty, it is available through the private API that the GUI uses (see first screenshot). Another way to get it is to process all jobs for the project and see which ones write to the folder (see second screenshot).
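The job-based workaround could look roughly like this. It is a sketch under stated assumptions, not a confirmed recipe: `DSSProject.list_jobs()` is part of dataikuapi, but the keys read from each job summary (`state`, `endTime`, `def` / `outputs`, and the output fields `type`, `targetDataset`, `targetDatasetProjectKey`) are assumptions, so print one job on your own instance and adjust them. The helper name `folder_build_times_from_jobs` is hypothetical, and a folder whose jobs have already been cleared from the job history will simply be missing from the result.

```python
from datetime import datetime, timezone


def folder_build_times_from_jobs(jobs, project_key):
    """Return {folder_id: datetime} for the latest successful job writing each folder.

    `jobs` is the list returned by DSSProject.list_jobs(). ASSUMPTION: every key
    read below matches the job summary layout; check with print(jobs[0]) first.
    """
    latest = {}
    for job in jobs:
        if job.get("state") != "DONE":  # assumed state of a successful job
            continue
        end_ms = job.get("endTime")  # assumed epoch milliseconds
        if not end_ms:
            continue
        ended = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
        for output in job.get("def", {}).get("outputs", []):
            if output.get("type") != "MANAGED_FOLDER":
                continue  # skip datasets and other outputs
            if output.get("targetDatasetProjectKey", project_key) != project_key:
                continue  # output belongs to another project
            folder_id = output.get("targetDataset")
            if folder_id and (folder_id not in latest or ended > latest[folder_id]):
                latest[folder_id] = ended
    return latest


# Example usage inside DSS (not executed here):
# import dataiku
# project = dataiku.api_client().get_default_project()
# print(folder_build_times_from_jobs(project.list_jobs(), project.project_key))
```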
Thanks for the suggestions. I think I'm going to use the job objects to extract the build dates of managed folders. Hopefully Dataiku will still add a function to get the build date of a managed folder, though: looping over all managed folders directly makes more sense than going through the jobs.