Deleting orphan objects after project deletion
Hi,
I want to do some housecleaning in a Dataiku instance with a lot of projects. One of our main concerns is datasets/recipes that weren't dropped after old projects were deleted.
What is the best way of finding and deleting them?
Thanks in advance!
Operating system used: Linux
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
You can use the following script to delete orphaned datasets within an existing project: https://github.com/dataiku/dss-code-samples/tree/master/flow/delete_orphaned_datasets
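As a rough illustration of the idea behind such a cleanup (not the linked script itself), here is a minimal sketch that assumes "orphaned" means a dataset that is neither an input nor an output of any recipe in the project. The function name and the plain-list representation of datasets and recipes are hypothetical; in practice you would fill them from the project's dataset and recipe listings.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical, simplified recipe representation:
# (input dataset names, output dataset names)
Recipe = Tuple[List[str], List[str]]


def find_orphaned_datasets(
    dataset_names: Iterable[str], recipes: Iterable[Recipe]
) -> List[str]:
    """Return datasets that no recipe reads from or writes to."""
    used: Set[str] = set()
    for inputs, outputs in recipes:
        used.update(inputs)
        used.update(outputs)
    # Anything not referenced by a recipe is treated as an orphan candidate.
    return sorted(name for name in dataset_names if name not in used)


if __name__ == "__main__":
    datasets = ["raw", "clean", "scratch"]
    recipes = [(["raw"], ["clean"])]
    print(find_orphaned_datasets(datasets, recipes))
```

Review the resulting list before deleting anything, since a dataset with no recipe attached may still be used elsewhere (for example in a dashboard or a notebook).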
However, if I understand correctly, your concern is datasets that were not dropped from the databases or cloud storage when the project was deleted.
In that case, the only way to really do this is to check directly in the database or cloud storage for any prefixes or suffixes matching the deleted projectKey.
SQL tables would by default have the project key in the suffix, e.g. _${projectKey}.
For cloud storage, e.g. S3, look for the path ${projectKey}/${odbId} in the bucket.
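The matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: you have already exported the list of table names or object paths from your database or bucket, and you know the keys of the deleted (and, optionally, the still-existing) projects. The function names are hypothetical, and the check is deliberately conservative: any name that also looks like it belongs to a live project is skipped.

```python
from typing import Dict, Iterable, List


def matches_project(name: str, project_key: str) -> bool:
    """True if a table name or object path looks like it belongs to project_key."""
    n, k = name.lower(), project_key.lower()
    return (
        n.startswith(k + "_")      # SQL table with the key as a prefix
        or n.endswith("_" + k)     # SQL table with the key as a suffix
        or n.startswith(k + "/")   # object-storage path such as KEY/...
    )


def find_leftover_candidates(
    names: Iterable[str],
    deleted_keys: Iterable[str],
    live_keys: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Group names by the deleted project key they appear to belong to."""
    live = list(live_keys)
    result: Dict[str, List[str]] = {key: [] for key in deleted_keys}
    for name in names:
        # Skip anything that could belong to a project that still exists.
        if any(matches_project(name, k) for k in live):
            continue
        for key in result:
            if matches_project(name, key):
                result[key].append(name)
    return result


if __name__ == "__main__":
    names = [
        "OLDPROJ_customers",
        "orders_OLDPROJ",
        "OLDPROJ/abc123/part-0.csv",
        "LIVEPROJ_customers",
    ]
    print(find_leftover_candidates(names, ["OLDPROJ"], ["LIVEPROJ"]))
```

The output is a list of candidates only. Custom naming rules on a connection would not follow these patterns, so check each match manually before dropping a table or deleting files.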
You may also be able to leverage the Catalog - Connection Explorer to search for tables.
To avoid this altogether, you can check the drop-data options in the dialog shown when deleting a project.
Hope that helps.
Answers
-
Thanks Alex, that's it!
We were looking for both of these solutions: removing flow orphans and cleaning up left-behind datasets!
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
@AlexT, what is the default for the Drop Data option?
For a design node with less experienced users, I'd almost like to have these two checked by default. For a production node, however, I'd prefer to have them unchecked.
Is there a feature to set the default for a specific instance of DSS?