Hi,
I want to do some "house cleaning" in a Dataiku instance with a lot of projects. One of our main concerns is datasets/recipes that weren't dropped after old projects were deleted.
What is the best way to find and remove them?
Thanks in advance!
Operating system used: Linux
Hi,
You can use the following script to delete orphaned datasets within an existing project: https://github.com/dataiku/dss-code-samples/tree/master/flow/delete_orphaned_datasets
However, if I understand correctly, your concern is datasets that were not dropped from the databases/cloud storage when the project was deleted.
In that case, the only way to really do this is to check directly in the database or in cloud storage for any prefixes/suffixes matching the deleted projectKey:
For SQL tables, the table name would by default have the project key in the suffix, e.g. _${projectKey}.
For cloud storage, e.g. S3, look for the path ${projectKey}/${odbId} in the bucket.
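As a rough illustration of that matching, here is a minimal Python sketch. It assumes you have already exported the list of table names from your database and the list of object keys from your bucket by your own means. The function names and sample values are hypothetical and not part of the DSS API. It also assumes table names may differ in case from the project key, while storage paths match it exactly.

```python
def find_leftover_tables(table_names, deleted_project_keys):
    """Map each deleted project key to the tables that start with
    '<KEY>_' or end with '_<KEY>' (compared case-insensitively)."""
    matches = {}
    for key in deleted_project_keys:
        k = key.lower()
        hits = [
            t for t in table_names
            if t.lower().startswith(k + "_") or t.lower().endswith("_" + k)
        ]
        if hits:
            matches[key] = sorted(hits)
    return matches


def find_leftover_paths(object_keys, deleted_project_keys):
    """Map each deleted project key to the object keys whose first
    path component is exactly that project key."""
    deleted = set(deleted_project_keys)
    matches = {}
    for path in object_keys:
        first_component = path.lstrip("/").split("/", 1)[0]
        if first_component in deleted:
            matches.setdefault(first_component, []).append(path)
    return matches


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    deleted_keys = ["OLDPROJ"]
    tables = ["sales_clean_OLDPROJ", "OLDPROJ_customers", "LIVEPROJ_orders"]
    objects = ["OLDPROJ/abc123/part-0.csv", "LIVEPROJ/def456/part-0.csv"]
    print(find_leftover_tables(tables, deleted_keys))
    print(find_leftover_paths(objects, deleted_keys))
```

Treat the result as a list of candidates to review, not as something to drop automatically: a connection may use custom naming rules, and an unrelated table could happen to match the pattern.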
You may also be able to leverage the Catalog - Connection Explorer to search for tables.
To avoid this altogether, you can select the drop data options when deleting a project.
Hope that helps.
Thanks Alex, that's it!
We were looking for both of these solutions: removing flow orphans and cleaning up left-behind datasets!
@AlexT,
What is the default for the Drop Data option?
For a design node with less experienced users, I'd almost like to have these two options checked by default. For a production node, however, I'd prefer to have them unchecked.
Is there a setting that controls the default for a specific DSS instance?