Deleting orphan objects after project deletion

Samuel_Dias
Level 1

Hi, 

I wanted to do some "house cleaning" in a Dataiku instance with a lot of projects. One of our main concerns is datasets/recipes that weren't dropped after deleting old projects.

What is the best way of finding and deleting them?

Thanks in advance!


Operating system used: Linux

1 Solution
AlexT
Dataiker

Hi,

You can use the following script to delete orphaned datasets within an existing project: https://github.com/dataiku/dss-code-samples/tree/master/flow/delete_orphaned_datasets
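As a rough illustration of the idea (this is not the linked script), here is a minimal sketch that assumes an "orphaned" dataset is one that no recipe uses as an input or output. The function name and the plain-list inputs are hypothetical; in practice the dataset and recipe names would come from the project's flow.

```python
def find_orphaned_datasets(dataset_names, recipes):
    """Return datasets that no recipe reads from or writes to.

    dataset_names: iterable of dataset names in the project.
    recipes: iterable of (inputs, outputs) pairs, each a list of dataset names.
    Assumption: "orphaned" means "not referenced by any recipe".
    """
    used = set()
    for inputs, outputs in recipes:
        used.update(inputs)
        used.update(outputs)
    # Sorted so the result is stable and easy to review before deleting anything.
    return sorted(name for name in dataset_names if name not in used)


if __name__ == "__main__":
    datasets = ["raw_orders", "orders_prepared", "old_export"]
    recipes = [(["raw_orders"], ["orders_prepared"])]
    print(find_orphaned_datasets(datasets, recipes))
```

Note that under this definition a freshly uploaded dataset that is not yet connected to any recipe is also reported, so the list should be reviewed before anything is deleted.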

However, if I understand correctly, your concern here is datasets that were not dropped from the databases/cloud storage when the project was deleted.

In that case, the only way to really do this is to check the database or the cloud storage directly for any prefixes/suffixes matching the deleted projectKey.

For SQL tables, the default naming would have the project key in the suffix, e.g. _${projectKey}.

For cloud storage, e.g. S3, look for the path ${projectKey}/${odbId} in the bucket.
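The matching described above could be sketched roughly as follows. This assumes you have already listed the table names from the database and the object paths from the bucket yourself, and that you know the keys of the deleted projects. The function names, the case-insensitive table match, and the optional `root` path prefix are all assumptions for illustration, not part of DSS.

```python
def find_leftover_tables(table_names, deleted_keys):
    """Return table names that start with '<KEY>_' or end with '_<KEY>'.

    Matching is case-insensitive (an assumption: some databases fold
    identifiers to lower case).
    """
    hits = []
    for name in table_names:
        upper = name.upper()
        for key in deleted_keys:
            k = key.upper()
            if upper.startswith(k + "_") or upper.endswith("_" + k):
                hits.append(name)
                break
    return hits


def find_leftover_paths(object_paths, deleted_keys, root=""):
    """Return object paths whose first segment under `root` is a deleted project key.

    `root` is a hypothetical path prefix configured on the connection.
    """
    wanted = set(deleted_keys)
    hits = []
    for path in object_paths:
        if not path.startswith(root):
            continue
        # First path segment after the root, e.g. "OLDPROJ" in "OLDPROJ/abc/part-0".
        first_segment = path[len(root):].lstrip("/").split("/", 1)[0]
        if first_segment in wanted:
            hits.append(path)
    return hits


if __name__ == "__main__":
    tables = ["OLDPROJ_customers", "orders_OLDPROJ", "LIVEPROJ_sales", "misc"]
    paths = ["OLDPROJ/a1B2c3/out-s0.csv.gz", "LIVEPROJ/x9/out-s0.csv.gz"]
    print(find_leftover_tables(tables, ["OLDPROJ"]))
    print(find_leftover_paths(paths, ["OLDPROJ"]))
```

The result is only a list of candidates. A name can match by coincidence, so it is worth reviewing each hit before dropping a table or deleting a path.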

You may also be able to leverage the Catalog - Connection Explorer to search for tables. 

To avoid this altogether, you can choose the options below when deleting a project:

Screenshot 2022-10-17 at 15.04.48.png

Hope that helps.

3 Replies
Samuel_Dias
Level 1
Author

Thanks Alex, that's it!
We were searching for both of these solutions: removing flow orphans and removing left-behind datasets!

tgb417

@AlexT ,

What is the default on the Drop Data Option?

For a design node that has less experienced users, I'd almost like to have these two set to [checked] by default; however, for a production node, I'd prefer to have them unchecked.

Is there a feature to say which is default for a specific instance of DSS?

--Tom