Deleting orphan objects after project deletion

Samuel_Dias Registered Posts: 2

Hi,

I want to do some housecleaning on a Dataiku instance with a lot of projects. One of our main concerns is datasets/recipes that weren't dropped after old projects were deleted.

What is the best way of finding and removing them?

Thanks in advance!


Operating system used: Linux

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    Answer ✓

    Hi,

    You can use the following script to delete orphaned datasets within an existing project: https://github.com/dataiku/dss-code-samples/tree/master/flow/delete_orphaned_datasets

    However, if I understand correctly, your concern is datasets that were not dropped from the databases/cloud storage when the project was deleted.

    In that case, the only way to really do this is to check the database or the cloud storage directly for any prefixes/suffixes matching the deleted projectKey.

    SQL tables would, by default, have the project key in the suffix, e.g. _${projectKey}.

    For cloud storage, e.g. S3, look in the bucket for the path ${projectKey}/${odbId}.
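As a rough illustration of the matching described above, here is a minimal Python sketch. It assumes you have already exported a list of table names from the database and a list of object paths from the bucket by your own means. The function names `tables_for_project` and `paths_for_project` and the sample project key `OLDPROJ` are hypothetical and are not part of any Dataiku API. The result is only a list of candidates to review by hand, since a table belonging to another project may happen to match.

```python
def tables_for_project(table_names, project_key):
    """Return tables whose name starts with '<KEY>_' or ends with '_<KEY>'.

    Matching is case-insensitive because some databases fold identifiers
    to lower or upper case. (Hypothetical helper, not a Dataiku API.)
    """
    key = project_key.lower()
    matches = []
    for name in table_names:
        lowered = name.lower()
        if lowered.startswith(key + "_") or lowered.endswith("_" + key):
            matches.append(name)
    return matches


def paths_for_project(object_paths, project_key):
    """Return object paths located under '<KEY>/' at the root of the bucket.

    A leading slash is ignored so '/KEY/x' and 'KEY/x' are treated alike.
    (Hypothetical helper, not a Dataiku API.)
    """
    prefix = project_key + "/"
    return [p for p in object_paths if p.lstrip("/").startswith(prefix)]


# Sample data standing in for what you listed from the database and bucket.
tables = ["OLDPROJ_customers", "orders_OLDPROJ", "NEWPROJ_customers", "OLDPROJECT_x"]
paths = ["OLDPROJ/abc123/part-0.csv", "/OLDPROJ/xyz/out.csv", "NEWPROJ/abc/part-0.csv"]

candidate_tables = tables_for_project(tables, "OLDPROJ")
candidate_paths = paths_for_project(paths, "OLDPROJ")
print(candidate_tables)
print(candidate_paths)
```

Nothing here deletes anything. Dropping the tables or removing the files is left as a separate, manual step once the candidates have been checked.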

    You may also be able to leverage the Catalog - Connection Explorer to search for tables.

    To avoid this altogether, you can select the options below when deleting a project:

    [Image: Screenshot 2022-10-17 at 15.04.48.png]

    Hope that helps.

Answers

  • Samuel_Dias
    Samuel_Dias Registered Posts: 2

    Thanks Alex, that's it!
    We were looking for both of these solutions: removing flow orphans and left-behind datasets!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron

    @AlexT,

    What is the default for the Drop Data option?

    For a design node with less experienced users, I'd almost like to have these two set to [checked] by default; for a production node, however, I'd prefer to have them unchecked.

    Is there a feature to set the default for a specific instance of DSS?
