How to correctly duplicate a project

Erlebacher

I have a Dataiku project whose uploaded CSV files are stored in a folder. Datasets are created from that folder downstream, and the flow grows more complex from there. I want to duplicate the project, so I chose this option:

         Uploaded files
         Duplicate data of uploaded
         datasets and managed folders.

I assumed that my CSV files would be included, since I consider them "Uploaded files." However, they were not duplicated, so I do not understand what the option above does. I also assumed that the schemas of the various datasets would be duplicated without the data, and that is fine.

The second advanced option: 

    Required inputs
    Duplicate data of required
    (uploaded and input) datasets
    and managed folders.

worked as I expected. The CSV files in the input folder were duplicated, as were the datasets connected to these input files. But none of the other 15 datasets were duplicated, since I can recreate them from the original dataset.

So I would like to better understand the first option, which would be the least costly in memory. Thanks.

    Gordon


Operating system used: macOS Ventura

1 Reply
MiguelangelC
Dataiker

Hello,

The "Uploaded files" option refers to datasets created through the 'Upload your files' option under +Dataset. With this first option, the data of all other datasets and of managed folders is not duplicated.

Note that files in a folder are not considered uploaded but added, which explains this behaviour.
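
To make the difference between the two options concrete, here is a toy Python model of the rule as described in this thread. It is only a sketch and not Dataiku code: the `FlowItem` class, the `kind` labels, the option names and the example flow are all hypothetical, and the real duplication logic may differ in its details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowItem:
    name: str
    kind: str       # hypothetical labels: "uploaded_dataset", "managed_folder", "dataset"
    is_input: bool  # True if nothing upstream in the flow produces this item


def data_is_duplicated(item: FlowItem, option: str) -> bool:
    """Toy rule: does this option copy the item's data (not only its definition)?"""
    if option == "uploaded_files":
        # Only datasets created through 'Upload your files' count as uploaded.
        return item.kind == "uploaded_dataset"
    if option == "required_inputs":
        # Uploaded datasets plus anything the flow cannot rebuild by itself.
        return item.kind == "uploaded_dataset" or item.is_input
    raise ValueError(f"unknown option: {option}")


def copied(flow: list[FlowItem], option: str) -> list[str]:
    """Names of the items whose data would be copied under the given option."""
    return [item.name for item in flow if data_is_duplicated(item, option)]


# Hypothetical flow loosely shaped like the one in the question.
flow = [
    FlowItem("csv_folder", "managed_folder", is_input=True),
    FlowItem("uploaded_csv", "uploaded_dataset", is_input=True),
    FlowItem("prepared", "dataset", is_input=False),
    FlowItem("aggregated", "dataset", is_input=False),
]

print(copied(flow, "uploaded_files"))
print(copied(flow, "required_inputs"))
```

In this model, files added to a folder are skipped by the first option because the folder is not an uploaded dataset, while the second option picks the folder up because it is an input of the flow.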
