How to correctly duplicate a project

Erlebacher Registered Posts: 82 ✭✭

I have a Dataiku project whose uploaded CSV files are stored in a folder. Datasets are created from them downstream, and the flow grows more complex from there. I want to duplicate the project, so I chose the option:

Uploaded files
Duplicate data of uploaded
datasets and managed folders.

I assumed that my CSV files would be included, since I consider them "Uploaded files." However, they were not duplicated, so I do not understand the option above. I also assumed that the schemas of the various datasets would be duplicated without the data, and that is fine.

The second advanced option:

Required inputs
Duplicate data of required
(uploaded and input) datasets
and managed folders.

worked as I expected. The CSV files in the input folder were duplicated, as were the datasets connected to these input files. But none of the other 15 datasets were duplicated, since I can recreate them from the original dataset.

So I would like to better understand the first option, which would be the least memory-costly. Thanks.

Gordon


Operating system used: macOS Ventura

Answers

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker

    Hello,

    The 'Uploaded files' option refers to datasets created with the 'Upload your files' option under +Dataset. The data of all other datasets and managed folders will not be duplicated with the first option.

    Note that files in a managed folder are not uploaded but added, which explains this behaviour.
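
As a rough illustration of the distinction discussed in this thread, here is a toy Python model of which items have their data copied under each option. It is only a sketch based on the behaviour described here: the item kinds, option strings, and the `FlowItem` / `data_is_duplicated` names are all hypothetical and are not part of any Dataiku API.

```python
from dataclasses import dataclass

# All names below are hypothetical labels for this sketch, not Dataiku API names.
UPLOADED_DATASET = "uploaded_dataset"      # created via 'Upload your files'
INPUT_DATASET = "input_dataset"            # dataset built directly on input files
INPUT_FOLDER = "input_managed_folder"      # managed folder with files added to it
COMPUTED_DATASET = "computed_dataset"      # can be rebuilt from upstream data


@dataclass(frozen=True)
class FlowItem:
    name: str
    kind: str


def data_is_duplicated(item: FlowItem, option: str) -> bool:
    """Return True if the item's data is copied under the given option."""
    if option == "uploaded_files":
        # Only datasets created through 'Upload your files' carry their data over.
        return item.kind == UPLOADED_DATASET
    if option == "required_inputs":
        # Uploaded datasets, input datasets and input managed folders carry data;
        # downstream datasets do not, since they can be rebuilt.
        return item.kind in {UPLOADED_DATASET, INPUT_DATASET, INPUT_FOLDER}
    raise ValueError(f"unknown option: {option}")


# A flow shaped like the one in the question (item names are made up).
flow = [
    FlowItem("csv_folder", INPUT_FOLDER),
    FlowItem("raw_from_csv", INPUT_DATASET),
    FlowItem("prepared", COMPUTED_DATASET),
]

for option in ("uploaded_files", "required_inputs"):
    copied = [item.name for item in flow if data_is_duplicated(item, option)]
    print(option, "->", copied)
```

In this model, a folder that files were added to is not an "uploaded dataset", so nothing in the example flow is copied under the first option, while the second option copies the folder and the dataset built directly on it.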
