How to correctly duplicate a project

Erlebacher · December 2022

I have a Dataiku project whose uploaded CSV files are stored in a folder. Datasets are created from them downstream, and the flow becomes more complex from there. I want to duplicate the project, so I chose the option:

Uploaded files
Duplicate data of uploaded datasets and managed folders.

I assumed that my CSV files would be included, since I consider them "Uploaded files." However, they were not duplicated, so I do not understand the option above. I also assumed that the schemas of the various datasets would be duplicated without the data, and that is fine.

The second advanced option:

Required inputs
Duplicate data of required (uploaded and input) datasets and managed folders.

worked as I expected. The CSV files in the input folder were duplicated, as were the datasets connected to these input files. But none of the other 15 datasets were duplicated, since I can recreate them from the original dataset.

So I would like to better understand the first option, which would be the least costly in terms of memory. Thanks.

Gordon

Operating system used: macOS Ventura

Miguel Angel · December 2022

Hello,

The 'Uploaded files' option refers to datasets created with the 'Upload your files' +Dataset option. With the first option, the data of all other datasets and managed folders is not duplicated.

Note that files in a folder are not uploaded but added, which explains this behaviour.
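
To make the difference between the two options concrete, here is a small Python sketch. It is only a toy model of the behaviour described in this thread, not Dataiku's actual implementation or API: the `FlowItem` class, the `kind` labels, the option names and the example flow are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowItem:
    name: str
    kind: str            # hypothetical labels: "uploaded_dataset", "managed_folder", "dataset"
    is_flow_input: bool  # True when nothing upstream in the flow produces this item


def data_is_duplicated(item: FlowItem, option: str) -> bool:
    """Return True if the item's data is copied under the given option (toy model)."""
    uploaded = item.kind == "uploaded_dataset"
    if option == "uploaded_files":
        # Only datasets created through "Upload your files" carry their data over.
        return uploaded
    if option == "required_inputs":
        # Uploaded datasets plus every input dataset or folder of the flow.
        return uploaded or item.is_flow_input
    raise ValueError(f"unknown option: {option}")


def duplicated_names(items, option):
    """Names of the items whose data would be duplicated under the option."""
    return [item.name for item in items if data_is_duplicated(item, option)]


# Hypothetical flow resembling the one in the question.
flow = [
    FlowItem("upload_ds", "uploaded_dataset", True),   # made with "Upload your files"
    FlowItem("csv_folder", "managed_folder", True),    # CSV files added to a folder
    FlowItem("raw_from_folder", "dataset", True),      # dataset connected to the folder
    FlowItem("prepared", "dataset", False),            # rebuilt from upstream data
]

for option in ("uploaded_files", "required_inputs"):
    print(option, duplicated_names(flow, option))
```

In this model the folder of CSV files is copied only with "Required inputs", which matches what was reported in the question. In both cases the downstream datasets keep their schema and can be rebuilt from the inputs.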
