I have a Dataiku project whose uploaded CSV files are stored in a folder. Datasets are created from that folder downstream, and the flow becomes more complex from there. I want to duplicate the project, so I chose the option:
Uploaded files: Duplicate data of uploaded datasets and managed folders.
I assumed that my CSV files would be included, since I consider them "uploaded files." However, they were not duplicated, so I do not understand what the option above does. I also assumed that the schemas of the various datasets would be duplicated without the data, and that is fine.
The second advanced option:
Required inputs: Duplicate data of required (uploaded and input) datasets and managed folders.
worked as I expected. The CSV files in the input folder were duplicated, as were the datasets connected to these input files. None of the other 15 datasets were duplicated, which is fine since I can recreate them from the original dataset.
So I would like to better understand the first option, which would be the least costly in terms of memory. Thanks.