Added on October 26, 2022 7:27AM
Dear all,
My task is to export multiple datasets into Dataiku. The original data consists of three different dataframes (20 files each) within a folder. I need to keep all of them separate, as 60 files. How can I export such data? How can the partitioning facility help me?
Thanks
Seher
Hello Seher,
If I understand correctly, you have 20 different files in each of 3 different folders stored outside of DSS. If you would like to have these 60 files in one location, you can upload them to DSS using a managed folder. After that, you can either create one dataset from multiple files or create one dataset for each file. The following link has a tutorial on how to do this:
https://knowledge.dataiku.com/latest/courses/folders/managed-folders-hands-on.html#create-a-files-in-folder-dataset
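For the "one dataset for each file" route, it can help to first decide which dataset name each uploaded file will get. The snippet below is only a minimal sketch of that planning step, assuming the files are CSVs and that the managed folder keeps the three subfolders; the helper names (dataset_name_for, plan_datasets) and the folder name "uploaded_files" are made up for illustration and are not part of DSS.

```python
import posixpath


def dataset_name_for(path):
    """Turn a managed-folder path like '/groupA/file_01.csv' into 'groupA_file_01'."""
    stem, _ext = posixpath.splitext(path.strip("/"))
    return stem.replace("/", "_")


def plan_datasets(paths, extension=".csv"):
    """Map one dataset name to each matching file path, failing on name clashes."""
    plan = {}
    for path in sorted(paths):
        if not path.lower().endswith(extension):
            continue  # skip anything that is not a data file
        name = dataset_name_for(path)
        if name in plan:
            raise ValueError(f"{path} and {plan[name]} both map to {name}")
        plan[name] = path
    return plan


# Inside DSS (not runnable elsewhere), the paths could come from a managed folder:
#   import dataiku
#   folder = dataiku.Folder("uploaded_files")  # hypothetical folder name
#   plan = plan_datasets(folder.list_paths_in_partition())
```

With 3 subfolders of 20 CSV files each, the resulting plan would hold 60 entries, one per file, which can then be used to create the 60 separate datasets.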
Another option is to create a file-based dataset. In this case, you should activate partitioning and define a dimension identifier that matches your folder structure:
https://doc.dataiku.com/dss/latest/partitions/fs_datasets.html#partitioning-files-based-datasets
I've uploaded 2 examples of file-based datasets using an S3 folder:
https://doc.dataiku.com/dss/latest/partitions/identifiers.html#partition-identifiers
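To make the partitioning idea concrete, here is a rough sketch of how a pattern such as /%{group}/.* would assign files to partitions, with one partition per subfolder. It only imitates the matching logic with a regular expression and is not how DSS implements it; the dimension name "group" and the example paths are assumptions.

```python
import re
from collections import defaultdict


def pattern_to_regex(pattern):
    """Translate a pattern like '/%{group}/.*' into a compiled regex."""
    # re.split with a capture group alternates literal text and dimension names.
    parts = re.split(r"%\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # A dimension matches exactly one path component.
            pieces.append(f"(?P<{part}>[^/]+)")
        else:
            # Keep '.*' as a wildcard and escape everything else.
            pieces.append(".*".join(re.escape(p) for p in part.split(".*")))
    return re.compile("^" + "".join(pieces) + "$")


def group_by_partition(paths, pattern):
    """Group file paths by partition identifier (dimension values joined by '|')."""
    regex = pattern_to_regex(pattern)
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    partitions = defaultdict(list)
    for path in paths:
        match = regex.match(path)
        if match:  # files that do not fit the pattern belong to no partition
            partitions["|".join(match.group(n) for n in names)].append(path)
    return dict(partitions)


example = ["/A/f1.csv", "/A/f2.csv", "/B/f1.csv", "/readme.txt"]
print(group_by_partition(example, "/%{group}/.*"))
```

In this sketch each subfolder becomes one partition, so partitioning separates the three groups of files but does not by itself produce 60 separate datasets.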
BR,
Álvaro
Hi Alvaro,
Thanks a lot for the elaborate answer. The options that you have provided will be very useful indeed.
Kind regards
Seher