Exporting multiple datasets into Dataiku
Dear all,
The task is to export multiple datasets into Dataiku. The original data consists of three different dataframes (20 each) within a folder. I need to keep them all separate, as 60 files. How can I export such data? How can the partitioning feature help me?
Thanks
Seher
Best Answer
Álvaro Andrés, Dataiker (Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer)
Hello Seher,
If I understand correctly, you have 20 files in each of 3 folders stored outside of DSS. If you would like to have these 60 files in one location, you can upload them to DSS through a managed folder. After that, you can create either one dataset that covers multiple files or one dataset for each file. The following link has a tutorial on how to do this:
https://knowledge.dataiku.com/latest/courses/folders/managed-folders-hands-on.html#create-a-files-in-folder-dataset
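If the files currently sit in three separate folders on a local disk, it may help to gather them into one staging directory before uploading them to the managed folder. Below is a minimal sketch in plain Python (it does not use the Dataiku API). The function name `gather_files`, the `<folder>__<filename>` naming scheme, and the demo folder names are assumptions made for illustration only.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def gather_files(source_root: Path, staging_dir: Path) -> List[Path]:
    """Copy every file found one level below source_root into staging_dir.

    Each copy is renamed "<folder>__<filename>" (a hypothetical convention)
    so that files with the same name in different folders stay separate.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for folder in sorted(p for p in source_root.iterdir() if p.is_dir()):
        # Never treat the staging directory itself as a source folder.
        if folder.resolve() == staging_dir.resolve():
            continue
        for file in sorted(p for p in folder.iterdir() if p.is_file()):
            target = staging_dir / f"{folder.name}__{file.name}"
            shutil.copy2(file, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    # Demo on a throwaway tree: 3 folders (hypothetical names), 20 files each.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "source"
        for name in ("frame_a", "frame_b", "frame_c"):
            (root / name).mkdir(parents=True)
            for i in range(1, 21):
                (root / name / f"file_{i:02d}.csv").write_text("id,value\n1,2\n")
        staged = gather_files(root, Path(tmp) / "staging")
        print(len(staged), "files staged")
```

Keeping the folder name in each file name preserves the information about which of the three groups a file came from once everything is in a single location.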
Another option is to create a file-based dataset. In that case, activate partitioning and define a dimension identifier that matches your folder structure:
https://doc.dataiku.com/dss/latest/partitions/fs_datasets.html#partitioning-files-based-datasets
I've uploaded 2 examples of file-based datasets that use an S3 folder:
- time_partition.png: 3 files in 3 different folders, partitioned using a time dimension identifier
- dimension_partition.png: multiple files in 3 different folders, partitioned using a discrete dimension identifier
https://doc.dataiku.com/dss/latest/partitions/identifiers.html#partition-identifiers
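To make the idea of a discrete dimension that follows the folder structure more concrete, here is a rough sketch in plain Python of how a pattern written in the `/%{dimension}/.*` style could assign files to partitions. It only models the matching idea and is not how DSS implements it. The dimension name `source`, the file paths, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Recognizes a "%{name}" dimension placeholder or a ".*" wildcard.
TOKEN = re.compile(r"%\{(\w+)\}|\.\*")


def pattern_to_regex(pattern: str):
    """Turn a pattern such as "/%{source}/.*" into a compiled regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        if m.group(1):
            # A dimension value is one path segment (no slashes).
            parts.append(f"(?P<{m.group(1)}>[^/]+)")
        else:
            parts.append(".*")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def partition_of(path: str, pattern: str, dimension: str) -> Optional[str]:
    """Return the dimension value for path, or None if it does not match."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return m.group(dimension) if m else None


def group_by_partition(paths: List[str], pattern: str,
                       dimension: str) -> Dict[str, List[str]]:
    """Group matching paths by partition value; skip non-matching paths."""
    groups = defaultdict(list)
    for path in paths:
        value = partition_of(path, pattern, dimension)
        if value is not None:
            groups[value].append(path)
    return dict(groups)


if __name__ == "__main__":
    demo_paths = [
        "/frame_a/file_01.csv",
        "/frame_a/file_02.csv",
        "/frame_b/file_01.csv",
        "/readme.txt",  # not inside a folder, so it belongs to no partition
    ]
    for name, files in group_by_partition(demo_paths, "/%{source}/.*", "source").items():
        print(name, files)
```

With three top-level folders, a pattern of this shape would give three partitions, one per folder, each holding that folder's files.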
BR,
Álvaro
Answers
Hi Alvaro,
Thanks a lot for the detailed answer. The options that you have provided will be very useful indeed.
Kind regards
Seher