Exporting multiple datasets into dataiku

Solved!
SeherFazlioglu
Level 1

Dear all,

The task is to export multiple datasets into Dataiku. The original data consists of three different dataframes (20 files each) within a folder. I need to keep all 60 files separate. How can I export such data? How can the partitioning facility help me?

Thanks

Seher

 

1 Solution
anino
Dataiker

Hello Seher,

If I understand correctly, you have 20 files in each of 3 different folders stored outside of DSS. If you would like to have these 60 files in one location, you can upload them to DSS using a managed folder. After that, you can either create one dataset from multiple files or create one dataset per file. The following tutorial shows how to do this:

https://knowledge.dataiku.com/latest/courses/folders/managed-folders-hands-on.html#create-a-files-in...
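
As a rough local sketch of the "one location" step (not part of the original answer and not a Dataiku API): the snippet below copies the files from several source folders into a single staging directory, which could then be uploaded to a managed folder. The folder names, the file names and the "<folder>__<file>" naming scheme are assumptions made for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def build_demo_tree(root: Path, folders=("frame_a", "frame_b", "frame_c"), per_folder=20):
    """Create a hypothetical layout: 3 folders with 20 small CSV files each."""
    for folder in folders:
        (root / folder).mkdir(parents=True, exist_ok=True)
        for i in range(1, per_folder + 1):
            (root / folder / f"file_{i:02d}.csv").write_text("id,value\n1,42\n")


def stage_files(source_root: Path, staging_dir: Path) -> list:
    """Copy every file found one level below source_root into staging_dir.

    Each copy is named "<folder>__<file>" so that files sharing a name in
    different folders do not overwrite each other.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for folder in sorted(p for p in source_root.iterdir() if p.is_dir()):
        for src in sorted(p for p in folder.iterdir() if p.is_file()):
            dest = staging_dir / f"{folder.name}__{src.name}"
            shutil.copy2(src, dest)
            staged.append(dest)
    return staged


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    build_demo_tree(tmp / "source")
    files = stage_files(tmp / "source", tmp / "staging")
    print(len(files), "files staged in", tmp / "staging")
```

Keeping the folder name in each file name preserves the information about which of the three dataframes a file belongs to, which is useful if you later create one dataset per file.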

Another option is to create a files-based dataset. In this case, you should activate partitioning and define a dimension identifier that matches your folder structure:

https://doc.dataiku.com/dss/latest/partitions/fs_datasets.html#partitioning-files-based-datasets

I've uploaded 2 examples of file-based datasets using an S3 folder:

  • time_partition.png: includes 3 files in 3 different folders, partitioned using a time dimension identifier
  • dimension_partition.png: includes multiple files in 3 different folders, partitioned using a discrete dimension identifier

https://doc.dataiku.com/dss/latest/partitions/identifiers.html#partition-identifiers
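
To make the idea of a pattern matching the folder structure more concrete, here is a minimal sketch in plain Python (not DSS code, and not how DSS implements it). It assumes the documented specifiers %Y, %M, %D and %H for time dimensions and %{dimension_name} for discrete dimensions, and treats the rest of the pattern as a regular expression. The dimension name "frame" and the file paths are hypothetical.

```python
import re
from collections import defaultdict

# Time specifiers mapped to named regex groups (illustrative only).
TIME_SPECIFIERS = {
    "Y": r"(?P<year>\d{4})",
    "M": r"(?P<month>\d{2})",
    "D": r"(?P<day>\d{2})",
    "H": r"(?P<hour>\d{2})",
}


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a partitioning pattern such as "/%{frame}/.*" into a regex."""
    # Discrete dimensions: %{name} matches one path segment.
    regex = re.sub(r"%\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    # Time dimensions: %Y, %M, %D, %H.
    for spec, group in TIME_SPECIFIERS.items():
        regex = regex.replace(f"%{spec}", group)
    return re.compile(regex)


def group_by_partition(paths, pattern: str) -> dict:
    """Group file paths by the dimension values extracted from each path."""
    regex = pattern_to_regex(pattern)
    groups = defaultdict(list)
    for path in paths:
        match = regex.fullmatch(path)
        if match:  # paths that do not match the pattern are ignored
            groups[match.groups()].append(path)
    return dict(groups)


if __name__ == "__main__":
    paths = [
        f"/{folder}/file_{i:02d}.csv"
        for folder in ("frame_a", "frame_b", "frame_c")
        for i in range(1, 21)
    ]
    for key, files in group_by_partition(paths, "/%{frame}/.*").items():
        print(key, len(files))
```

With a layout like the one in the question, a discrete dimension on the folder name would give 3 partitions of 20 files each, all inside a single dataset.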

BR,

รlvaro

Technical Support Engineer


2 Replies
SeherFazlioglu
Level 1
Author

Hi Alvaro,

Thanks a lot for the detailed answer. The options you have provided will be very useful indeed.

Kind regards

Seher
