Multiple S3 file read

sj0071992

Hi Team,

 

How can I read multiple S3 files from a single connection by specifying their path?

 

Thanks in Advance

fchataigner2
Dataiker

Hi,

By default, an S3 dataset in DSS points to an S3 "folder" (i.e. a prefix for blob object paths), and DSS considers all the blobs under that prefix to belong to the dataset. If you want to restrict the dataset to a few paths, you can use "show advanced options" in the dataset's Settings > Connection tab and define rules that give you fine-grained control over which blobs make up the dataset.
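To illustrate the idea of "prefix plus rules", here is a minimal sketch in plain Python. It is not how DSS implements its rules; the function name `select_blobs`, the glob-style patterns and the sample keys are all made up for illustration.

```python
from fnmatch import fnmatch


def select_blobs(keys, prefix, include=None, exclude=None):
    """Return the keys under `prefix` that pass the include/exclude globs.

    Globs are matched against the path relative to the prefix.
    This only mimics the concept; it is not DSS's actual rule engine.
    """
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the dataset "folder"
        rel = key[len(prefix):].lstrip("/")
        if include and not any(fnmatch(rel, pat) for pat in include):
            continue  # include rules exist and none of them match
        if exclude and any(fnmatch(rel, pat) for pat in exclude):
            continue  # explicitly excluded
        selected.append(key)
    return selected


# Hypothetical blob keys, for illustration only.
keys = [
    "landing/2021/a.csv.gz",
    "landing/2021/b.csv.gz",
    "landing/2021/_SUCCESS",
    "other/c.csv.gz",
]
print(select_blobs(keys, "landing/", include=["*.csv.gz"]))
```

With no rules at all, every blob under the prefix is selected, which matches the default behaviour described above.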

sj0071992
Author

Hi,

 

I selected the S3 connection and entered the path where all my files are stored, but I am not able to read the files. I am not even able to list the files under that path.

Below is the error:

Did not find any non-empty file

sj0071992
Author

Hi Team,

 

Could you please help here.

fchataigner2
Dataiker

Hi,

The error message implies that there is no non-empty blob in your S3 bucket under the prefix you gave. You should browse to the path to make sure the value you enter actually points to some files.
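If you want to double-check the prefix outside DSS, a rough sketch with boto3 could look like the following. It assumes boto3 is installed and AWS credentials are configured; the helper name `non_empty_keys`, the bucket name and the prefix are placeholders, and this is an independent check, not what DSS runs internally.

```python
def non_empty_keys(s3_client, bucket, prefix):
    """List keys under `prefix` whose size is greater than zero."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            if obj["Size"] > 0:  # skip zero-byte "folder" markers
                keys.append(obj["Key"])
    return keys


# Usage (bucket and prefix are placeholders):
#   import boto3
#   print(non_empty_keys(boto3.client("s3"), "my-bucket", "path/to/files/"))
```

An empty result for the prefix you typed in DSS would be consistent with the "Did not find any non-empty file" message.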

sj0071992
Author

Hi,

 

The path I provided shows me the files when I browse it, but when I try "List Files" it shows this error:

Did not find any non-empty file

Do we have to change any setting in the connection?

fchataigner2
Dataiker

If the path points to a location that contains files, then the inclusion/exclusion rules must be incorrect. Can you post a screenshot of the setup in that screen, along with an example of the full path of a blob you want selected?

sj0071992
Author

Hi, 

 

As you can see below, I am able to browse the objects but not able to read or list the files, even though there is no rule in the advanced section.

 

[Screenshots: S_S_1.png, S_S_2.png]

fchataigner2
Dataiker

That behavior is indeed unexpected. If you see files when browsing, hitting "List Files" should show them.

You should generate an instance diagnostic in Administration > Maintenance > Diagnostic tool, and open a ticket on support.dataiku.com with the zip attached (if the zip is larger than 15 MB, send it via dl.dataiku.com instead).

sj0071992
Author

Hi,

 

I am now able to create a managed folder and see all my files in it.

But how can I now read all the files in my folder using a Python recipe or another recipe?

 

Please note:

1. I have subfolders in my managed folder

2. The files are in gz format

 

Thanks in Advance
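
One possible approach in a Python recipe, as a rough sketch: it assumes the files are gzip-compressed UTF-8 text and small enough to load into memory, and that `list_paths_in_partition()` on a `dataiku.Folder` also returns the paths inside subfolders (worth verifying on your instance). The helper name `read_gz_files` and the folder name `"my_folder"` are placeholders.

```python
import gzip
import io


def read_gz_files(folder):
    """Yield (path, text) for every .gz file in a managed folder.

    `folder` is expected to behave like dataiku.Folder, i.e. to offer
    list_paths_in_partition() and get_download_stream(path).
    """
    for path in folder.list_paths_in_partition():
        if not path.endswith(".gz"):
            continue  # ignore anything that is not gzip-compressed
        with folder.get_download_stream(path) as stream:
            raw = stream.read()  # whole compressed file in memory
        # Assumption: the content is UTF-8 text once decompressed.
        with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as fh:
            yield path, fh.read()


# Usage inside DSS (folder name is a placeholder):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   for path, text in read_gz_files(folder):
#       print(path, len(text))
```

For large files, streaming line by line instead of calling `fh.read()` would likely be a better fit.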
