Multiple S3 file read
Hi Team,
How can I read multiple S3 files from a single connection by specifying the path?
Thanks in advance
Answers
-
Hi,
By default, an S3 dataset in DSS points to an S3 "folder" (i.e. a prefix for blob object paths), and DSS considers all the blobs under that prefix to belong to the dataset. If you want to restrict the dataset to a few paths, use "show advanced options" in the dataset's Settings > Connection tab and define rules there to get fine-grained control over which blobs constitute the dataset.
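To make the "prefix plus rules" idea concrete, here is a rough Python sketch of how a prefix and optional glob patterns narrow down a list of blob keys. This is only an illustration: the `select_blobs` helper and the sample keys are made up, and the exact matching semantics of the DSS inclusion/exclusion rules may differ from plain glob matching.

```python
from fnmatch import fnmatchcase


def select_blobs(keys, prefix, include_globs=None):
    """Return the keys under `prefix`, optionally restricted by glob patterns.

    Patterns are matched against the path relative to the prefix.
    Hypothetical helper for illustration only, not DSS's implementation.
    """
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # blob is outside the S3 "folder"
        relative = key[len(prefix):]
        if include_globs and not any(fnmatchcase(relative, g) for g in include_globs):
            continue  # blob is in the folder but matches no inclusion pattern
        selected.append(key)
    return selected


# Made-up keys, standing in for a bucket listing
keys = [
    "data/sales/2023/jan.csv.gz",
    "data/sales/2023/feb.csv.gz",
    "data/sales/readme.txt",
    "data/other/x.csv.gz",
]

everything_in_folder = select_blobs(keys, "data/sales/")
only_gz = select_blobs(keys, "data/sales/", ["*.csv.gz"])
```

With no patterns, every blob under the prefix is selected; with a pattern, only the matching blobs remain.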
-
sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron
Hi,
I selected the S3 connection and entered the path where all my files are stored, but I am not able to read the files. I am not even able to list the files under that path.
Below is the error:
Did not find any non-empty file
-
sj0071992
Hi Team,
Could you please help here?
-
Hi,
The error message implies that there is no blob in your S3 bucket with the prefix you gave. You should browse to the path to make sure that the value you enter for it points to some files.
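If you want to double-check outside DSS, a minimal sketch along these lines lists what sits under a prefix and keeps only the non-empty objects. It assumes `boto3` is installed and has credentials for the bucket; the helper names are made up, and this is not how DSS itself performs the check.

```python
def non_empty_keys(objects):
    """Keep the keys of objects that have a size > 0 and are not "folder" markers."""
    return [
        obj["Key"]
        for obj in objects
        if obj.get("Size", 0) > 0 and not obj["Key"].endswith("/")
    ]


def list_prefix(bucket, prefix):
    """List every object under `prefix` in `bucket` (assumes boto3 and valid credentials)."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects
        objects.extend(page.get("Contents", []))
    return objects


# Made-up listing in the shape returned by list_objects_v2
sample = [
    {"Key": "data/sales/", "Size": 0},
    {"Key": "data/sales/jan.csv.gz", "Size": 1024},
    {"Key": "data/sales/empty.csv.gz", "Size": 0},
]
usable = non_empty_keys(sample)
```

In real use you would call `non_empty_keys(list_prefix("my-bucket", "data/sales/"))`, with your own bucket and prefix in place of these placeholder values. An empty result would be consistent with the "Did not find any non-empty file" message.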
-
sj0071992
Hi,
The path I provided shows me the files when I browse it, but when I try "List Files" it shows the error:
Did not find any non-empty file
Do we have to change any setting in the connection?
-
If the path points to a place with files, then it is the inclusion/exclusion rules that are incorrect. Can you post a screenshot of the setup in that screen, and an example of a full blob path you want selected?
-
sj0071992
Hi,
Below you can see that I am able to browse the objects but not able to read or list the files, even though there is no rule in the advanced section.
-
That behavior is indeed unexpected. If you see files when browsing, hitting "List Files" should show them.
You should generate an instance diagnostic in Administration > Maintenance > Diagnostic tool, and open a ticket on support.dataiku.com with the zip attached (send it via dl.dataiku.com if it is larger than 15 MB).
-
sj0071992
Hi,
I am now able to create a managed folder and see all my files.
But how can I now read all the files in my folder using a Python or other recipe?
Please note:
1. I have subfolders in my managed folder
2. The file format is gz
Thanks in advance
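-
One possible approach in a Python recipe, sketched under a few assumptions: the files are gzip-compressed UTF-8 text, the folder name passed to `dataiku.Folder` is a placeholder for your own, and the helper names are made up. `list_paths_in_partition()` and `get_download_stream()` are methods of the `dataiku.Folder` API; the listing should also cover files in subfolders, but verify that on your instance.

```python
import gzip
import io


def read_gz_files(paths, open_stream):
    """Read and decompress every .gz file in `paths`.

    `open_stream` is any callable that returns a readable binary stream for a path.
    Returns a dict mapping each path to its decoded text (UTF-8 is assumed).
    """
    contents = {}
    for path in paths:
        if not path.endswith(".gz"):
            continue  # skip anything that is not gzip-compressed
        with open_stream(path) as stream:
            raw = stream.read()
        contents[path] = gzip.decompress(raw).decode("utf-8")
    return contents


def read_from_managed_folder(folder_name):
    """Read all .gz files of a DSS managed folder (only runs inside DSS)."""
    import dataiku  # available in DSS recipes and notebooks

    folder = dataiku.Folder(folder_name)
    paths = folder.list_paths_in_partition()
    return read_gz_files(paths, folder.get_download_stream)


# In-memory stand-in for a managed folder, so the helper can be tried anywhere
fake_folder = {
    "/2023/jan.csv.gz": gzip.compress(b"a,b\n1,2\n"),
    "/2023/sub/feb.csv.gz": gzip.compress(b"a,b\n3,4\n"),
    "/notes.txt": b"not compressed",
}
demo = read_gz_files(list(fake_folder), lambda p: io.BytesIO(fake_folder[p]))
```

Inside DSS you would call `read_from_managed_folder("my_folder")` instead of using the in-memory stand-in. For large files, reading each file fully into memory may not be suitable, and a streaming approach would be preferable.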