Multiple S3 file read

sj0071992
sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

Hi Team,

How can I read multiple S3 files from a single connection by specifying the path?

Thanks in advance.

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    The default for S3 datasets in DSS is to point to an S3 "folder" (i.e. a prefix for blob object paths), and DSS considers all the blobs in that folder to belong to the dataset. If you want to restrict the dataset to a few paths, you can use "show advanced options" in the dataset's Settings > Connection tab and define rules for fine-grained control over which blobs constitute the dataset.
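
To illustrate the "folder is just a prefix" idea above, here is a minimal pure-Python sketch. It is not DSS code and does not reproduce the exact semantics of the DSS inclusion/exclusion rules; the function name `select_keys`, the sample keys, and the glob patterns are all hypothetical and only show how a prefix plus an optional pattern narrows down a set of object keys.

```python
from fnmatch import fnmatchcase


def select_keys(keys, prefix, include_globs=None):
    """Return the keys under `prefix`, optionally filtered by glob patterns.

    Patterns are matched against the part of the key after the prefix.
    Note that fnmatch's `*` also matches `/`, so `*.csv.gz` reaches into
    nested "subfolders".
    """
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        relative = key[len(prefix):]
        if not relative:
            # The key equals the prefix itself (e.g. a folder marker object).
            continue
        if include_globs and not any(fnmatchcase(relative, g) for g in include_globs):
            continue
        selected.append(key)
    return selected


# Hypothetical object keys in a bucket.
keys = [
    "raw/2023/01/a.csv.gz",
    "raw/2023/02/b.csv.gz",
    "raw/2023/02/notes.txt",
    "other/c.csv.gz",
]

everything_under_raw = select_keys(keys, "raw/")
only_gz_under_raw = select_keys(keys, "raw/", ["*.csv.gz"])
nothing_found = select_keys(keys, "missing/")
```

A prefix that matches no key yields an empty selection, which is the situation a "no files found" style of error points to.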

  • sj0071992

    Hi,

    I selected the S3 connection and gave the path where all my files are stored, but I am not able to read the files. I am not even able to list the files under that path.

    Below is the error:

    Did not find any non-empty file

  • sj0071992

    Hi Team,

    Could you please help here?

  • fchataigner2

    Hi,

    The error message implies that there is no blob in your S3 bucket with the prefix you gave. You should browse to the path to make sure the value you enter actually points to some files.

  • sj0071992

    Hi,

    The path I provided shows me the files when I browse it, but when I try "List Files" I get this error:

    Did not find any non-empty file

    Do we have to change any setting in the connection?

  • fchataigner2

    If the path points to a location that contains files, then it is the inclusion/exclusion rules that are incorrect. Can you post a screenshot of the settings on that screen, along with an example of the full blob path you want selected?

  • sj0071992

    Hi,

    As you can see below, I am able to browse the objects but not able to read or list the files, even though there are no rules in the advanced section.

    [Screenshots: S_S_1.png, S_S_2.png]

  • fchataigner2

    That behavior is indeed unexpected. If you see files when browsing, hitting "list files" should show them.

    You should generate an instance diagnostic in Administration > Maintenance > Diagnostic tool, and open a ticket on support.dataiku.com with the zip (if the zip is larger than 15 MB, send it via dl.dataiku.com instead).

  • sj0071992

    Hi,

    I am now able to create a managed folder and see all my files.

    But how can I now read all the files in my folder using a Python recipe or another recipe?

    Please note:

    1. I have subfolders in my managed folder.

    2. The files are gzip-compressed (.gz).

    Thanks in advance.
