Managed Folder of JPGs from S3 Bucket

ryanraasch
ryanraasch PartnerApplicant, Registered Posts: 10 ✭✭✭

Hello, I have an S3 bucket full of .jpg files that I am trying to get into Dataiku as a managed folder. I can import other file types from the S3 bucket, but it is not working for the .jpg files: Dataiku fails to recognize the file type when I add the folder of JPGs, and there is no corresponding option for Type under Format/Preview.

Has anyone run into this or have a solution?

The main goal is to use the deep learning for image classification plugin, but I am hoping I do not have to download all the images and then upload them directly to a managed folder.

Thanks in advance!

Best Answer

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker
    Answer ✓

    Hello ryanraasch,

    JPEG image files aren't tabular data, so you won't be able to use the "Format/Preview" feature. In fact, you don't need it for your deep learning use case. You can find an example of how the plugin can be used here: https://gallery.dataiku.com/projects/LIONANDTIGER/flow/

    In your case, the only thing you need to do when setting up your managed folder is to adjust its settings and point it to the location of your training images in your S3 bucket.

    Hope this helps.

    Best,

    Harizo

Answers

  • ryanraasch
    ryanraasch PartnerApplicant, Registered Posts: 10 ✭✭✭

    Hi @HarizoR, thanks for your reply. I adjusted the settings and pointed to the folder in the S3 bucket containing the training images. Before I can save or do anything with the dataset, Dataiku requires me to select a format for the data. It says "No format configured, dataset won't be usable; No schema set, dataset won't be usable." Are there certain settings I can use to bypass this? Do the images need to be on the filesystem? I see that in the plugin example the images are stored on the filesystem.

    Any other help you can provide would be greatly appreciated. Thanks!

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker

    Hi,

    You need to use a Managed Folder, not a Dataset: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html

    Best,

    Harizo
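
For anyone who wants to check the folder contents from a Python recipe or notebook once the managed folder points at the S3 location, here is a minimal sketch. It assumes the `dataiku` package's `Folder` object with its `list_paths_in_partition()` and `get_download_stream()` methods; the folder name `training_images` and the two helper functions are hypothetical and only for illustration. Files are read through a stream rather than a local path, since an S3-backed folder does not live on the DSS server's filesystem.

```python
def list_jpeg_paths(folder):
    """Return the sorted paths of .jpg/.jpeg files in a managed folder.

    `folder` is expected to behave like a dataiku.Folder, i.e. to expose
    list_paths_in_partition() returning paths relative to the folder root.
    """
    return sorted(
        path
        for path in folder.list_paths_in_partition()
        if path.lower().endswith((".jpg", ".jpeg"))
    )


def read_image_bytes(folder, path):
    """Read one file from the folder as raw bytes through a download stream."""
    with folder.get_download_stream(path) as stream:
        return stream.read()


# Inside DSS (assumption: a managed folder named "training_images" exists):
#   import dataiku
#   folder = dataiku.Folder("training_images")
#   jpeg_paths = list_jpeg_paths(folder)
#   first_image = read_image_bytes(folder, jpeg_paths[0])
```

No format or schema is involved at any point, which is the practical difference from a Dataset: the folder is just a collection of files.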
