
Managed Folder of JPGs from S3 Bucket

ryanraasch

Hello, I have an S3 bucket full of .jpg files, and I am trying to get them into Dataiku as a managed folder. I can import other file types from the same S3 bucket into Dataiku, but it does not work for the .jpg files. When I add the folder of JPGs, Dataiku fails to recognize the file type, and there is no corresponding option for Type under Format/Preview.

Has anyone run into this or have a solution?

The main goal is to use the deep learning image classification plugin, but I am hoping to avoid downloading all the images and then re-uploading them to a managed folder.

Thanks in advance!

HarizoR
Dataiker

Hello ryanraasch,


JPEG image files aren't tabular data, so you won't be able to use the "Format/Preview" feature; in fact, you don't need it for your deep learning use case. You can find an example of how the plugin can be used here: https://gallery.dataiku.com/projects/LIONANDTIGER/flow/

In your case, the only thing you need to do when setting up your managed folder is to adjust its settings and point it to the location of your training images in your S3 bucket.

Hope this helps.

Best,


Harizo

ryanraasch

Hi @HarizoR, thanks for your reply. I adjusted the settings and pointed them to the folder in the S3 bucket that contains the training images. Before I can save or do anything with the dataset, Dataiku requires me to select a format for the data. It says: "No format configured, dataset won't be usable; No schema set, dataset won't be usable." Are there settings I can use to bypass this? Do the images need to be on the filesystem? I see that in the plugin example the images are stored on the filesystem.

Any other help you can provide would be greatly appreciated. Thanks!

HarizoR
Dataiker

Hi,

You need to use a Managed Folder, not a Dataset: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html
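To illustrate the difference, here is a minimal sketch of how a Python recipe or notebook might read the images once a managed folder points at the S3 location. It assumes the `dataiku.Folder` methods `list_paths_in_partition()` and `get_download_stream()`; the helper names `list_image_paths` and `read_image_bytes` are hypothetical. The files are read as raw bytes, so no format or schema is ever configured.

```python
# Extensions treated as JPEG images (matched case-insensitively).
IMAGE_EXTENSIONS = (".jpg", ".jpeg")


def list_image_paths(folder):
    """Return the sorted paths of JPEG files in a managed folder.

    `folder` is assumed to behave like dataiku.Folder: it must offer
    list_paths_in_partition(), returning paths such as "/cats/001.jpg".
    """
    return sorted(
        path
        for path in folder.list_paths_in_partition()
        if path.lower().endswith(IMAGE_EXTENSIONS)
    )


def read_image_bytes(folder, path):
    """Read one file from the folder as raw bytes (no format or schema involved)."""
    with folder.get_download_stream(path) as stream:
        return stream.read()
```

Inside DSS you would pass something like `dataiku.Folder("s3_training_images")`, where the folder name is a placeholder for your own managed folder. The folder is passed in as an argument rather than created inside the helpers, so the functions can also be tried outside DSS with a stand-in object.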

Best,

Harizo
