Managed Folder of JPGs from S3 Bucket
Hello, I have an S3 bucket full of .jpg files. I am trying to get these into Dataiku as a managed folder. I am able to import other file types from the S3 bucket to Dataiku but it is not working for the .jpg extensions. Dataiku fails to recognize the file type when I go to add the folder of jpgs and there is not a corresponding option for Type under Format/Preview.
Has anyone run into this or have a solution?
The main goal is to use the deep learning for image classification plugin, but I am hoping I do not have to download all the images and then upload them directly to a managed folder.
Thanks in advance!
Best Answer
-
Hello ryanraasch,
JPEG image files aren't tabular data, so you won't be able to use the "format/preview" feature. In fact, you don't need it for your deep learning use case. You can find an example of how the plugin can be used here: https://gallery.dataiku.com/projects/LIONANDTIGER/flow/
In your case, the only thing you need to do when setting up your managed folder is to adjust its settings and point it to the location of your training images in your S3 bucket.
Hope this helps.
Best,
Harizo
Answers
-
Hi @HarizoR, thanks for your reply. I adjusted the settings and pointed to the folder in the S3 bucket containing the training images. Before I can save or do anything with the dataset, Dataiku requires me to select a format for the data. It says "No format configured, dataset won't be usable; No schema set, dataset won't be usable." Are there certain settings to use to bypass this? Do the images need to be on the filesystem? I see that in the plugin example the images are stored on the filesystem. Any other help you can provide would be greatly appreciated. Thanks!
-
Hi,
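You need to use a Managed Folder, not a Dataset. The format and schema messages only apply to Datasets; a Managed Folder stores the files as they are, so there is nothing to configure there: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html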
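If it helps to check that the folder really sees the images, here is a minimal sketch of listing and reading them from Python. It assumes the folder object is a `dataiku.Folder` and only relies on its `list_paths_in_partition()` and `get_download_stream()` methods; the helper names `list_jpeg_paths` and `read_file_bytes`, and the folder name `training_images` in the comment, are made up for illustration.

```python
import os

# Extensions treated as JPEG images (compared case-insensitively).
JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def list_jpeg_paths(folder):
    """Return the sorted paths of JPEG files in a managed folder.

    `folder` is assumed to behave like dataiku.Folder, i.e. it exposes
    list_paths_in_partition() returning paths such as "/cats/img1.jpg".
    """
    return sorted(
        path
        for path in folder.list_paths_in_partition()
        if os.path.splitext(path)[1].lower() in JPEG_EXTENSIONS
    )


def read_file_bytes(folder, path):
    """Read one file from the folder as raw bytes (no format or schema needed)."""
    with folder.get_download_stream(path) as stream:
        return stream.read()


# Hypothetical usage inside a DSS Python recipe or notebook:
#   import dataiku
#   folder = dataiku.Folder("training_images")  # assumed folder name
#   paths = list_jpeg_paths(folder)
#   first_image = read_file_bytes(folder, paths[0])
```

The files are read as plain byte streams, which is why no format or schema is involved. This is just a sanity check and is not required by the plugin itself.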
Best,
Harizo