Extract data from jpg images
Hi Dataiku Community, happy New Year!
I am new here and this is the first time I have posted a question, so here it goes:
I would like to know how I can extract data from JPG images, that is, what kind of node and/or recipe I should use for a task like this.
Basically, I have multiple screenshots, and I need to extract the data from these images and then validate it against an underlying XLS file.
I would appreciate your comments and suggestions.
Operating system used: Windows
Answers
-
Hey @anthonyfergon28, happy new year! You can use a Python recipe on the design node.
For example, your script may look something like this:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
images_for_retraining = dataiku.Folder("YOUR_FOLDER_NAME")
images_for_retraining_info = images_for_retraining.get_info()
paths = images_for_retraining.list_paths_in_partition()

# Display a single image
from IPython.display import Image
Image(filename=images_for_retraining.file_path(paths[0]))

### Do your processing here ###

# Write output dataset, if you create a dataframe it may be e.g.
output_ds = dataiku.Dataset("YOUR_OUTPUT_DATASET")
output_ds.write_with_schema(your_dataframe)
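As a rough idea of what the "Do your processing here" step could look like, here is a minimal sketch. It assumes you run OCR with pytesseract (which needs Pillow and the Tesseract binary installed in your code environment) and that your screenshots contain simple "Label: value" lines. The helper names `ocr_image`, `parse_key_values` and `validate` are made up for illustration, and the screenshot layout is a guess, so adapt the parsing to what your images actually contain.

```python
import re


def ocr_image(path):
    """Return the text found in one image file.

    Assumption: pytesseract, Pillow and the Tesseract binary are installed.
    """
    # Imported lazily so the parsing/validation helpers work without OCR installed.
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def parse_key_values(text):
    """Turn lines such as 'Invoice: 123' into {'invoice': '123'}.

    Assumption: the screenshots use a 'Label: value' layout (hypothetical).
    """
    pairs = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            pairs[match.group(1).lower()] = match.group(2)
    return pairs


def validate(extracted, reference):
    """Return {field: (extracted_value, expected_value)} for every mismatch."""
    return {
        field: (extracted.get(field), expected)
        for field, expected in reference.items()
        if extracted.get(field) != expected
    }
```

In the recipe you would call something like `parse_key_values(ocr_image(folder.file_path(path)))` for each path, build the `reference` dict from your XLS file (for example via `pandas.read_excel`), and collect the mismatches into the dataframe you write out. Note that OCR output is rarely perfect, so you may need to normalise the values (whitespace, decimal separators) before comparing.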
-
Thank you, Muennighoff, I appreciate your kind response. Sincere apologies for the delayed reply. I am going to test it today. Kind regards.