Extract data from jpg images

anthonyfergon28
Level 2

Hi Dataiku Community, happy New Year 🙂

I am new here and this is my first question, so here it goes:

I would like to know how I can extract data from JPG images. Specifically, what kind of node and/or recipe should I use for a task like this?

Basically, I have multiple screenshots, and I need to extract the data from these images and then validate it against an underlying XLS file.

I would appreciate your comments and suggestions.


Operating system used: Windows

2 Replies
Muennighoff
Dataiker

Hey @anthonyfergon28, happy new year!

 

You can use a Python recipe on the design node.

For example, your script might look something like this:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
images_for_retraining = dataiku.Folder("YOUR_FOLDER_NAME")
images_for_retraining_info = images_for_retraining.get_info()

paths = images_for_retraining.list_paths_in_partition()

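# Display a single image (this renders in a notebook; it has no visible effect in a recipe run).
# Note: file_path assumes the folder is stored on the local filesystem.
from IPython.display import Image
Image(filename=images_for_retraining.file_path(paths[0]))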

### Do your processing here ###

# Write the output dataset. If your processing produces a pandas DataFrame
# (your_dataframe is a placeholder name), you can write it like this:
output_ds = dataiku.Dataset("YOUR_OUTPUT_DATASET")
output_ds.write_with_schema(your_dataframe)
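
For the "Do your processing here" step, one possible shape is sketched below. It is only a minimal sketch under some assumptions: the names extract_rows, validate_rows, read_bytes and ocr are hypothetical helpers, not part of the Dataiku API. The OCR engine is passed in as a function so you can plug in whatever library you have installed in your code environment (for example pytesseract, if available), and the reference values are assumed to be already loaded from your XLS file into a dict keyed by image path. The stand-in data at the bottom only simulates images so the sketch runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Optional


def extract_rows(
    paths: Iterable[str],
    read_bytes: Callable[[str], bytes],
    ocr: Callable[[bytes], str],
) -> List[Dict[str, str]]:
    """Run the supplied OCR function on every JPG path; one row per image."""
    rows = []
    for path in paths:
        # Skip anything in the folder that is not a JPG image.
        if not path.lower().endswith((".jpg", ".jpeg")):
            continue
        text = ocr(read_bytes(path)).strip()
        rows.append({"path": path, "text": text})
    return rows


def validate_rows(
    rows: List[Dict[str, str]],
    reference: Dict[str, str],
) -> List[Dict[str, Optional[object]]]:
    """Compare each extracted text with the expected value for that path."""
    report = []
    for row in rows:
        expected = reference.get(row["path"])  # None if the path is unknown
        report.append(
            {
                "path": row["path"],
                "text": row["text"],
                "expected": expected,
                "match": expected is not None and row["text"] == expected,
            }
        )
    return report


# Stand-in data (assumption): bytes that decode to text play the role of images,
# and a decode call plays the role of a real OCR engine.
fake_files = {"/a.jpg": b"INV-001 ", "/b.JPG": b"INV-002", "/notes.txt": b"ignore me"}
rows = extract_rows(
    sorted(fake_files),
    fake_files.__getitem__,
    lambda data: data.decode("ascii"),
)
report = validate_rows(rows, {"/a.jpg": "INV-001", "/b.JPG": "INV-999"})
```

In a real recipe you would pass the paths list from above, a read_bytes function that reads each file from the managed folder, and your actual OCR call. The resulting report is a list of dicts, so it can be turned into a DataFrame with pd.DataFrame(report) and written with write_with_schema.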

 

anthonyfergon28
Level 2
Author

Thank you, Muennighoff, I appreciate your kind response. Sincere apologies for the delayed reply. I am going to test it today. Kind regards.
