Hi Dataiku Community, happy New Year 🙂

I am new here and this is the first time I am posting a question, so here it goes:

I would like to know how I can extract data from JPG images. In particular, what kind of node and/or recipe should I use for a task like this?

Basically, I have multiple screenshots, and I need to extract the data from these images and then validate it against an underlying xls file.

I would appreciate your comments and suggestions.

Operating system used: Windows

Hey @anthonyfergon28, happy new year!


You can use a Python recipe on the design node.

For example, your script might look something like this:

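# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: the managed folder that holds the screenshots
images_for_retraining = dataiku.Folder("YOUR_FOLDER_NAME")

# Paths of all files in the folder, relative to the folder root
paths = images_for_retraining.list_paths_in_partition()

rows = []
for path in paths:
    # Read each image as raw bytes
    with images_for_retraining.get_download_stream(path) as stream:
        image_bytes = stream.read()

    ### Do your processing here (e.g. extract the data from image_bytes) ###
    rows.append({"path": path, "size_bytes": len(image_bytes)})

# Write output dataset, e.g. from the dataframe you built above
output_df = pd.DataFrame(rows)
output_ds = dataiku.Dataset("YOUR_OUTPUT_DATASET")
output_ds.write_with_schema(output_df)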
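
For the processing step itself, here is a minimal sketch of one possible approach, not something confirmed in this thread: run OCR on each image and compare the parsed fields with a reference row. It assumes the Pillow and pytesseract packages (plus the Tesseract binary) are available in your code environment, and that the screenshots contain "label: value" lines, which is a purely hypothetical layout. The helper names `ocr_image_bytes`, `parse_key_values` and `find_mismatches` are made up for illustration.

```python
import io
import re


def ocr_image_bytes(image_bytes):
    """Run OCR on raw image bytes and return the recognized text.

    Assumption: Pillow, pytesseract and the Tesseract binary are installed.
    They are imported lazily so the other helpers work without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def parse_key_values(text):
    """Parse 'label: value' lines (hypothetical layout) into a dict."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            # Lower-case the label so comparisons ignore OCR casing
            result[match.group(1).lower()] = match.group(2)
    return result


def find_mismatches(extracted, reference):
    """Return {field: (extracted_value, reference_value)} where they differ."""
    return {
        field: (extracted.get(field), expected)
        for field, expected in reference.items()
        if extracted.get(field) != expected
    }


# Plain text stands in for OCR output so the example runs without Tesseract
sample_text = "Invoice: 1001\nTotal: 45.60\n"
reference_row = {"invoice": "1001", "total": "45.50"}
mismatches = find_mismatches(parse_key_values(sample_text), reference_row)
print(mismatches)
```

In a recipe, you would pass the bytes read from each file in the folder to `ocr_image_bytes`, and you could build the reference rows from the xls file with `pandas.read_excel` (reading the legacy .xls format needs the xlrd package). OCR output is often noisy, so some normalization before comparing is likely to be needed.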


Thank you, Muennighoff, I appreciate your kind response. Sincere apologies for the delayed reply. I am going to test it today. Kind regards.

