Extract data from jpg images

anthonyfergon28 Registered Posts: 11

Hi Dataiku Community, happy New Year

I am new here and this is the first time I have posted a question, so here it goes:

I would like to know how I can extract data from JPG images. That is, what kind of node and/or recipe should I use to solve a task like this?

Basically, I have multiple screenshots, and I need to extract the data from these images and then validate it against an underlying XLS file.
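The validation step described above could look something like the following plain-Python sketch. It assumes the extracted data has already been turned into one record (dict) per screenshot, and that the spreadsheet rows are available as dicts too, for example via `pd.read_excel("your_file.xls").to_dict("records")` (reading legacy `.xls` files needs the `xlrd` package) or `dataiku.Dataset("YOUR_XLS_DATASET").get_dataframe().to_dict("records")` if the file is uploaded as a Dataiku dataset. The column names (`Invoice ID`, `Amount`, `Status`) and the `validate`/`normalize` helpers are hypothetical placeholders.

```python
def normalize(value):
    """Normalize a cell so OCR text ("125.50") and spreadsheet numbers (125.5) compare equal."""
    text = str(value).strip()
    try:
        # Treat anything that parses as a number as a float (thousands separators removed)
        return float(text.replace(",", ""))
    except ValueError:
        return text


def validate(extracted_rows, reference_rows, key, fields):
    """Compare extracted records with reference records, matched on a key column.

    Returns a list of (key_value, field, extracted, expected) tuples holding normalized
    values; field is None when the key is not found in the reference data.
    """
    reference = {normalize(row.get(key, "")): row for row in reference_rows}
    mismatches = []
    for row in extracted_rows:
        key_value = normalize(row.get(key, ""))
        expected_row = reference.get(key_value)
        if expected_row is None:
            mismatches.append((key_value, None, None, None))
            continue
        for field in fields:
            extracted = normalize(row.get(field, ""))
            expected = normalize(expected_row.get(field, ""))
            if extracted != expected:
                mismatches.append((key_value, field, extracted, expected))
    return mismatches


# Hypothetical example data: rows from the XLS file vs. values read from screenshots
reference_rows = [
    {"Invoice ID": "INV-001", "Amount": 125.5, "Status": "Paid"},
    {"Invoice ID": "INV-002", "Amount": 80.0, "Status": "Open"},
]
extracted_rows = [
    {"Invoice ID": "INV-001", "Amount": "125.50", "Status": "Paid"},
    {"Invoice ID": "INV-002", "Amount": "8.00", "Status": "Open"},
    {"Invoice ID": "INV-009", "Amount": "10.00", "Status": "Paid"},
]
print(validate(extracted_rows, reference_rows, "Invoice ID", ["Amount", "Status"]))
# [('INV-002', 'Amount', 8.0, 80.0), ('INV-009', None, None, None)]
```

Normalizing numbers before comparing matters here because OCR always returns text, while the spreadsheet usually stores numeric cells as numbers.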

I would appreciate your comments and suggestions.

Operating system used: Windows



  • Muennighoff
    Muennighoff Dataiker, Registered Posts: 3

    Hey @anthonyfergon28, happy new year!

    You can use a Python recipe on the Design node, with the images stored in a managed folder that serves as the recipe's input.

    For example, your script could look something like this:

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd
    # Read recipe inputs: the managed folder that holds the screenshots
    images_folder = dataiku.Folder("YOUR_FOLDER_NAME")
    paths = images_folder.list_paths_in_partition()
    rows = []
    for path in paths:
        # Read the raw bytes of each image
        with images_folder.get_download_stream(path) as stream:
            image_bytes = stream.read()
        # In a notebook, you can preview an image with IPython.display.Image(data=image_bytes)
        ### Do your processing here (e.g. run OCR on image_bytes) ###
        rows.append({"path": path})
    # Write the output dataset from a dataframe
    output_ds = dataiku.Dataset("YOUR_OUTPUT_DATASET")
    output_ds.write_with_schema(pd.DataFrame(rows))
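    One possible way to fill in the "Do your processing here" step is OCR. The sketch below assumes the Pillow and pytesseract packages are installed in the recipe's code environment and that the Tesseract OCR binary is available on the machine; neither is part of the answer above. The `ocr_image_bytes` and `parse_key_values` helpers are hypothetical, and the parser assumes the screenshots show simple "Label: value" lines, which you would adapt to your actual layout.

```python
import io
import re


def ocr_image_bytes(image_bytes):
    """Run OCR on raw image bytes (assumes Pillow, pytesseract and Tesseract are installed)."""
    # Imported lazily so parse_key_values can be used without these packages
    from PIL import Image
    import pytesseract

    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image)


def parse_key_values(text):
    """Parse 'Label: value' lines from OCR output into a dict (hypothetical layout)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon; blank lines and lines without a colon are skipped
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Example with text such as OCR might return for a simple form screenshot
sample = "Invoice ID: INV-001\nAmount: 125.50\n\nStatus : Paid"
print(parse_key_values(sample))
# {'Invoice ID': 'INV-001', 'Amount': '125.50', 'Status': 'Paid'}
```

    Inside the recipe loop, something like `rows.append({"path": path, **parse_key_values(ocr_image_bytes(image_bytes))})` would then give one output row per screenshot. OCR quality on screenshots is usually decent, but it is worth spot-checking the extracted values before relying on them.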

  • anthonyfergon28
    anthonyfergon28 Registered Posts: 11

    Thank you, Muennighoff, I appreciate your response. Sincere apologies for the delayed reply. I am going to test it today. Kind regards.
