Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata

nmishra5 · ‎02-22-2023

I am trying to use the Python notebook template "image processing for text extraction" for a custom requirement. I followed the steps in the Tesseract-OCR plugin documentation, set the notebook to use the plugin's code env, and restarted the kernel before running my code, as I normally do. I get the following TesseractError:

(1, 'Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

Could someone please help with this?

ZachM · ‎02-22-2023

Hi @nmishra5,

This error indicates that Tesseract wasn't able to find the data file for English.

Could you please verify if the file "/usr/share/tesseract/4/tessdata/eng.traineddata" exists?
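One quick way to check this from the same notebook is sketched below. It uses only the Python standard library; the helper name `find_traineddata` is made up for illustration, and the default directory is simply the one taken from the error message, so adjust it if your installation uses a different path.

```python
import os
from pathlib import Path


def find_traineddata(lang="eng", tessdata_dir="/usr/share/tesseract/4/tessdata"):
    """Return the path to <lang>.traineddata if it exists, else None.

    Hypothetical helper: the default directory is the one reported in the
    error message above, not necessarily where your install keeps its data.
    """
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


# Show what Tesseract may be told to use, then check for the English data file.
print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX"))
print("eng.traineddata:", find_traineddata() or "not found")
```

If this prints "not found", the language data is most likely not installed on the machine that runs the notebook kernel.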

If the file doesn't exist, you'll need to install it. For more information, see the "Specific languages" section in the README: https://github.com/dataiku/dss-plugin-tesseract-ocr/tree/v1.0.2#specific-languages

Thanks,

Zach

nmishra5 · ‎02-23-2023

Hi ZachM,

Thanks for your reply.

I checked, and the file "/usr/share/tesseract/4/tessdata/eng.traineddata" is currently missing.

Could you please share the command to install it?

Nirbhay

ZachM · ‎02-25-2023

If you're using a RHEL-based distro, such as CentOS or AlmaLinux, you can install it using the following command:

yum install tesseract-langpack-eng

If you're using a Debian-based distro, such as Ubuntu, you can install it using the following command:

apt install tesseract-ocr-eng

If you're using a different distro or are unsure, could you please let me know which distro and version you're using? For example, CentOS 7.
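If you're not sure which distro the server runs, a rough sketch like the one below can read it from /etc/os-release inside the notebook. The function names `parse_os_release` and `suggest_install_command` are hypothetical, and the distro IDs it matches are an assumption covering only the families mentioned above; it suggests a command rather than running anything.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def suggest_install_command(info):
    """Map the distro family to one of the two commands above, else None."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & {"rhel", "centos", "fedora", "almalinux"}:  # assumed RHEL-family IDs
        return "yum install tesseract-langpack-eng"
    if ids & {"debian", "ubuntu"}:  # assumed Debian-family IDs
        return "apt install tesseract-ocr-eng"
    return None


try:
    with open("/etc/os-release") as f:
        info = parse_os_release(f.read())
except FileNotFoundError:
    info = {}  # file is absent on some systems

print(info.get("PRETTY_NAME", "unknown distro"), "->", suggest_install_command(info))
```

If it prints `None` for the command, the distro name and version shown on the same line are the details to post here.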
