Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata

nmishra5 · February 2023

I am trying to use the Python notebook template "image processing for text extraction" for a custom requirement. I followed the steps in the Tesseract-OCR plugin documentation and set the notebook to use the plugin's code env. Before running my code, I restarted the kernel, as I normally do, to make sure everything works properly. I get the following TesseractError:

(1, 'Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

Could someone please help with this?

Zach · February 2023

Hi @nmishra5,

This error indicates that Tesseract wasn't able to find the data file for English.

Could you please verify if the file "/usr/share/tesseract/4/tessdata/eng.traineddata" exists?
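One way to run this check from the notebook itself is sketched below. The helper name `find_traineddata` is made up for illustration, and the default directory is simply the one shown in the error message; the sketch also assumes that `TESSDATA_PREFIX`, if set, points directly at the "tessdata" directory, as the error text suggests.

```python
import os
from pathlib import Path

# Directory taken from the error message above; adjust if your install differs.
DEFAULT_TESSDATA_DIR = "/usr/share/tesseract/4/tessdata"


def find_traineddata(lang="eng", tessdata_dir=None):
    """Return the path to <lang>.traineddata if the file exists, else None.

    Hypothetical helper: falls back to TESSDATA_PREFIX, then to the
    directory named in the error message.
    """
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX", DEFAULT_TESSDATA_DIR)
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


path = find_traineddata("eng")
print(path if path else "eng.traineddata not found")
```

If this prints "eng.traineddata not found", the language data is most likely not installed on the machine running the notebook kernel.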

If the file doesn't exist, you'll need to install it. For more information, see the "Specific languages" section in the README: https://github.com/dataiku/dss-plugin-tesseract-ocr/tree/v1.0.2#specific-languages

Thanks,

Zach

nmishra5 · February 2023

Hi ZachM,

Thanks for your reply.

I checked, and the file "/usr/share/tesseract/4/tessdata/eng.traineddata" is currently missing.

Could you please help me with the command to install it?

Nirbhay

Zach · February 2023

If you're using a RHEL-based distro, such as CentOS or AlmaLinux, you can install it using the following command:

yum install tesseract-langpack-eng

If you're using a Debian-based distro, such as Ubuntu, you can install it using the following command:

apt install tesseract-ocr-eng

If you're using a different distro or are unsure, could you please let me know which distro (including the version) you're using? For example, CentOS 7.
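If you are unsure which distro the server runs, a rough sketch like the following can read it from `/etc/os-release`, which most current Linux distributions provide. The function names and the lists of distro IDs are assumptions made for illustration, not part of the plugin; the two commands are the ones given above.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def suggest_install_command(info):
    """Map ID / ID_LIKE to one of the two commands above (illustrative only)."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    # The ID lists below are assumptions and are not exhaustive.
    if ids & {"rhel", "centos", "fedora", "almalinux", "rocky"}:
        return "yum install tesseract-langpack-eng"
    if ids & {"debian", "ubuntu"}:
        return "apt install tesseract-ocr-eng"
    return None


os_release = Path("/etc/os-release")
if os_release.is_file():
    info = parse_os_release(os_release.read_text())
    print(info.get("PRETTY_NAME", "unknown distro"))
    print(suggest_install_command(info) or "no suggestion for this distro")
else:
    print("/etc/os-release not found")
```

The install commands themselves need to be run in a shell on the server, typically with root privileges.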
