Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata
 
I am trying to use the Python notebook template "Image processing for text extraction" for a custom requirement. I followed the steps in the Tesseract-OCR plugin documentation and set the plugin's code env from the notebook. As usual, I restarted the kernel before running my code to make sure everything worked properly. I get the following TesseractError:
(1, 'Error opening data file /usr/share/tesseract/4/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
Could someone please help with this?
Answers
- 
Hi @nmishra5, this error indicates that Tesseract wasn't able to find the English language data file. Could you please verify whether the file "/usr/share/tesseract/4/tessdata/eng.traineddata" exists? If it doesn't, you'll need to install it. For more information, see the "Specific languages" section in the README: https://github.com/dataiku/dss-plugin-tesseract-ocr/tree/v1.0.2#specific-languages Thanks, Zach
- 
nmishra5 Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2
Hi ZachM, thanks for your reply. I checked, and the file "/usr/share/tesseract/4/tessdata/eng.traineddata" is currently missing. Could you please help me with the command to install it? Nirbhay
- 
If you're using a RHEL-based distro, such as CentOS or AlmaLinux, you can install it with the following command:
yum install tesseract-langpack-eng
If you're using a Debian-based distro, such as Ubuntu, you can install it with the following command:
apt install tesseract-ocr-eng
If you're using a different distro or are unsure, could you please let me know which distro (including the version) you're using? For example, CentOS 7.
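After installing the language pack, one way to confirm from the notebook that the English model is now where Tesseract looks for it is sketched below. It is a minimal sketch under stated assumptions: only the first directory in CANDIDATE_DIRS comes from the error message; the other two are common locations on some distros and are guesses. The helper names installed_languages, find_tessdata_dir and configure_tessdata_prefix are hypothetical and not part of the plugin. The sketch lists the .traineddata files in each directory and, if eng.traineddata is found, sets TESSDATA_PREFIX to that tessdata directory, as the error message suggests.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional

# Directories to search. The first path comes from the error message; the other
# two are ASSUMED common locations on some distros, not documented by the plugin.
# Adjust them for your server.
CANDIDATE_DIRS = [
    "/usr/share/tesseract/4/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tessdata",
]


def installed_languages(tessdata_dir: str) -> List[str]:
    """List the language codes that have a .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def find_tessdata_dir(lang: str = "eng", candidates: Iterable[str] = CANDIDATE_DIRS) -> Optional[str]:
    """Return the first candidate directory containing <lang>.traineddata, or None."""
    for candidate in candidates:
        if (Path(candidate) / f"{lang}.traineddata").is_file():
            return candidate
    return None


def configure_tessdata_prefix(lang: str = "eng", candidates: Iterable[str] = CANDIDATE_DIRS) -> Optional[str]:
    """Point TESSDATA_PREFIX at the tessdata directory that holds lang, if one is found.

    Following the error message, the variable is set to the tessdata directory
    itself. Processes started afterwards from this kernel (such as the tesseract
    binary) inherit it. If nothing is found, the environment is left unchanged.
    """
    found = find_tessdata_dir(lang, candidates)
    if found is not None:
        os.environ["TESSDATA_PREFIX"] = found
    return found


if __name__ == "__main__":
    for d in CANDIDATE_DIRS:
        print(d, "->", installed_languages(d) or "no .traineddata files")
    print("TESSDATA_PREFIX set to:", configure_tessdata_prefix("eng"))
```

If the call returns None, the language pack is still not in any of the searched directories, so the package manager's file list is the next place to check. Since TesseractError is the exception class raised by pytesseract, the template likely calls Tesseract through pytesseract; if so, passing config='--tessdata-dir "<dir>"' to pytesseract.image_to_string is an alternative to setting the environment variable.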
