Using Tesseract to read PDF files
mhussain79
Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭
Hi all,
I have a Python script that uses the Tesseract engine to extract text from scanned PDF files. I have already tried the Tesseract OCR plugin, but the results aren't what I am looking for. The Python script that I wrote on my laptop works fine. However, when I run the same code on the Dataiku server, I get this error.
Both the Python script and the Dataiku notebook error are attached here.
Please let me know how to fix this issue.
Thanks
Answers
-
The error message is quite clear. You need to install Tesseract version 3.05 or newer on the DSS server so that the pytesseract library can work properly. There are more details in the pytesseract library documentation.
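To confirm whether the server actually has a usable Tesseract binary, a quick check like the one below may help. This is only a sketch using the Python standard library: the helper names `parse_tesseract_version` and `check_tesseract` are made up for illustration, and it assumes the binary is called `tesseract` and should be reachable on the PATH of the user running DSS.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(version_output):
    """Extract (major, minor) from the text printed by `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_tesseract(min_version=(3, 5)):
    """Return a short status message about the Tesseract binary on this machine."""
    # Assumption: the executable is named "tesseract" and is expected on PATH.
    binary = shutil.which("tesseract")
    if binary is None:
        return "tesseract binary not found on PATH"

    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    # Read both streams, since the banner is not always written to stdout.
    output = result.stdout + result.stderr
    version = parse_tesseract_version(output)
    if version is None:
        return f"found {binary}, but could not parse its version"

    banner = output.strip().splitlines()[0]
    if version < min_version:
        return f"found {binary} ({banner}), which is older than 3.05"
    return f"found {binary} ({banner})"


if __name__ == "__main__":
    print(check_tesseract())
```

Run it from the Dataiku notebook rather than from the laptop, because what matters is what is installed on the DSS server. If it reports that the binary is missing or too old, the fix is on the server side (installing or upgrading Tesseract), not in the Python code.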
-
shivangisingh88 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭
Hi mhussain79,
Could you please provide the Python code for it?