Using Tesseract to read PDF files

mhussain79 · April 2023

Hi all,

I have a Python script that uses the Tesseract engine to extract text from scanned PDF files. I have already tried the Tesseract OCR plugin, but the results aren't what I am looking for. The script works fine on my laptop. However, when I run the same code on the Dataiku server, I get this error.

Both the Python script and the Dataiku notebook error are attached here.

Please let me know how to fix this issue.

Thanks

JuanE · April 2023

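The error message is quite clear. You need to install Tesseract version 3.05 or newer on the DSS server so that the pytesseract library can work properly. pytesseract is only a Python wrapper around the tesseract executable, so installing the Python package alone is not enough. There are more details in the library documentation: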

https://github.com/madmaze/pytesseract#installation
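
The original script is not shown in this thread, so the following is only a rough sketch of how one might confirm that the server has a usable Tesseract before running OCR. The helper names (`parse_version`, `check_tesseract`, `ocr_pdf`) are made up for illustration, and `ocr_pdf` assumes the `pdf2image` package (which itself needs Poppler installed) is used to turn PDF pages into images; the original code may well do this differently.

```python
import re
import shutil
import subprocess

# Minimum Tesseract version mentioned in the reply above (3.05).
MIN_VERSION = (3, 5)


def parse_version(banner):
    """Extract (major, minor) from the output of `tesseract --version`."""
    match = re.search(r"(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_tesseract(binary="tesseract"):
    """Return (ok, message) saying whether a usable Tesseract is on PATH."""
    path = shutil.which(binary)
    if path is None:
        return False, f"{binary!r} not found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Older Tesseract releases print the version banner to stderr.
    version = parse_version(result.stdout or result.stderr)
    if version is None:
        return False, f"could not read a version from {path}"
    if version < MIN_VERSION:
        return False, f"found {version[0]}.{version[1]:02d}, need 3.05 or newer"
    return True, f"found Tesseract {version[0]}.{version[1]:02d} at {path}"


def ocr_pdf(pdf_path, lang="eng"):
    """Hypothetical helper: OCR every page of a scanned PDF into one string."""
    # Imported here so the check above works even without these packages.
    import pytesseract
    from pdf2image import convert_from_path  # assumption: needs Poppler

    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


if __name__ == "__main__":
    ok, message = check_tesseract()
    print(message)
```

Running this in a DSS notebook should show whether the tesseract executable is visible to the Python process. If it reports that the binary is not found, Tesseract still needs to be installed on the server itself (or its location set through `pytesseract.pytesseract.tesseract_cmd`).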

shivangisingh88 · October 2023

Hi mhussain79,
Could you please share the Python code for this?
