
Plugin Tesseract

safa94
Level 1

Hello! I have a problem running the Tesseract plugin with the text extraction recipe.

Even though pytesseract 0.3.7 is installed in the managed environment, I get an error saying that it is not installed or not found in the PATH.

StanG
Dataiker

Hi,
In addition to the Python package pytesseract, the Tesseract system package must be installed on the machine that runs Dataiku (this is described in the "How to set up" section of the plugin webpage: https://www.dataiku.com/product/plugins/tesseract-ocr/).

The Python package is only a wrapper that calls the Tesseract system package, and Dataiku cannot install that system package for you.

You can check that Tesseract has been installed by running the tesseract command in a terminal on that machine.
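
If you cannot open a terminal, a similar check can be run from Python on the same machine. The following is only a minimal sketch using the standard library: the helper name find_tesseract is made up for illustration, and it assumes the executable is called tesseract, as it is in the usual system packages.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return (path, version_line); both are None if the binary is not on PATH.

    `find_tesseract` is a hypothetical helper, not part of the plugin.
    """
    path = shutil.which(binary_name)
    if path is None:
        return None, None
    # Depending on the Tesseract release, the version banner is printed
    # to stdout or to stderr, so capture both.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return path, first_line


path, version = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; the system package is probably missing")
else:
    print("found", path, "reporting:", version)
```

If this reports that nothing was found even though pytesseract imports fine, the missing piece is the system package rather than the Python package.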

safa94
Level 1
Author

Hi,

Thank you for your response. Do you know how to get to the terminal of Dataiku to install Tesseract on the machine?

StanG
Dataiker

Sorry, but you need admin access to the machine on which Dataiku is installed, and you have to install Tesseract yourself.
This cannot be done directly from Dataiku.

safa94
Level 1
Author

Hello,

Thank you for your answer. I am in the administrators group in Dataiku, but I don't know if that is sufficient, and I don't know how to get to the terminal of the machine.

importthepandas
Level 4

@StanG bumping this to keep info consolidated 🙂

Would we benefit from containerized execution with this plugin? If so, I'm assuming we'll need to install the OS libraries in the base images as well?

StanG
Dataiker

Hi,

Yes, exactly: if you install the Tesseract library in the base image and also build the plugin code env for your container image, then containerized execution should work.
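
To confirm that both layers made it into the image, something like the sketch below could be run with the plugin's code env, for example from a Python recipe executed in the container. It is an illustration only: check_ocr_stack is a hypothetical helper, and it relies on pytesseract exposing get_tesseract_version and TesseractNotFoundError, which recent releases do.

```python
def check_ocr_stack():
    """Report which of the two layers is available in the current environment.

    `check_ocr_stack` is a hypothetical helper, not part of the plugin.
    """
    report = {"pytesseract": False, "tesseract_binary": False}
    try:
        import pytesseract  # Python wrapper, installed through the code env
    except ImportError:
        return report
    report["pytesseract"] = True
    try:
        # Calls the tesseract executable, so it fails when only the
        # Python wrapper is present and the system package is missing.
        pytesseract.get_tesseract_version()
        report["tesseract_binary"] = True
    except pytesseract.TesseractNotFoundError:
        pass
    return report


print(check_ocr_stack())
```

A result where pytesseract is True but tesseract_binary is False would point to the base image missing the system package, which is the same situation as in the original question.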


thank you!
