Plugin Tesseract
Hello! I have a problem running the Tesseract plugin with the text extraction recipe.
Even though pytesseract 0.3.7 is installed in the managed environment, I get an error saying that it's not installed or not found in PATH.
Answers
-
Hi,
In addition to the Python package pytesseract, the Tesseract system package must be installed on the machine that runs Dataiku (this is described in the How to setup section of the plugin page: https://www.dataiku.com/product/plugins/tesseract-ocr/). The Python package is only a wrapper that calls the Tesseract system package, and Dataiku cannot install that system package for you.
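To illustrate why the "not installed or not found in PATH" error appears even when the Python package is present, here is a simplified sketch of what a wrapper like pytesseract does. It is not pytesseract's actual code: the function name ocr_with_tesseract_cli is hypothetical, and it only shows the general pattern of locating the tesseract executable on PATH and shelling out to it.

```python
import os
import shutil
import subprocess
import tempfile


def ocr_with_tesseract_cli(image_path: str, cmd: str = "tesseract") -> str:
    """Hypothetical, simplified wrapper: run the Tesseract binary on an image.

    The Python side does no OCR itself; it only calls the system executable.
    """
    # Look up the executable on PATH. If the system package is missing,
    # this lookup fails no matter which Python packages are installed.
    binary = shutil.which(cmd)
    if binary is None:
        raise FileNotFoundError(f"{cmd!r} is not installed or not found in PATH")

    with tempfile.TemporaryDirectory() as tmpdir:
        out_base = os.path.join(tmpdir, "output")
        # `tesseract <image> <outputbase>` writes the text to <outputbase>.txt
        subprocess.run([binary, image_path, out_base], check=True,
                       capture_output=True)
        with open(out_base + ".txt", encoding="utf-8") as f:
            return f.read()
```

The point of the sketch is that installing pytesseract in a code env only provides the Python side; the lookup of the tesseract binary still depends on what is installed on the host.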
You can check that Tesseract has been installed by typing the tesseract command in your terminal.
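If you don't have a terminal handy, a rough equivalent is to run a small check from a Python notebook in Dataiku. This sketch assumes the notebook runs on the DSS host itself (not in a container), so it sees roughly the same PATH as the plugin; the PATH of the DSS process can differ from your login shell's. The function name tesseract_version is hypothetical.

```python
import os
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if not on PATH."""
    binary = shutil.which(cmd)
    if binary is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    # Some older Tesseract builds print the version on stderr, not stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else binary


if __name__ == "__main__":
    # Show the directories this Python process searches for executables.
    print("PATH seen by this process:")
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        print("  ", entry)
    version = tesseract_version()
    print(version or "tesseract not found: the system package must be installed")
```

If this reports that tesseract is not found, the fix is still on the machine itself: someone with admin access must install the Tesseract system package.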
-
Hi,
Thank you for your response. Do you know how to get to the terminal from Dataiku so I can install Tesseract on the machine?
-
Sorry, but you need admin access to the machine on which Dataiku is installed so that you can install Tesseract yourself.
This cannot be done directly from Dataiku.
-
Hello,
Thank you for your answer. I'm in the administrators group in Dataiku, but I don't know how to get to the terminal of the machine. Is that sufficient?
-
importthepandas
@StanG
Bumping this to keep the info consolidated: would we benefit from containerized execution with this plugin? If so, I'm assuming we'll need to install the OS libraries in the base images as well?
-
Hi,
Yes, exactly. If you install the Tesseract library in the base image and also build the plugin code env for your container image, containerized execution should work.
-
importthepandas
thank you!