PermissionError while trying to run a python recipe

Ali_Nik
Level 1

Hello,

I am trying to use a library called DocTR in Dataiku. Under the hood, DocTR runs deep learning models to extract text from documents. When I use the library in a Jupyter notebook in Dataiku, the pretrained resnet50 and vgg16 models are downloaded to the notebook's cache and everything works fine.

jupyter.PNG

 

But when I try to run the same code within a Python script in Dataiku, I get a PermissionError, presumably because my user does not have permission to download the pretrained models to the cache of the Dataiku instance.

error_py_script_updated.jpg

Is there any way I can get around this problem other than storing the pretrained models in another location and providing their paths to the DocTR?

1 Reply
DanDy
Dataiker

Hi,

You should be able to work around the error by setting the DOCTR_CACHE_DIR environment variable to a directory that is readable and writable by all users (e.g. chmod 777), such as a directory outside of the DSS data dir and not in another user's home directory, and by also setting the DOCTR_MULTIPROCESSING_DISABLE environment variable. Ref. https://mindee.github.io/doctr/using_doctr/running_on_aws.html

For example, add the following to the Linux user profile of the dssuser (e.g. "~/.bash_profile" or "~/.bashrc") or to the "<DATA_DIR>/bin/env-site.sh" file, then restart DSS:

export DOCTR_MULTIPROCESSING_DISABLE=TRUE
export DOCTR_CACHE_DIR=/tmp

Alternatively, set the variables in the resources initialization script of the code environment:

## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var

# Clears all environment variables defined by previously run script
clear_all_env_vars()

## DocTR
# Set DocTR cache directory
set_env_path("DOCTR_CACHE_DIR", "DOCTR_CACHE_DIR")
set_env_var("DOCTR_MULTIPROCESSING_DISABLE", "TRUE")

# Import DocTR
from doctr.models import ocr_predictor

.................
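
If neither the user profile nor the code environment can be changed, the same two variables can probably also be set at the top of the recipe itself, before DocTR is imported. The following is only a minimal sketch under stated assumptions: configure_doctr_cache is a hypothetical helper (not part of DocTR or Dataiku), the "doctr_cache" folder under the system temp directory is just an example location, and it assumes DocTR reads DOCTR_CACHE_DIR from the process environment when it downloads the pretrained models.

```python
import os
import tempfile


def configure_doctr_cache(cache_dir):
    """Hypothetical helper: point DocTR at a writable cache directory."""
    # Create the directory if it does not exist yet.
    os.makedirs(cache_dir, exist_ok=True)
    # Fail early with a clear message instead of failing during the download.
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"Cache directory is not writable: {cache_dir}")
    # Same variables as in the shell exports above.
    os.environ["DOCTR_CACHE_DIR"] = cache_dir
    os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
    return cache_dir


# Assumed location; replace with a directory the recipe's user can write to.
cache_dir = configure_doctr_cache(
    os.path.join(tempfile.gettempdir(), "doctr_cache")
)

# Import DocTR only after the variables are set, for example:
# from doctr.models import ocr_predictor
# predictor = ocr_predictor(pretrained=True)
```

The writability check is there so that a misconfigured path fails early with a readable message, rather than surfacing as a PermissionError in the middle of the model download.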

 

 
