PermissionError while trying to run a python recipe

Ali_Nik
Ali_Nik Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2

Hello,

I am trying to use a library called DocTR in Dataiku. Under the hood, DocTR runs deep learning models to perform the various document text extraction steps. When I use the library in a Jupyter notebook in Dataiku, the pretrained resnet50 and vgg16 models are downloaded to the notebook's cache and everything works fine.

jupyter.PNG

But when I try to run the same code within a Python script in Dataiku, I get a PermissionError, presumably because my user does not have permission to download the pretrained models to the cache of the Dataiku instance.

error_py_script_updated.jpg
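One way to confirm this diagnosis is to check, from inside the failing script, whether the cache location is writable for the user running it. The snippet below is only a sketch: `cache_write_status` is a hypothetical helper, and the default cache path `~/.cache/doctr` is an assumption about DocTR's behaviour when `DOCTR_CACHE_DIR` is not set.

```python
import os


def cache_write_status(cache_dir):
    """Return (nearest existing ancestor of cache_dir, whether it is writable).

    Hypothetical helper: a download can only create cache_dir if the closest
    directory that already exists is writable by the current user.
    """
    probe = os.path.abspath(cache_dir)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return probe, os.access(probe, os.W_OK)


# Assumed default location; an explicit DOCTR_CACHE_DIR takes precedence.
default_cache = os.environ.get(
    "DOCTR_CACHE_DIR",
    os.path.join(os.path.expanduser("~"), ".cache", "doctr"),
)

probe, writable = cache_write_status(default_cache)
print(f"home directory : {os.path.expanduser('~')}")
print(f"cache directory: {default_cache}")
print(f"checked path   : {probe} (writable: {writable})")
```

Running this in both the notebook and the script should show whether the two run with different home directories or permissions.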

Is there any way I can get around this problem other than storing the pretrained models in another location and providing their paths to DocTR?

Answers

  • DanDy
    DanDy Dataiker, Dataiku DSS Core Designer, Registered Posts: 8 Dataiker

    Hi,

    You should be able to work around the error by setting the DOCTR_CACHE_DIR environment variable to a directory that is readable and writable by all users (i.e. chmod 777 permission), such as a directory outside of the DSS data dir and not in another user's home directory, and by setting DOCTR_MULTIPROCESSING_DISABLE to TRUE. Ref. https://mindee.github.io/doctr/using_doctr/running_on_aws.html

    For example, add the following to the Linux user profile of the dssuser (e.g. "~/.bash_profile", or "~/.bashrc") or to the "<DATA_DIR>/bin/env-site.sh" file, then restart DSS:

    export DOCTR_MULTIPROCESSING_DISABLE=TRUE
    export DOCTR_CACHE_DIR=/tmp
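    If changing the DSS environment is not an option, the same two variables can in principle be set from inside the Python script itself, before DocTR is imported. The following is a minimal sketch under that assumption; `configure_doctr_cache` is a hypothetical helper, not part of DocTR or Dataiku, and the directory under the system temp folder is only an example location.

```python
import os
import tempfile


def configure_doctr_cache(cache_dir):
    """Point DocTR at a writable cache directory (hypothetical helper)."""
    # Create the directory if needed and fail early if it is not writable.
    os.makedirs(cache_dir, exist_ok=True)
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"Cache directory is not writable: {cache_dir}")
    # Same variables as in the shell exports above.
    os.environ["DOCTR_CACHE_DIR"] = cache_dir
    os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
    return cache_dir


# Example location only; any directory the script's user can write to works.
cache_dir = configure_doctr_cache(
    os.path.join(tempfile.gettempdir(), "doctr_cache")
)

# Import DocTR only after the variables are set:
# from doctr.models import ocr_predictor
```

    Setting the variables before the import is the conservative choice, since it works whether the library reads them at import time or at download time.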

    Alternatively, the variables can be set in a code environment resources initialization script:

    ## Base imports
    from dataiku.code_env_resources import clear_all_env_vars
    from dataiku.code_env_resources import set_env_path
    from dataiku.code_env_resources import set_env_var

    # Clears all environment variables defined by previously run script
    clear_all_env_vars()

    ## DocTR
    # Set DocTR cache directory
    set_env_path("DOCTR_CACHE_DIR", "DOCTR_CACHE_DIR")
    set_env_var("DOCTR_MULTIPROCESSING_DISABLE", "TRUE")

    # Import DocTR
    from doctr.models import ocr_predictor

    .................
