PermissionError while trying to run a Python recipe

Ali_Nik Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2


I am trying to use a library called DocTR in Dataiku. Under the hood, DocTR runs deep learning models to perform document text extraction. When I use the library in a Jupyter notebook in Dataiku, the pretrained resnet50 and vgg16 models are downloaded to the notebook's cache and everything works fine.


But when I run the same code in a Python recipe in Dataiku, I get a PermissionError, presumably because my user does not have permission to download the pretrained models into the cache of the Dataiku instance.
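
One way to confirm this suspicion is to print, from inside the recipe, which OS user runs the process and whether that user can write to the cache directory. The following is a minimal sketch, not DocTR or Dataiku API: `describe_cache_access` is a hypothetical helper, and the default location `~/.cache/doctr` is an assumption about where DocTR stores downloads when `DOCTR_CACHE_DIR` is not set.

```python
import getpass
import os
from pathlib import Path


def describe_cache_access(cache_dir: Path) -> dict:
    """Report the current OS user and whether it can write to cache_dir
    (or, if cache_dir does not exist yet, to its closest existing parent)."""
    probe = cache_dir
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    try:
        user = getpass.getuser()
    except Exception:
        # Some containers run under a UID without a passwd entry.
        user = "unknown"
    return {
        "user": user,
        "cache_dir": str(cache_dir),
        "checked_path": str(probe),
        "writable": os.access(probe, os.W_OK | os.X_OK),
    }


# Assumed default location; DOCTR_CACHE_DIR takes precedence if it is set.
default_cache = Path(
    os.environ.get(
        "DOCTR_CACHE_DIR",
        os.path.join(os.path.expanduser("~"), ".cache", "doctr"),
    )
)
report = describe_cache_access(default_cache)
print(report)
```

Running the same snippet in the notebook and in the recipe should show whether the two run as different users or resolve to different cache directories.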


Is there any way to get around this problem other than storing the pretrained models in another location and passing their paths to DocTR?


  • DanDy
    DanDy Dataiker, Dataiku DSS Core Designer, Registered Posts: 8 Dataiker


    You should be able to work around the error by setting the DOCTR_CACHE_DIR environment variable to a directory that is readable and writable by all users (i.e. chmod 777 permission), for example a directory outside of the DSS data dir and not in another user’s home_dir, and by setting DOCTR_MULTIPROCESSING_DISABLE to TRUE. Ref.

    For example, export the variables in the Linux user profile of the dssuser (e.g. "~/.bash_profile" or "~/.bashrc") or in the "<DATA_DIR>/bin/" file, then restart DSS. The script below sets the same variables through dataiku.code_env_resources instead:


    ## Base imports
    from dataiku.code_env_resources import clear_all_env_vars
    from dataiku.code_env_resources import set_env_path
    from dataiku.code_env_resources import set_env_var

    # Clears all environment variables defined by previously run script
    clear_all_env_vars()

    ## DocTR
    # Set DocTR cache directory
    set_env_path("DOCTR_CACHE_DIR", "DOCTR_CACHE_DIR")
    set_env_var("DOCTR_MULTIPROCESSING_DISABLE", "TRUE")

    # Import DocTR
    from doctr.models import ocr_predictor
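
If changing the server profile or the code environment is not possible, the same two variables can probably also be set at the top of the Python recipe itself, before DocTR is imported. This is a hedged sketch rather than a documented Dataiku or DocTR procedure: `configure_doctr_cache` and the temporary-directory location are hypothetical, and it assumes DocTR reads `DOCTR_CACHE_DIR` from the process environment when it downloads a model.

```python
import os
import tempfile
from pathlib import Path


def configure_doctr_cache(cache_dir: str) -> Path:
    """Create cache_dir if needed, verify it is writable, and set the two
    DocTR environment variables for the current process."""
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)

    # Fail early with a clear message instead of a PermissionError
    # raised later, in the middle of a model download.
    try:
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError as exc:
        raise PermissionError(f"{path} is not writable by the current user") from exc

    os.environ["DOCTR_CACHE_DIR"] = str(path)
    os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
    return path


# Hypothetical location; replace it with a directory the recipe's user can write to.
cache_path = configure_doctr_cache(os.path.join(tempfile.gettempdir(), "doctr_cache"))

# Import DocTR only after the variables are set (requires the DocTR package):
# from doctr.models import ocr_predictor
```

Setting the variables before the import is the conservative choice, since it works whether the library reads them at import time or at download time. The variables only apply to the current process, so every recipe that uses DocTR would need the same preamble.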

