Refer to a managed folder as a folder instead of individual files

maarten98 Registered Posts: 1 ✭✭✭

Hi all,

I'm running into a problem while setting up a BERT model script in a text classification flow. The Hugging Face transformers loaders take the path of a folder containing multiple files as input. This works fine when testing locally, but in Dataiku the language model (BERT transformer) files have to reside in a managed folder, and I see no easy way of passing the managed folder as input. Is there a common solution for this? Below is what I've tried:

    from transformers import BertTokenizer, BertModel
    import dataiku

    # Path to the language model managed folder
    LM_FOLDER_NAME = 'LM'
    LM_FOLDER = dataiku.Folder(LM_FOLDER_NAME)
    LM_PATH = LM_FOLDER.get_path()

    tokenizer = BertTokenizer.from_pretrained(LM_PATH + "/output_large")
    bert_model = BertModel.from_pretrained(LM_PATH + "/output_large")

However, the code above gives the following error:

    ---------------------------------------------------------------------------
    PermissionError                           Traceback (most recent call last)
    <ipython-input-4-a97560aab45e> in <module>
    ----> 4 tokenizer = BertTokenizer.from_pretrained(LM_PATH + "/output_large")
          5 bert_model = BertModel.from_pretrained(LM_PATH + "/output_large")

    /opt/dataiku/code-env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
       1668                         local_files_only=local_files_only,
       1669                         use_auth_token=use_auth_token,
    -> 1670                         user_agent=user_agent,
       1671                     )
       1672

    /opt/dataiku/code-env/lib/python3.6/site-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
       1171             user_agent=user_agent,
       1172             use_auth_token=use_auth_token,
    -> 1173             local_files_only=local_files_only,
       1174         )
       1175     elif os.path.exists(url_or_filename):

    /opt/dataiku/code-env/lib/python3.6/site-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
       1318         cache_dir = str(cache_dir)
       1319
    -> 1320     os.makedirs(cache_dir, exist_ok=True)
       1321
       1322     headers = {"user-agent": http_user_agent(user_agent)}

    /usr/lib64/python3.6/os.py in makedirs(name, mode, exist_ok)
        208     if head and tail and not path.exists(head):
        209         try:
    --> 210             makedirs(head, mode, exist_ok)
        211         except FileExistsError:
        212             # Defeats race condition when another thread created the path

    /usr/lib64/python3.6/os.py in makedirs(name, mode, exist_ok)
        208     if head and tail and not path.exists(head):
        209         try:
    --> 210             makedirs(head, mode, exist_ok)
        211         except FileExistsError:
        212             # Defeats race condition when another thread created the path

    /usr/lib64/python3.6/os.py in makedirs(name, mode, exist_ok)
        208     if head and tail and not path.exists(head):
        209         try:
    --> 210             makedirs(head, mode, exist_ok)
        211         except FileExistsError:
        212             # Defeats race condition when another thread created the path

    /usr/lib64/python3.6/os.py in makedirs(name, mode, exist_ok)
        218             return
        219     try:
    --> 220         mkdir(name, mode)
        221     except OSError:
        222         # Cannot rely on checking for EEXIST, since the operating system

    PermissionError: [Errno 13] Permission denied: '/home/dssuser'

Since get_download_stream() works on individual files, it is of little help here: I need the entire 'output_large' folder. Any help would be welcome!
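For reference, the per-file API can be stretched to cover a whole subfolder by copying every file under it to a local directory and handing that directory to `from_pretrained`. The following is only a minimal sketch under stated assumptions: `copy_folder_subtree` is a hypothetical helper name, `folder` is assumed to behave like `dataiku.Folder` (with `list_paths_in_partition()` returning paths such as `/output_large/config.json` and `get_download_stream(path)` returning a readable binary stream), and the copy lands in a temporary directory by default.

```python
import os
import shutil
import tempfile


def copy_folder_subtree(folder, prefix, dest_dir=None):
    """Copy every file under `prefix` of a managed folder to a local directory.

    Hypothetical helper: `folder` is assumed to expose list_paths_in_partition()
    and get_download_stream(path), as dataiku.Folder does.
    """
    dest_dir = dest_dir or tempfile.mkdtemp()
    # Normalize so that "output_large", "/output_large" and "output_large/" all match.
    prefix = prefix.strip("/") + "/"
    for path in folder.list_paths_in_partition():
        relative = path.lstrip("/")
        if not relative.startswith(prefix):
            continue  # file lives outside the requested subfolder
        local_path = os.path.join(dest_dir, relative[len(prefix):])
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Stream each file to disk without loading it fully into memory.
        with folder.get_download_stream(path) as stream, open(local_path, "wb") as out:
            shutil.copyfileobj(stream, out)
    return dest_dir
```

With the names used above, `local_dir = copy_folder_subtree(dataiku.Folder('LM'), 'output_large')` would give a plain directory path that could then be passed to `BertTokenizer.from_pretrained(local_dir)`. This avoids relying on `get_path()`, at the cost of copying the model files on each run.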

Thanks!

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    The /home/dssuser directory isn't accessible to the UNIX users that run recipes or notebooks in DSS, so you need to grant at least traversal permission on every directory leading to the DSS datadir, starting with a `chmod 755 /home/dssuser`
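To find out which directory is blocking access before changing permissions, a check like the following could be run from the same notebook. It is a minimal sketch and not part of DSS: `first_untraversable_ancestor` is a hypothetical helper name, and it relies only on Python's standard `os.access` to test the execute (traversal) bit for the current user.

```python
import os
from pathlib import Path


def first_untraversable_ancestor(path):
    """Return the topmost ancestor directory of `path` that the current user
    cannot traverse, or None if every ancestor is traversable.

    Hypothetical helper for diagnosis only.
    """
    resolved = Path(path).absolute()
    # `parents` runs from the immediate parent up to the root, so reverse it
    # to walk top-down and stop at the first blocked directory.
    for directory in reversed(resolved.parents):
        # Entering a directory requires the execute (x) permission on it.
        if not os.access(str(directory), os.X_OK):
            return str(directory)
    return None
```

Calling it on the `LM_PATH` value from the question should point at the directory that lacks the traversal bit, and it should return `None` once the permissions have been opened up far enough.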
