Hi,
I'm trying to use transformers in a Python recipe.
I need to define a cache folder for the downloaded model so that it is not downloaded again on every run.
How should I define cache_dir? Is it the user resources folder, and how do I access it from code?
Here is some sample code:
from transformers import AutoModelForSequenceClassification
model_name = "bert-base-uncased"
cache_dir = "/path/to/cache/dir"
model = AutoModelForSequenceClassification.from_pretrained(model_name, cache_dir=cache_dir)
Thanks
Hi,
The recommended way to set a cache directory for Hugging Face transformers is to use a resource initialization script on the code environment being used.
Go to Administration > Code Envs > <select the code env used by your recipe> > Resources.
There you will find a code sample for Hugging Face that you can use (the exact code depends on the DSS version):
## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var
# Clear all environment variables defined by a previously run script
clear_all_env_vars()
## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
# Import Hugging Face's transformers
import transformers
# Download pretrained model: automatically managed by Hugging Face,
# does not download anything if model is already in HF_HOME
model = transformers.DistilBertModel.from_pretrained("distilbert-base-uncased")
The sample script downloads the "distilbert-base-uncased" model as an example. You can add the models you want to cache at that point in the script. They'll be saved to DATA_DIR/code-envs/resources/python/<code-env name>/huggingface/hub
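To check from a recipe which folder is actually used, something like the sketch below may help. It uses only the standard library and mirrors, in simplified form, how the Hugging Face libraries are generally understood to resolve the hub cache (HF_HUB_CACHE first, then HF_HOME/hub, then a default under the home directory). The helper names resolve_hf_hub_cache and list_cached_models are hypothetical, not part of Dataiku or transformers, and the "models--<org>--<name>" folder naming is an assumption about the cache layout that may differ between library versions.

```python
import os
from pathlib import Path


def resolve_hf_hub_cache(env=None):
    """Return the folder where Hugging Face Hub downloads would be cached.

    Hypothetical helper: a simplified approximation of the lookup order,
    not the libraries' own implementation.
    """
    env = os.environ if env is None else env
    # An explicit hub cache location takes priority when it is set.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    # Otherwise the hub cache is assumed to be the "hub" subfolder of HF_HOME,
    # which is the variable set by the resource initialization script.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    # Fallback: default location under the user's cache directory.
    cache_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_root) / "huggingface" / "hub"


def list_cached_models(hub_cache):
    """List model ids found in a hub cache folder (empty if it is missing).

    Assumes folders are named "models--<org>--<name>"; a model id that itself
    contains "--" would be converted incorrectly.
    """
    hub_cache = Path(hub_cache)
    if not hub_cache.is_dir():
        return []
    prefix = "models--"
    return sorted(
        entry.name[len(prefix):].replace("--", "/")
        for entry in hub_cache.iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


if __name__ == "__main__":
    cache = resolve_hf_hub_cache()
    print("Hub cache:", cache)
    print("Cached models:", list_cached_models(cache))
```

If HF_HOME points at the code env resources folder, from_pretrained should not need an explicit cache_dir argument in the recipe.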
Thanks for your answer. I don't have admin access. Is it mandatory?
I can see a user resources tab on my profile. Is it possible to use that folder as a cache?
Thanks for your assistance.