Saving downloaded model

NR
Level 2
Hi,

I'm trying to use transformers in a Python recipe.

I need to define a cache folder for downloaded models so that they are not downloaded again on every run.

How should I define cache_dir? Is it the user resources folder, and how do I access it from code?

Here is some sample code:

from transformers import AutoModelForSequenceClassification

model_name = "bert-base-uncased"
cache_dir = "/path/to/cache/dir"

model = AutoModelForSequenceClassification.from_pretrained(model_name, cache_dir=cache_dir)

Thanks

2 Replies
MiguelangelC
Dataiker

Hi,

The recommended way to set a cache directory for Hugging Face transformers is to use a resource initialisation script on the code environment used by your recipe.

Go to Administration > Code Envs > <Select code env used on your recipe> > Resources.

There you will find a code sample for Hugging Face that you can use (the exact code depends on the DSS version):

## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var

# Clears all environment variables defined by previously run script
clear_all_env_vars()

## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")

# Import Hugging Face's transformers
import transformers

# Download pretrained model: automatically managed by Hugging Face,
# does not download anything if model is already in HF_HOME
model = transformers.DistilBertModel.from_pretrained("distilbert-base-uncased")

The "distilbert-base-uncased" model is downloaded by default as an example. You can add the models you want to cache at this point in the script. They'll be saved to DATA_DIR/code-envs/resources/python/<code-env name>/huggingface/hub
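
To check from a recipe that the cache location is picked up, a small sketch like the one below may help. It assumes the usual Hugging Face conventions: HF_HUB_CACHE takes precedence, otherwise models go under HF_HOME/hub, and each model is stored in a "models--<name>" folder. The helper names resolve_hf_hub_cache and is_model_cached are hypothetical (they are not part of Dataiku or transformers), and the fallback is simplified since it ignores XDG_CACHE_HOME.

```python
import os
from pathlib import Path


def resolve_hf_hub_cache(env=None):
    """Return the directory where Hugging Face is expected to store models.

    Hypothetical helper, simplified: HF_HUB_CACHE wins, then HF_HOME/hub,
    then the default under the user's home directory.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def is_model_cached(model_name, cache_dir):
    """Check whether a folder for the model exists in the hub cache.

    Assumes the hub cache layout "models--<org>--<name>"; this only checks
    that the folder exists, not that the download is complete.
    """
    folder = "models--" + model_name.replace("/", "--")
    return (Path(cache_dir) / folder).is_dir()


# In a recipe running on the code env, this should point inside the
# code env resources folder if the initialisation script set HF_HOME.
cache = resolve_hf_hub_cache()
print(cache)
print(is_model_cached("distilbert-base-uncased", cache))
```

If the printed path is not inside the code env resources folder, the recipe is probably not running on the code env whose resources script set HF_HOME.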

NR
Level 2
Author

Thanks for your answer. I don't have admin access. Is it mandatory?

I can see a user resources tab on my profile. Is it possible to use this folder as a cache?

Thanks for your assistance.
