Saving downloaded model
Hi,
I'm trying to use transformers in a Python recipe.
I need to define a cache folder for the downloaded model so that it is not downloaded again on every run.
How should I define cache_dir? Is it the user resources folder, and how can I access it from code?
Here is some sample code:
from transformers import AutoModelForSequenceClassification
model_name = "bert-base-uncased"
cache_dir = "/path/to/cache/dir"
model = AutoModelForSequenceClassification.from_pretrained(model_name, cache_dir=cache_dir)
Thanks
Answers
-
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
Hi,
The recommended way to set a cache directory for Hugging Face transformers is to use a resource initialisation script on the code environment used by your recipe.
Go to Administration > Code Envs > <Select code env used on your recipe> > Resources.
There you will find a code sample for Hugging Face that you can use (the exact code depends on the DSS version):
## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var
# Clears all environment variables defined by previously run script
clear_all_env_vars()
## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
# Import Hugging Face's transformers
import transformers
# Download pretrained model: automatically managed by Hugging Face,
# does not download anything if model is already in HF_HOME
model = transformers.DistilBertModel.from_pretrained("distilbert-base-uncased")
The "distilbert-base-uncased" model is downloaded by default as an example. You can add the models you want to cache at this point in the script. They'll be saved to DATA_DIR/code-envs/resources/python/<code-env name>/huggingface/hub
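To check from a recipe where the cache ends up and whether a model is already there, something like the following may help. It is a minimal sketch, not Dataiku or Hugging Face code: hf_hub_cache and is_cached are hypothetical helper names, the default location is simplified to ~/.cache/huggingface, and the "models--<org>--<name>/snapshots" folder layout is assumed to match the Hugging Face hub cache of your installed version.

```python
import os
from pathlib import Path


def hf_hub_cache(env=None):
    """Return the directory where the Hugging Face hub cache is expected.

    HF_HUB_CACHE wins if set; otherwise the cache is <HF_HOME>/hub, with
    HF_HOME falling back to ~/.cache/huggingface (simplified default).
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    return Path(hf_home) / "hub"


def is_cached(repo_id, env=None):
    """Rough check: does a snapshot folder exist for this model id?"""
    # Assumed layout: "org/name" is stored as "models--org--name"
    folder = "models--" + repo_id.replace("/", "--")
    return (hf_hub_cache(env) / folder / "snapshots").is_dir()


if __name__ == "__main__":
    print("Cache directory:", hf_hub_cache())
    print("distilbert cached:", is_cached("distilbert-base-uncased"))
```

If the printed directory is not under the code env resources folder, the recipe is probably not running with the code environment whose resource script set HF_HOME.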
-
Thanks for your answer. I don't have admin access. Is it mandatory to have it?
I can see a user resources tab on my profile. Is it possible to use this folder as a cache?
Thanks for your assistance.
-
1 Initial Revenue Date
2 Revenue End Date
3 Daily earnings over the specified time period
I want to determine the total daily revenue for any given day. Is there a plugin or some Python code that would let me run this "multi time series"?
-
danb Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭
I am trying to do the same but for SentenceTransformers,
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var
from sentence_transformers import SentenceTransformer
# Clears all environment variables defined by previously run script
clear_all_env_vars()
## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
# Import Hugging Face's transformers
import transformers
# Download pretrained model: automatically managed by Hugging Face,
# does not download anything if model is already in HF_HOME
model = transformers.DistilBertModel.from_pretrained("distilbert-base-uncased")
model = transformers.MPNetModel.from_pretrained("microsoft/mpnet-base")
model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")
but then every time I call the model from my script in the flow, the sentence transformer model is downloaded again and again, while the transformer model is picked up without needing a new download. Is there something that I am missing?
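One possible explanation, which is an assumption and not confirmed in this thread: some sentence-transformers versions do not store models under HF_HOME but in their own cache folder, controlled by the SENTENCE_TRANSFORMERS_HOME environment variable or the cache_folder argument of SentenceTransformer. The sketch below passes an explicit cache folder in both the resource script and the recipe. The helper names and the "sentence_transformers" subfolder are hypothetical choices for illustration.

```python
import os


def sentence_transformers_cache(env=None):
    """Pick a cache folder for sentence-transformers models.

    Prefers SENTENCE_TRANSFORMERS_HOME, then a subfolder of HF_HOME
    (hypothetical naming), else None to keep the library default.
    """
    env = os.environ if env is None else env
    explicit = env.get("SENTENCE_TRANSFORMERS_HOME")
    if explicit:
        return explicit
    hf_home = env.get("HF_HOME")
    if hf_home:
        return os.path.join(hf_home, "sentence_transformers")
    return None


def load_sentence_model(name, env=None):
    """Load a model, forcing downloads into the folder chosen above."""
    # Imported lazily so the helper above works without the library installed
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name, cache_folder=sentence_transformers_cache(env))
```

Calling load_sentence_model("sentence-transformers/multi-qa-mpnet-base-dot-v1") with the same environment in the resource script and in the recipe should make both use the same folder. It may also be worth trying set_env_path("SENTENCE_TRANSFORMERS_HOME", "sentence_transformers") in the resource script, though I have not verified that on DSS.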
-
You do not need to have administrator rights to use Dataiku resources. However, if you want to install Dataiku on your computer or set it up to access external data sources, you may need administrator rights.
Regarding using the resource usage folder as a cache, it is possible but not recommended. The Resource Usage folder is for storing data related to your projects in Dataiku. If you use it as a cache, you may experience problems accessing the data you need while working with Dataiku.
To save the downloaded model to Dataiku, you should use the model export function. To do this, select the appropriate model file in the project menu and select "Export Model". You can then save the model in the desired format and import it in another project or application.
-
danb Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭
Hello Rex,
Thank you for your answer. Could you be clearer on how to save the model, though? The model is not one produced by a recipe; it is loaded in Python and fetched through sentence-transformers from Hugging Face...
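For a model loaded in Python like this, one option is to save it once to a directory the recipe can read and write, then load it from that path on later runs. The following is a minimal sketch under that assumption: load_or_download and the factory parameter are hypothetical names, and the folder naming is arbitrary. It relies on SentenceTransformer accepting a local path and on its save() method.

```python
from pathlib import Path


def load_or_download(model_name, local_dir, factory=None):
    """Load a model from local_dir if present, else download and save it.

    factory is any callable that builds a model from a name or a path and
    returns an object with a save(path) method; it defaults to
    SentenceTransformer.
    """
    if factory is None:
        # Imported lazily so the function can be exercised with a stand-in
        from sentence_transformers import SentenceTransformer

        factory = SentenceTransformer

    # Arbitrary naming choice: "org/name" is stored under "org_name"
    target = Path(local_dir) / model_name.replace("/", "_")
    if target.is_dir() and any(target.iterdir()):
        # Already saved by a previous run: no download needed
        return factory(str(target))

    model = factory(model_name)  # first run: fetched from the hub
    target.mkdir(parents=True, exist_ok=True)
    model.save(str(target))
    return model
```

The first call downloads and saves the model; later calls with the same local_dir load from disk. Where local_dir should live in DSS (code env resources, a managed folder, or elsewhere) depends on your setup and permissions.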