Added on May 23, 2024 8:52AM
Likes: 1
Replies: 3
Hello everyone,
I am facing an issue with importing a Python environment on Dataiku.
I have two Dataiku instances: one connected to the network and the other offline.
However, when I transfer the Python environment from the connected system to the offline system, the Hugging Face model tries to reinstall itself, which fails since the offline environment cannot access the internet.
I noticed there is an "Upload" button, but I am unsure which ZIP file to upload to resolve this issue.
My question is: How can I import the Python environment with the Hugging Face model into the offline environment without requiring a new download?
Thank you in advance for your help!
Best regards,
Piteur
Operating system used: Windows
Hello,
I'm not sure I understand at what level you are unable to import your pre-trained models.
If the model is meant to live in your code environment, you can load it into the resources directory from a script, as described here: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/index.html?_gl=1*1n85oae*_ga*MTc0MjQyNjcyOS4xNzE0OTE5NDEy*_ga_B3YXRYMY48*MTcxNjQ5NTQ5Ny4xMi4xLjE3MTY0OTU1MTUuNDIuMC4w
Or, if the model is downloaded from your own code at build time, it will not end up in the pre-loaded model folder. In that case, you may need to set the HF_HOME environment variable so that it points to the folder where the models are stored, then call from_pretrained with the argument local_files_only=True.
If you are simply stuck on getting the pre-trained model, with the correct directory structure, onto your offline instance, you could use the sample scripts to download the model resources, zip them to your local desktop, and then drop the archive into your code env with the upload function.
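To illustrate the HF_HOME plus local_files_only approach, here is a minimal sketch. The helper names (configure_offline_cache, load_local_model) and the choice of AutoModel/AutoTokenizer are my own assumptions for illustration; adapt the classes to whatever your model actually needs. HF_HUB_OFFLINE is an extra safeguard I added, not something mentioned above. The variables are set before transformers is imported, since the library may read them at import time.

```python
import os


def configure_offline_cache(hf_home):
    """Point the Hugging Face libraries at a local model folder and forbid network access."""
    # Folder that already contains the downloaded models (assumed to exist).
    os.environ["HF_HOME"] = hf_home
    # Extra safeguard (my addition): ask huggingface_hub not to call the network at all.
    os.environ["HF_HUB_OFFLINE"] = "1"
    return hf_home


def load_local_model(model_name, hf_home):
    """Load a tokenizer and model strictly from local files."""
    configure_offline_cache(hf_home)
    # Imported after the environment variables are set, so they are taken into account.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
    model = AutoModel.from_pretrained(model_name, local_files_only=True)
    return tokenizer, model
```

You would then call something like load_local_model("your-model-name", "/path/to/your/models"), where both values are placeholders. If the files are not in the cache, from_pretrained should raise an error instead of trying to download.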
The link posted by @Grixis is broken, here is the correct one: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/index.html
He is right to point out that you need to download the HF models and store them somewhere in your offline system. The Python code environment has nothing to do with that. While HF uses Python to download the models, that doesn't mean the models are stored with the code environment; they are not.
Hello Piteur,
What you could do is zip the content of the code environment resources folder in your "online" system, and then upload it to your "offline" system (using the upload button).
You would still need to set the relevant environment variables in the resources initialization script on the "offline" system, but you would skip the lines that download the pre-trained model.
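As a rough sketch of what the "offline" resources initialization script could look like: it only sets environment variables and downloads nothing. I am assuming the dataiku.code_env_resources helpers used in the tutorial linked in this thread (clear_all_env_vars, set_env_path, set_env_var), and that the uploaded zip contains a "huggingface" subfolder; both are assumptions to check against your own setup. The code is wrapped in a function only so the import happens lazily; in the real script you would call it at the end.

```python
def init_offline_resources():
    """Set the Hugging Face variables for this code env without downloading anything."""
    # These helpers only exist inside DSS (assumed API, see the linked tutorial).
    from dataiku.code_env_resources import clear_all_env_vars, set_env_path, set_env_var

    # Remove variables left over from a previous run of the script.
    clear_all_env_vars()

    # Path is relative to the code env resources folder.
    # "huggingface" is an assumed subfolder name; it must match the uploaded zip.
    set_env_path("HF_HOME", "huggingface")

    # Optional safeguard (my addition): forbid any network call from huggingface_hub.
    set_env_var("HF_HUB_OFFLINE", "1")

    # Deliberately no download step here: the model files come from the uploaded zip.
```

Compared with the "online" version of the script, the only difference is that the download step is gone.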
In your "online" system, the code-env resources folder is located in
<dss-home>/code-envs/resources/python/<code-env-name>/
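If you prefer to script the zipping step, something like the following could work on the "online" system. It is only a sketch: the function name and arguments are hypothetical, and you need to pass your real DSS home directory and code env name.

```python
import shutil
from pathlib import Path


def zip_code_env_resources(dss_home, code_env_name, output_dir):
    """Zip the content of a code env resources folder and return the archive path."""
    resources_dir = Path(dss_home) / "code-envs" / "resources" / "python" / code_env_name
    if not resources_dir.is_dir():
        raise FileNotFoundError(f"Resources folder not found: {resources_dir}")

    # make_archive appends ".zip" itself; paths inside the archive are relative
    # to resources_dir, so the zip holds the folder's content, not the folder itself.
    archive_base = Path(output_dir) / f"{code_env_name}-resources"
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(resources_dir))
```

The resulting zip is the file you would then upload on the "offline" system with the upload button.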
Best,