Title: Importing Python Environment with Hugging Face Model Without Downloading

Piteur
Level 1
Hello everyone,

I am facing an issue with importing a Python environment on Dataiku.

I have two Dataiku instances: one connected to the network and the other offline.

  1. On the network-connected Dataiku environment, I created my Python environment with the desired packages and imported a Hugging Face model through the "Resource" tab.
  2. I then exported this Python environment to use it on the offline environment.

However, when I transfer the Python environment from the connected system to the offline system, the Hugging Face model tries to reinstall itself, which fails since the offline environment cannot access the internet.

I noticed there is an "Upload" button, but I am unsure which ZIP file to upload to resolve this issue.

My question is: How can I import the Python environment with the Hugging Face model into the offline environment without requiring a new download?

Thank you in advance for your help!

Best regards,

Piteur


Operating system used: Windows

3 Replies
Grixis
Level 4

Hello,

I'm not sure I understand at which step you are unable to import your pre-trained model.

If the model belongs to your code environment, you can load it into the resources directory from a script, like this: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/inde...

Or, if the model is loaded by your own code at build time, it is not looking in your pre-trained model folder. In this case, you may need to set the HF_HOME environment variable so that it points to the folder where the models are stored. Then just call from_pretrained with the argument local_files_only=True.
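As a rough illustration of the HF_HOME and local_files_only suggestion above, here is a minimal sketch. The helper names and the example path are hypothetical, and it assumes the transformers library is installed in the code environment. HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE are optional extra switches that tell the Hugging Face libraries not to attempt any network call.

```python
import os


def configure_offline_hf(models_dir):
    """Point the Hugging Face libraries at a local cache and disable network access.

    Call this before importing transformers, because the cache location
    is read from the environment when the library is imported.
    """
    os.environ["HF_HOME"] = models_dir
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_local_model(model_name):
    """Load a tokenizer and model strictly from the local cache (no download)."""
    # Imported here so that the environment variables above are already set.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
    model = AutoModel.from_pretrained(model_name, local_files_only=True)
    return tokenizer, model


# Example usage (hypothetical path and model name):
# configure_offline_hf("/path/to/resources/huggingface")
# tokenizer, model = load_local_model("distilbert-base-uncased")
```

If the model files are not present in the folder that HF_HOME points to, from_pretrained raises an error instead of trying to download them, which makes a missing transfer easy to spot.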

If you are just stuck on importing the pre-trained model with the correct directory structure on your offline instance, perhaps you can use the sample scripts to download the model resources on the connected machine, zip them, and then drop the archive into your code environment with the upload function.

 


The link posted by @Grixis is broken; here is the correct one: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/inde...

He is right to point out that you need to download the HF models and store them somewhere on your offline system. The Python code environment has nothing to do with that. HF uses Python to download the models, but that does not mean the models are stored with the code environment; they are not.

MehdiH
Dataiker

Hello Piteur,

What you could do is zip the content of the code environment resources folder in your "online" system, and then upload it to your "offline" system (using the upload button).

You would still need to set the relevant environment variables in the resources initialization script on the "offline" system, but you would skip the lines that download the pre-trained model.

In your "online" system, the code-env resources folder is located in

<dss-home>/code-envs/resources/python/<code-env-name>/
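The zipping step can be done with any archive tool. As one possible sketch in Python, the hypothetical helper below (zip_resources is not part of Dataiku) packs the contents of a resources folder into a ZIP file whose root mirrors the folder's contents. The paths in the usage comment are placeholders to adapt to your own instance.

```python
import shutil
from pathlib import Path


def zip_resources(resources_dir, output_base):
    """Zip the contents of resources_dir into <output_base>.zip and return its path.

    Write the archive outside resources_dir so it does not end up
    inside itself.
    """
    resources_dir = Path(resources_dir)
    if not resources_dir.is_dir():
        raise FileNotFoundError(f"Resources folder not found: {resources_dir}")
    # root_dir makes the archive paths relative to the resources folder.
    return shutil.make_archive(str(output_base), "zip", root_dir=str(resources_dir))


# Example usage (placeholder paths):
# zip_resources("<dss-home>/code-envs/resources/python/<code-env-name>",
#               "/tmp/code_env_resources")
```

The resulting ZIP file is the one to transfer to the "offline" system and pass to the upload button.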

Best,
