Title: Importing Python Environment with Hugging Face Model Without Downloading

Piteur
Piteur Registered Posts: 1

Hello everyone,

I am facing an issue with importing a Python environment on Dataiku.

I have two Dataiku instances: one connected to the network and the other offline.

  1. On the network-connected Dataiku environment, I created my Python environment with the desired packages and imported a Hugging Face model through the "Resource" tab.
  2. I then exported this Python environment to use it on the offline environment.

However, when I transfer the Python environment from the connected system to the offline system, the Hugging Face model tries to reinstall itself, which fails since the offline environment cannot access the internet.

I noticed there is an "Upload" button, but I am unsure which ZIP file to upload to resolve this issue.

My question is: How can I import the Python environment with the Hugging Face model into the offline environment without requiring a new download?

Thank you in advance for your help!

Best regards,

Piteur


Operating system used: Windows

Answers

  • Grixis
    Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 80 ✭✭✭✭✭

    Hello,

    I'm not sure I understand at which step the import of your pre-trained model fails.

    If the model belongs to your code environment, you can load it into the resources directory from a script, as described here: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/index.html?_gl=1*1n85oae*_ga*MTc0MjQyNjcyOS4xNzE0OTE5NDEy*_ga_B3YXRYMY48*MTcxNjQ5NTQ5Ny4xMi4xLjE3MTY0OTU1MTUuNDIuMC4w

    Or, if the model is downloaded by your own code at build time, it does not go to the code environment's pre-trained model folder. In that case, you may need to set the HF_HOME environment variable so that it points to the folder where the models are stored. Then just call from_pretrained with the argument local_files_only=True.
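
As a rough illustration of the HF_HOME and local_files_only approach described above, here is a minimal sketch. The function names, the cache path, and the model name are placeholders chosen for the example, not values from this thread. Setting HF_HUB_OFFLINE is an optional addition that is not mentioned above; it asks the Hugging Face libraries not to attempt any network calls.

```python
import os


def configure_offline_cache(cache_dir):
    """Point Hugging Face libraries at a local cache folder.

    Call this BEFORE importing transformers: the libraries read these
    variables at import time.
    """
    os.environ["HF_HOME"] = cache_dir
    # Optional (an addition for this sketch): forbid network access entirely.
    os.environ["HF_HUB_OFFLINE"] = "1"
    return cache_dir


def load_local_model(model_name, cache_dir):
    """Load a tokenizer and model strictly from files already on disk."""
    configure_offline_cache(cache_dir)
    # Imported here so the environment variables above are already set.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
    model = AutoModel.from_pretrained(model_name, local_files_only=True)
    return tokenizer, model


# Example call (placeholder values, adapt to your instance):
# tokenizer, model = load_local_model("distilbert-base-uncased", "/path/to/huggingface")
```

If the model files are not present under the given folder, from_pretrained should fail with an error rather than trying to download them, which makes a missing or misplaced cache easy to spot.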

    If you are just stuck on getting the right pre-trained model, with the correct directory structure, onto your offline instance, perhaps you can use the sample scripts to download the model resources, zip them on your local desktop, and then drop the archive into your code environment with the upload function.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,067 Neuron

    The link posted by @Grixis is broken; here is the correct one: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/hf-resources/index.html

    He is right to point out that you need to download the HF models and store them somewhere in your offline system. The Python code environment has nothing to do with that. While HF uses Python to download the models, that doesn't mean the models are stored with the code environment; they are not.

  • MehdiH
    MehdiH Dataiker, Dataiku DSS Core Designer, Dataiku DSS Core Concepts Posts: 21 Dataiker
    edited July 17

    Hello Piteur,

    What you could do is zip the contents of the code environment resources folder in your "online" system, and then upload the archive to your "offline" system (using the upload button).

    You would still need to set the relevant environment variables in the resources initialization script in the "offline" system, but you would skip the lines that perform the download of the pretrained model.

    In your "online" system, the code-env resources folder is located in

    <dss-home>/code-envs/resources/python/<code-env-name>/
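
The zipping step can be done with any archive tool; for reference, here is a minimal Python sketch using only the standard library. The function name zip_resources and the output file name are made up for the example, and the exact archive layout expected by the upload button is an assumption here (folder contents at the root of the ZIP), so check the result on a test code environment first.

```python
import zipfile
from pathlib import Path


def zip_resources(resources_dir, zip_path):
    """Zip the contents of resources_dir, keeping paths relative to it."""
    resources_dir = Path(resources_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(resources_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it is being
            # written inside the folder that is being zipped.
            if path.is_file() and path.resolve() != zip_path.resolve():
                # Symlinked files are stored as regular copies of their targets.
                archive.write(path, path.relative_to(resources_dir).as_posix())
    return zip_path


# Example call (replace the placeholders with your own values):
# zip_resources("<dss-home>/code-envs/resources/python/<code-env-name>", "resources.zip")
```

Because symlinks are stored as plain copies, the archive may be larger than the folder on disk if the Hugging Face cache uses symlinks internally.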

    Best,
