How to use spaCy models in DSS

Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
edited July 16 in Knowledge Base

Greetings fellow Linguists,

To use spaCy models in DSS, start by installing spaCy like any other Python package in DSS: create a code environment and add "spacy" to its package requirements. To do so, follow the DSS documentation on code environments.

However, some spaCy features, such as language-specific tokenizers, rely on models that are not bundled with the library itself. To use these models, you need an additional download step. This step can fail on shared DSS nodes where users do not have write access to shared locations on the server (see User Isolation Framework).

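To overcome this challenge, you can use spaCy's dedicated pip delivery mechanism: each model is published as a regular Python package that pip can install from a URL. For instance, in your code environment's package requirements, add the link to the release archive of the model you need (here, en_core_web_sm).
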
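As an illustration of what such a requirement line can look like, the sketch below builds a pip-installable URL for a model release. It assumes the wheel naming used by recent releases in the explosion/spacy-models GitHub repository; the helper name `model_requirement` and the version "3.7.1" are illustrative assumptions, and the version you pick has to be compatible with the spaCy version in your code environment (older releases were published as .tar.gz archives instead).

```python
def model_requirement(name, version):
    """Build a pip-installable URL for a spaCy model release.

    Assumption: the release follows the wheel naming convention of the
    explosion/spacy-models repository. Check the release page for the
    exact file name before relying on it.
    """
    base = "https://github.com/explosion/spacy-models/releases/download"
    return "{base}/{name}-{version}/{name}-{version}-py3-none-any.whl".format(
        base=base, name=name, version=version
    )


# Illustrative values only: pick the version matching your spaCy install.
print(model_requirement("en_core_web_sm", "3.7.1"))
```

The printed URL is what would go on its own line in the package requirements, next to "spacy".
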
After adding this link, rebuild your code environment. To test that it works correctly, run the following code in a notebook using this code environment.

import spacy
# Load the English model installed through the code environment's requirements
nlp = spacy.load("en_core_web_sm")

Voila! You can now use spaCy along with its dedicated English language model.
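
For a slightly fuller check than a bare load, the sketch below runs a sentence through the pipeline and returns its tokens. The helper name `tokenize_with_model` is hypothetical, and the function returns None instead of failing when spaCy or the model is missing from the current environment, which makes it easy to tell an installation problem from a usage problem.

```python
import importlib.util


def tokenize_with_model(text, model_name="en_core_web_sm"):
    """Return the token texts of `text`, or None if spaCy or the model is missing."""
    for package in ("spacy", model_name):
        # find_spec returns None when a top-level package is not importable
        if importlib.util.find_spec(package) is None:
            return None
    import spacy

    nlp = spacy.load(model_name)
    return [token.text for token in nlp(text)]


print(tokenize_with_model("DSS and spaCy work well together."))
```

If this prints None in a notebook, the notebook is most likely not running on the code environment where the model was installed.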

Happy natural language processing!


  • Angelo Dataiku DSS Core Designer, Registered Posts: 23 ✭✭✭✭

    Thank you, this has saved me so much time. I was trying to download the package using `!python -m spacy download en_core_web_sm` from a DSS notebook but ran into all sorts of issues with the environments:

    - The native/built-in Python environment version was 2.7, while the environment I was using within DSS was 3.6.

    - Within the DSS env I had installed spaCy, but when I ran the command above, it invoked the Python environment outside DSS, so it couldn't find spaCy.

    Specifying the model package to be downloaded and installed through the pip requirements worked!
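
The interpreter mismatch described in this reply can be checked from inside the notebook. The sketch below is a minimal diagnostic using only the standard library; the helper name `describe_environment` is hypothetical. It reports which Python executable the notebook kernel runs on, its version, and whether spaCy and the model are importable from it.

```python
import importlib.util
import sys


def describe_environment(packages=("spacy", "en_core_web_sm")):
    """Summarize the interpreter running this code and which packages it can import."""
    info = {
        # Path of the interpreter behind the notebook kernel, which may
        # differ from the `python` found on the shell's PATH
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
    }
    for name in packages:
        # True when the package is importable from this interpreter
        info[name] = importlib.util.find_spec(name) is not None
    return info


print(describe_environment())
```

Comparing the reported executable with the one a shell command such as `!python` resolves to shows whether the two environments differ.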
