Text Summarization Plugin Job Error - NLTK tokenizers are missing

victor_toh
victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner

Hi Dataiku community,

I am testing the 'Text Summarization' plugin and have successfully built its code environment on a machine with no internet access. However, I am getting the following NLTK tokenizers error:

nltk.png

I have manually uploaded the NLTK resources using the Resources tab, as shown below, but the same error still appears. Can anyone advise?

nltk2.png


Operating system used: Linux




Answers

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker

    Hi @victor_toh,

    Did you set the NLTK_DATA environment variable to point to the resources directory?

    D5E4B4C3-95C0-49C2-81ED-5530AFA68B78_1_201_a.jpeg

    The environment variable is set when the code environment is updated. You might need to comment out the last two lines of the script, as I did in the screenshot, so that it doesn't try to connect to the internet.
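
One way to confirm that the variable actually reaches the job is to print it from a notebook or recipe that uses the same code environment. The snippet below is a minimal sketch using only the standard library. `check_nltk_data_env` is a hypothetical helper name, not part of Dataiku or NLTK. It relies on the fact that NLTK looks up its data directories through the `NLTK_DATA` environment variable.

```python
import os


def check_nltk_data_env(environ=None):
    """Report whether NLTK_DATA is set and points to existing directories.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("NLTK_DATA")
    if not value:
        return "NLTK_DATA is not set"
    # NLTK_DATA may hold several directories separated by os.pathsep.
    paths = [p for p in value.split(os.pathsep) if p]
    missing = [p for p in paths if not os.path.isdir(p)]
    if missing:
        return "NLTK_DATA points to missing directories: " + ", ".join(missing)
    return "NLTK_DATA OK: " + ", ".join(paths)


if __name__ == "__main__":
    print(check_nltk_data_env())
```

If this reports that the variable is not set inside the job, the problem is probably in how the code environment was updated rather than in the uploaded files.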

    Thanks,

    Zach

  • victor_toh
    victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner

    Hi Zach,

    Yes, I have set the environment variables and updated the code env, but the error persists.

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    edited July 17

    Could you please check the file structure of the resources directory and make sure that it's set up correctly? It should look like this:

    Resources directory/
    └── nltk_data
        └── tokenizers
            ├── punkt
            │   ├── PY3
            │   │   ├── README
            │   │   ├── czech.pickle
            │   │   ├── danish.pickle
            │   │   ├── dutch.pickle
            │   │   ├── english.pickle
            │   │   ├── estonian.pickle
            │   │   ├── finnish.pickle
            │   │   ├── french.pickle
            │   │   ├── german.pickle
            │   │   ├── greek.pickle
            │   │   ├── italian.pickle
            │   │   ├── malayalam.pickle
            │   │   ├── norwegian.pickle
            │   │   ├── polish.pickle
            │   │   ├── portuguese.pickle
            │   │   ├── russian.pickle
            │   │   ├── slovene.pickle
            │   │   ├── spanish.pickle
            │   │   ├── swedish.pickle
            │   │   └── turkish.pickle
            │   ├── README
            │   ├── czech.pickle
            │   ├── danish.pickle
            │   ├── dutch.pickle
            │   ├── english.pickle
            │   ├── estonian.pickle
            │   ├── finnish.pickle
            │   ├── french.pickle
            │   ├── german.pickle
            │   ├── greek.pickle
            │   ├── italian.pickle
            │   ├── malayalam.pickle
            │   ├── norwegian.pickle
            │   ├── polish.pickle
            │   ├── portuguese.pickle
            │   ├── russian.pickle
            │   ├── slovene.pickle
            │   ├── spanish.pickle
            │   ├── swedish.pickle
            │   └── turkish.pickle
            └── punkt.zip
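
The layout above can also be checked programmatically instead of by eye. The following is a rough sketch, assuming only the English tokenizer is needed by default. `missing_punkt_files` is a hypothetical helper and the path passed to it is a placeholder for the code environment's actual resources directory.

```python
from pathlib import Path


def missing_punkt_files(resources_dir, languages=("english",)):
    """Return the expected punkt paths that are absent under resources_dir.

    Hypothetical helper: checks <resources_dir>/nltk_data/tokenizers/punkt
    for both the top-level and the PY3 pickle of each language.
    """
    punkt_dir = Path(resources_dir) / "nltk_data" / "tokenizers" / "punkt"
    expected = []
    for lang in languages:
        expected.append(punkt_dir / f"{lang}.pickle")
        expected.append(punkt_dir / "PY3" / f"{lang}.pickle")
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Placeholder path: replace with the real resources directory.
    for path in missing_punkt_files("/path/to/resources"):
        print("missing:", path)
```

An empty result means the files sit where the tree above expects them. Any printed path points to a file that was uploaded to the wrong level or not at all.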

    Also, are you building the code environment for containerized execution, or is it just running locally?

    Thanks,

    Zach

  • victor_toh
    victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner

    Hi Zach,

    The folder structure seems to be correct.

    My settings are as shown below. Are they correct?

    setting.png
