Text Summarization Plugin Job Error - NLTK tokenizers are missing

victor_toh
Level 2

Hi Dataiku community,

 

I am testing the 'Text Summarization' plugin and successfully built its code environment on a machine with no internet access. However, I am getting the following NLTK tokenizers error:

[screenshot]

I have manually uploaded the NLTK resources using the Resources tab, as shown below, but the same error still appears. Can anyone advise?

[screenshot]


Operating system used: Linux



ZachM
Dataiker

Hi @victor_toh,

Did you set the NLTK_HOME environment variable for the resources directory?

[screenshot]

You can set the environment variable by updating the code environment. You might need to comment out the last two lines, as I did, so that it doesn't try to connect to the internet.
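Why the environment variable matters: NLTK adds every directory listed in the `NLTK_DATA` environment variable (split on `os.pathsep`) to its data search path, and a resource such as `tokenizers/punkt` is only found if it sits directly under one of those directories. The thread names the variable `NLTK_HOME`, so check which name your resources script actually sets. The helper `find_nltk_resource` below is hypothetical: a simplified, standard-library-only sketch of that lookup (it ignores NLTK's zip-file fallback), not NLTK's own API.

```python
import os
from typing import Optional


def find_nltk_resource(resource: str, env_var: str = "NLTK_DATA") -> Optional[str]:
    """Return the path where `resource` was found, or None.

    Hypothetical, simplified imitation of NLTK's directory lookup: each entry
    of the environment variable is treated as a data root, and the resource
    must sit directly below one of those roots.
    """
    roots = [d for d in os.environ.get(env_var, "").split(os.pathsep) if d]
    for root in roots:
        candidate = os.path.join(root, *resource.split("/"))
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    hit = find_nltk_resource("tokenizers/punkt")
    print(hit if hit else "tokenizers/punkt not found under NLTK_DATA")
```

With the layout Zach shows next, the variable would need to point at the `nltk_data` directory (the one that contains `tokenizers/`), not at the resources directory itself.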

 

Thanks,

Zach

 

victor_toh
Level 2
Author

Hi Zach,

 

Yes, I have set the environment variables and updated the code env, but the error still persists.

ZachM
Dataiker

Could you please check the file structure of the resources directory and make sure that it's set up correctly? It should look like this:

Resources directory/
└── nltk_data
    └── tokenizers
        ├── punkt
        │   ├── PY3
        │   │   ├── README
        │   │   ├── czech.pickle
        │   │   ├── danish.pickle
        │   │   ├── dutch.pickle
        │   │   ├── english.pickle
        │   │   ├── estonian.pickle
        │   │   ├── finnish.pickle
        │   │   ├── french.pickle
        │   │   ├── german.pickle
        │   │   ├── greek.pickle
        │   │   ├── italian.pickle
        │   │   ├── malayalam.pickle
        │   │   ├── norwegian.pickle
        │   │   ├── polish.pickle
        │   │   ├── portuguese.pickle
        │   │   ├── russian.pickle
        │   │   ├── slovene.pickle
        │   │   ├── spanish.pickle
        │   │   ├── swedish.pickle
        │   │   └── turkish.pickle
        │   ├── README
        │   ├── czech.pickle
        │   ├── danish.pickle
        │   ├── dutch.pickle
        │   ├── english.pickle
        │   ├── estonian.pickle
        │   ├── finnish.pickle
        │   ├── french.pickle
        │   ├── german.pickle
        │   ├── greek.pickle
        │   ├── italian.pickle
        │   ├── malayalam.pickle
        │   ├── norwegian.pickle
        │   ├── polish.pickle
        │   ├── portuguese.pickle
        │   ├── russian.pickle
        │   ├── slovene.pickle
        │   ├── spanish.pickle
        │   ├── swedish.pickle
        │   └── turkish.pickle
        └── punkt.zip
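The layout above can also be checked with a short script. This is a minimal sketch: the function name `check_punkt_layout` is hypothetical, and it only checks two representative files taken from the tree (`punkt/english.pickle` and `punkt/PY3/english.pickle`), not the full list.

```python
from pathlib import Path
from typing import List


def check_punkt_layout(resources_dir: str) -> List[str]:
    """Return the expected punkt files missing under resources_dir.

    An empty list means the representative files are where the tree above
    says they should be: <resources_dir>/nltk_data/tokenizers/punkt/...
    """
    punkt = Path(resources_dir) / "nltk_data" / "tokenizers" / "punkt"
    expected = [
        punkt / "english.pickle",
        punkt / "PY3" / "english.pickle",
    ]
    return [str(p) for p in expected if not p.exists()]
```

For example, `check_punkt_layout("/path/to/resources")` (placeholder path) returns `[]` when the layout matches, or the list of missing paths otherwise.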

 

Also, are you building the code environment for containerized execution, or is it just running locally?

 

Thanks,

Zach

victor_toh
Level 2
Author

Hi Zach,

The folder structure seems to be correct.

My settings are shown below. Are they correct?

[screenshot]

victor_toh
Level 2
Author

Edit: I solved this issue.

The cause was a doubled nltk folder: the nltk folder had been created twice, so the resources did not end up where NLTK was looking for them.

[screenshot]
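A doubled folder like this can be spotted by searching the resources directory for every `tokenizers/punkt` directory and comparing its location with the expected one. The thread does not say exactly how the folder was doubled; a nested `nltk_data/nltk_data/...` path is one plausible form. This is a minimal sketch: the function names are hypothetical, and the expected location `nltk_data/tokenizers/punkt` is assumed from the tree Zach posted.

```python
from pathlib import Path
from typing import List


def locate_punkt_dirs(resources_dir: str) -> List[str]:
    """Return every 'punkt' directory whose parent is 'tokenizers',
    as forward-slash paths relative to resources_dir."""
    root = Path(resources_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("punkt")
        if p.is_dir() and p.parent.name == "tokenizers"
    )


def diagnose(resources_dir: str) -> str:
    """Summarize whether punkt sits at the assumed expected location."""
    expected = "nltk_data/tokenizers/punkt"
    found = locate_punkt_dirs(resources_dir)
    if expected in found:
        return "OK: punkt is at the expected location"
    if found:
        # e.g. 'nltk_data/nltk_data/tokenizers/punkt' suggests a doubled folder
        return "Misplaced: punkt found at " + ", ".join(found)
    return "Missing: no tokenizers/punkt directory found"
```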
