Text Summarization Plugin Job Error - NLTK tokenizers are missing

Solved!
victor_toh
Level 2

Hi Dataiku community,

 

I am testing the 'Text Summarization' plugin and successfully built its code environment on a machine with no internet access. However, the job fails with an error saying that the NLTK tokenizers are missing (screenshot: nltk.png). I manually uploaded the NLTK resources through the Resources tab, as shown below, but the same error still appears. Can anyone advise?

 

 

nltk2.png


Operating system used: Linux



5 Replies
ZachM
Dataiker

Hi @victor_toh,

Did you set the NLTK_HOME environment variable for the resources directory?

D5E4B4C3-95C0-49C2-81ED-5530AFA68B78_1_201_a.jpeg

 

The environment variable can be set by updating the code environment. You might need to comment out the last two lines, as I did in the screenshot above, so that it doesn't try to connect to the internet.
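To confirm what the running job actually sees, a small check like the one below can be run from a notebook or recipe that uses the same code environment. It is only a sketch: the function name `nltk_data_candidates` is hypothetical, and it looks at both `NLTK_HOME` (the name mentioned in this thread) and `NLTK_DATA` (the variable NLTK itself reads when building its search path), since it is not clear from the thread which one the plugin's environment sets.

```python
import os


def nltk_data_candidates(environ=None, var_names=("NLTK_DATA", "NLTK_HOME")):
    """Return (variable, directory, exists) for every directory the variables name.

    Hypothetical helper: a variable may hold several directories separated
    by os.pathsep, so each entry is reported on its own.
    """
    environ = os.environ if environ is None else environ
    results = []
    for name in var_names:
        raw = environ.get(name, "")
        for directory in raw.split(os.pathsep):
            if directory:  # skip empty entries and unset variables
                results.append((name, directory, os.path.isdir(directory)))
    return results


if __name__ == "__main__":
    found = nltk_data_candidates()
    if not found:
        print("Neither NLTK_DATA nor NLTK_HOME is set in this process.")
    for name, directory, exists in found:
        print(f"{name} -> {directory} (directory exists: {exists})")
```

If nothing is printed for either variable, the code environment update probably did not apply the variable to the process running the job.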

 

Thanks,

Zach

 

victor_toh
Level 2
Author

Hi Zach,

 

Yes, I have set the environment variable and updated the code env, but the error still persists.

ZachM
Dataiker

Could you please check the file structure of the resources directory and make sure that it's set up correctly? It should look like this:

Resources directory/
└── nltk_data
    └── tokenizers
        ├── punkt
        │   ├── PY3
        │   │   ├── README
        │   │   ├── czech.pickle
        │   │   ├── danish.pickle
        │   │   ├── dutch.pickle
        │   │   ├── english.pickle
        │   │   ├── estonian.pickle
        │   │   ├── finnish.pickle
        │   │   ├── french.pickle
        │   │   ├── german.pickle
        │   │   ├── greek.pickle
        │   │   ├── italian.pickle
        │   │   ├── malayalam.pickle
        │   │   ├── norwegian.pickle
        │   │   ├── polish.pickle
        │   │   ├── portuguese.pickle
        │   │   ├── russian.pickle
        │   │   ├── slovene.pickle
        │   │   ├── spanish.pickle
        │   │   ├── swedish.pickle
        │   │   └── turkish.pickle
        │   ├── README
        │   ├── czech.pickle
        │   ├── danish.pickle
        │   ├── dutch.pickle
        │   ├── english.pickle
        │   ├── estonian.pickle
        │   ├── finnish.pickle
        │   ├── french.pickle
        │   ├── german.pickle
        │   ├── greek.pickle
        │   ├── italian.pickle
        │   ├── malayalam.pickle
        │   ├── norwegian.pickle
        │   ├── polish.pickle
        │   ├── portuguese.pickle
        │   ├── russian.pickle
        │   ├── slovene.pickle
        │   ├── spanish.pickle
        │   ├── swedish.pickle
        │   └── turkish.pickle
        └── punkt.zip
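Comparing the tree by eye is error-prone, so here is a minimal sketch that checks for a couple of the files listed above. The helper name `missing_punkt_files` and the choice of which files to test are assumptions made for illustration; only the English model is checked, on the assumption that it is the one the plugin loads.

```python
from pathlib import Path

# Paths relative to the resources directory, taken from the tree above.
EXPECTED_FILES = (
    "nltk_data/tokenizers/punkt/english.pickle",
    "nltk_data/tokenizers/punkt/PY3/english.pickle",
)


def missing_punkt_files(resources_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files."""
    root = Path(resources_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_punkt_files("/path/to/resources")` (a placeholder path) returns an empty list when both files are where the tree says they should be, and otherwise lists the paths that could not be found.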

 

Also, are you building the code environment for containerized execution, or is it just running locally?

 

Thanks,

Zach

victor_toh
Level 2
Author

Hi Zach,

The folder structure seems to be correct.

My settings are shown below. Are they correct?

setting.png

victor_toh
Level 2
Author

edit: Solved this issue.

 

The cause was a duplicated nltk folder: the folder had been created twice.

nltk.png
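For anyone hitting the same problem, the sketch below locates every `tokenizers` directory under the resources directory so that an extra level of nesting stands out. It assumes the duplicated folder means something like `nltk_data/nltk_data/tokenizers`, which the screenshot may or may not match exactly; the function names are hypothetical.

```python
from pathlib import Path


def find_tokenizers_dirs(resources_dir):
    """Return every 'tokenizers' directory under resources_dir, as relative POSIX paths."""
    root = Path(resources_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("tokenizers")
        if p.is_dir()
    )


def is_misplaced(resources_dir, expected="nltk_data/tokenizers"):
    """True if tokenizers data exists but not at the expected relative path."""
    found = find_tokenizers_dirs(resources_dir)
    return bool(found) and expected not in found
```

If `find_tokenizers_dirs` reports a path with a repeated folder name, moving the inner folder's contents up one level and updating the code environment again should restore the expected layout.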
