Text Summarization Plugin Job Error - NLTK tokenizers are missing
Hi Dataiku community,
I am testing the 'Text Summarization' plugin and have successfully built the code environment on a machine with no internet access. However, I am getting an error saying that the NLTK tokenizers are missing. I have manually uploaded the NLTK resources using the Resources tab, but it still shows the same error. Can anyone advise?
Operating system used: Linux
Best Answer
-
victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner
Edit: Solved this issue.
The cause was that a duplicate nltk folder had been created.
Answers
-
Hi @victor_toh,
Did you set the NLTK_DATA environment variable to point to the resources directory?
The environment variable can be set by updating the code environment. You might need to comment out the last 2 lines like I did so that it doesn't try to connect to the internet.
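To confirm that the variable is actually visible to the job, a minimal sketch like the one below can be run in a notebook or recipe that uses the same code environment. It relies only on the standard library; the function name check_nltk_data_env is just an illustration and not part of the plugin or of Dataiku. It assumes NLTK's usual behaviour of reading NLTK_DATA as one or more paths separated by os.pathsep.

```python
import os


def check_nltk_data_env(environ=os.environ):
    """Return (ok, message) describing whether NLTK_DATA points at real directories.

    Hypothetical helper for troubleshooting; not part of NLTK or Dataiku.
    """
    value = environ.get("NLTK_DATA")
    if not value:
        return False, "NLTK_DATA is not set"

    # NLTK_DATA may hold several paths separated by os.pathsep (':' on Linux).
    paths = [p for p in value.split(os.pathsep) if p]
    missing = [p for p in paths if not os.path.isdir(p)]
    if missing:
        return False, "NLTK_DATA points at missing directories: " + ", ".join(missing)
    return True, "NLTK_DATA directories exist: " + ", ".join(paths)


if __name__ == "__main__":
    ok, message = check_nltk_data_env()
    print(message)
```

If this reports that the variable is not set, the code environment probably needs to be updated again after saving the resources script.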
Thanks,
Zach
-
victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner
Hi Zach,
Yes, I have set the environment variable and updated the code env, but the error still persists.
-
Could you please check the file structure of the resources directory and make sure that it's set up correctly? It should look like this:
Resources directory/
└── nltk_data
    └── tokenizers
        ├── punkt
        │   ├── PY3
        │   │   ├── README
        │   │   ├── czech.pickle
        │   │   ├── danish.pickle
        │   │   ├── dutch.pickle
        │   │   ├── english.pickle
        │   │   ├── estonian.pickle
        │   │   ├── finnish.pickle
        │   │   ├── french.pickle
        │   │   ├── german.pickle
        │   │   ├── greek.pickle
        │   │   ├── italian.pickle
        │   │   ├── malayalam.pickle
        │   │   ├── norwegian.pickle
        │   │   ├── polish.pickle
        │   │   ├── portuguese.pickle
        │   │   ├── russian.pickle
        │   │   ├── slovene.pickle
        │   │   ├── spanish.pickle
        │   │   ├── swedish.pickle
        │   │   └── turkish.pickle
        │   ├── README
        │   ├── czech.pickle
        │   ├── danish.pickle
        │   ├── dutch.pickle
        │   ├── english.pickle
        │   ├── estonian.pickle
        │   ├── finnish.pickle
        │   ├── french.pickle
        │   ├── german.pickle
        │   ├── greek.pickle
        │   ├── italian.pickle
        │   ├── malayalam.pickle
        │   ├── norwegian.pickle
        │   ├── polish.pickle
        │   ├── portuguese.pickle
        │   ├── russian.pickle
        │   ├── slovene.pickle
        │   ├── spanish.pickle
        │   ├── swedish.pickle
        │   └── turkish.pickle
        └── punkt.zip
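Checking a layout like this by eye is error-prone, so here is a rough sketch that walks the resources directory and reports where the punkt tokenizers really are. It uses only the standard library. The function names are hypothetical, and the path you pass in is whatever your resources directory is on disk. An extra nesting level (for example nltk_data/nltk_data/...) is one assumed way the layout can go wrong; it would show up as an unexpected path in the result.

```python
import os


def find_punkt_dirs(resources_root):
    """Return relative paths of every tokenizers/punkt folder holding english.pickle.

    Hypothetical helper for troubleshooting; not part of NLTK or Dataiku.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(resources_root):
        parts = os.path.normpath(dirpath).split(os.sep)
        # Only match the punkt folder itself, not subfolders such as PY3.
        if parts[-2:] == ["tokenizers", "punkt"] and "english.pickle" in filenames:
            rel = os.path.relpath(dirpath, resources_root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def layout_is_expected(resources_root):
    """True only if punkt sits exactly at nltk_data/tokenizers/punkt."""
    return find_punkt_dirs(resources_root) == ["nltk_data/tokenizers/punkt"]
```

Calling find_punkt_dirs on the resources directory should return a single entry, nltk_data/tokenizers/punkt. Anything else, or an empty list, suggests the files were uploaded one level too deep or too shallow.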
Also, are you building the code environment for containerized execution, or is it just running locally?
Thanks,
Zach
-
victor_toh Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Partner
Hi Zach,
The folder structure seems to be correct.
My settings are as shown below. Is this correct?