
nlp preparation plugin issue


Hi All,

I am trying to install the Text Preparation plugin and am getting an error while it downloads the package sudachidict-core>=20200330.

Error log:

ERROR: Command errored out with exit status 1:
command: /datadir/dataiku/DATA_DSSUSER/code-envs/python/plugin_nlp-preparation_managed/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-8ayrifsm/'"'"'; __file__='"'"'/tmp/pip-req-build-8ayrifsm/'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code ='"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-aqr3pvyn
cwd: /tmp/pip-req-build-8ayrifsm/
Complete output (43 lines):
Downloading the Sudachi dictionary (It may take a while) ...
Traceback (most recent call last):
File "/usr/lib64/python3.6/urllib/", line 1349, in do_open
File "/usr/lib64/python3.6/http/", line 1254, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib64/python3.6/http/", line 1300, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.6/http/", line 1249, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.6/http/", line 1036, in _send_output
File "/usr/lib64/python3.6/http/", line 974, in send
File "/usr/lib64/python3.6/http/", line 946, in connect
(,self.port), self.timeout, self.source_address)
File "/usr/lib64/python3.6/", line 724, in create_connection
raise err
File "/usr/lib64/python3.6/", line 713, in create_connection
TimeoutError: [Errno 110] Connection timed out


During handling of the above exception, another exception occurred:


Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-8ayrifsm/", line 44, in <module>
_, _msg = urlretrieve(ZIP_URL, ZIP_NAME)
File "/usr/lib64/python3.6/urllib/", line 248, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/lib64/python3.6/urllib/", line 223, in urlopen
return, data, timeout)
File "/usr/lib64/python3.6/urllib/", line 526, in open
response = self._open(req, data)
File "/usr/lib64/python3.6/urllib/", line 544, in _open
'_open', req)
File "/usr/lib64/python3.6/urllib/", line 504, in _call_chain
result = func(*args)
File "/usr/lib64/python3.6/urllib/", line 1377, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib64/python3.6/urllib/", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
WARNING: Discarding file:///datadir/dataiku/DATA_DSSUSER/SudachiDict-core-20220729.tar.gz. Command errored out with exit status 1: python egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python egg_info Check the logs for full command output.


Can someone please suggest another way to download the packages or to complete this plugin installation?

Operating system used: Linux



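Installing sudachidict-core requires access to a specific S3 location, which your network team will need to allow. During installation, its setup script downloads the Sudachi dictionary from there, which is why installing from the local SudachiDict-core tarball still fails.

The "Connection timed out" error suggests this access is not currently allowed.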
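Before involving the network team, you can confirm the diagnosis from the DSS host itself by making a plain request with the same Python the code env uses. This is a minimal sketch: `check_reachable` is a hypothetical helper, and you have to supply the dictionary URL yourself. An HTTP error status such as 403 is treated as "reachable" because the server did answer, whereas a timeout (errno 110, as in the log) usually means the traffic is being dropped along the way. The code avoids newer syntax so it also runs on Python 3.6.

```python
import sys
import urllib.error
import urllib.request


def check_reachable(url, timeout=10.0):
    """Try one GET request to url and return (reachable, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "HTTP {}".format(resp.getcode())
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path works.
        return True, "HTTP {}".format(exc.code)
    except urllib.error.URLError as exc:
        # Connection-level failure: timeout, refused, DNS or TLS error.
        return False, "URLError: {}".format(exc.reason)
    except OSError as exc:
        # e.g. socket.timeout raised while reading the response body.
        return False, "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    # Usage: python check_url.py <url> [<url> ...]
    for target in sys.argv[1:]:
        ok, detail = check_reachable(target)
        print("{}: {} ({})".format(target, "reachable" if ok else "NOT reachable", detail))
```

If this reports a timeout for the dictionary URL but succeeds for other hosts, the problem is outbound filtering and not pip or the plugin.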

The URL will be something like:

If allowing this is not possible or desirable, you may be able to work around it, provided you don't need Japanese support in your case (the ja extra of spaCy is what pulls in sudachidict-core). Convert the plugin to a dev plugin and change this line in nlp-preparation/code-env/python/spec/requirements.txt:
spacy[lookups,ja,th] -> spacy[lookups,th]
Then try reinstalling the plugin code env.
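If you need to repeat the requirements.txt change, for example after updating the plugin, it can be scripted. This is a minimal sketch. `drop_spacy_extra` and the script are hypothetical helpers. The file path is passed on the command line because the on-disk location of the dev plugin depends on your DSS data directory and is not given in this thread. A `.bak` copy of the original file is kept.

```python
import re
import sys
from pathlib import Path

# Matches "spacy[...]" at the start of a requirement line (case-insensitive),
# capturing the package name and the comma-separated extras.
SPACY_EXTRAS = re.compile(r"^(\s*spacy)\[([^\]]*)\]", re.IGNORECASE)


def drop_spacy_extra(requirements_text, extra="ja"):
    """Return requirements_text with `extra` removed from spacy's extras."""
    out = []
    for line in requirements_text.splitlines(keepends=True):
        m = SPACY_EXTRAS.match(line)
        if m:
            extras = [e.strip() for e in m.group(2).split(",")
                      if e.strip() and e.strip() != extra]
            bracket = "[" + ",".join(extras) + "]" if extras else ""
            # Keep everything after the closing bracket (version pin, newline).
            line = m.group(1) + bracket + line[m.end():]
        out.append(line)
    return "".join(out)


def main(path):
    req = Path(path)
    original = req.read_text()
    updated = drop_spacy_extra(original)
    if updated != original:
        # Back up the original file before overwriting it.
        req.with_name(req.name + ".bak").write_text(original)
        req.write_text(updated)
        print("Updated", req)
    else:
        print("No spacy line with that extra found in", req)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as `python drop_ja.py <path to the dev plugin>/code-env/python/spec/requirements.txt`, then reinstall the plugin code env from DSS as usual.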