Added on May 22, 2019 9:48PM
I have a column where every entry is a user problem, which I would like to analyse with NLP machine learning. Training keeps failing with this error:
Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s9/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
Here are the logs (I couldn't copy the full thing):
[2019-05-22 11:16:13,333] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RemapValueToOutput
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(num_flagonly)
[2019-05-22 11:16:13,347] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Category)
[2019-05-22 11:16:13,364] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 16) nnz=56095
[2019-05-22 11:16:13,365] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Customer Type)
[2019-05-22 11:16:13,383] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 4) nnz=55798
[2019-05-22 11:16:13,384] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Potential Regulatory Theme)
[2019-05-22 11:16:13,402] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 102) nnz=55682
[2019-05-22 11:16:13,402] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Method of Contact)
[2019-05-22 11:16:13,420] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 11) nnz=56095
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(cat_flagpresence)
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH <class 'dataiku.doctor.preprocessing.dataframe_preprocessing.TextCountVectorizerProcessor'> (Customer Verbatim / Issue Detail)
[2019-05-22 11:16:13,423] [51597/MainThread] [INFO] [root] Using vectorizer: CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=0.7, max_features=None, min_df=0.001,
ngram_range=(3, 1), preprocessor=None,
stop_words=['m', 's', 'r', 've', 'd', 'tt', 'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', '...e', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too'],
strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
tokenizer=None, vocabulary=None)
[2019-05-22 11:16:15,064] [51597/MainThread] [INFO] [root] Produced a matrix of size (56096, 1778)
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(interaction)
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RealignTarget
[2019-05-22 11:16:15,070] [51597/MainThread] [INFO] [root] Realign target series = (56096,)
[2019-05-22 11:16:15,074] [51597/MainThread] [INFO] [root] After realign target: (56096,)
[2019-05-22 11:16:15,074] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DropRowsWhereNoTarget
[2019-05-22 11:16:15,075] [51597/MainThread] [INFO] [root] Deleting 0 rows because no target
[2019-05-22 11:16:15,075] [51597/MainThread] [INFO] [root] MF before = (56096, 1911) target before = (56096,)
[2019-05-22 11:16:15,080] [51597/MainThread] [INFO] [root] MultiFrame, dropping rows: []
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] After DRWNT input_df=(56096, 21)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] MF after = (56096, 1911) target after = (56096,)
[2019-05-22 11:16:15,129] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DumpPipelineState
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] ********* Pipieline state (Before feature selection)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] input_df= (56096, 21)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] current_mf=(56096, 1911)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] PPR:
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] target = <class 'pandas.core.series.Series'> ((56096,))
[2019-05-22 11:16:15,129] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:EmitCurrentMFAsResult
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] Set MF index len 56096
[2019-05-22 11:16:15,130] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DumpPipelineState
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] ********* Pipieline state (At end)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] input_df= (56096, 21)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] current_mf=(0, 0)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] PPR:
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] UNPROCESSED = <class 'pandas.core.frame.DataFrame'> ((56096, 21))
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] TRAIN = <class 'dataiku.doctor.multiframe.MultiFrame'> ((56096, 1911))
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] target = <class 'pandas.core.series.Series'> ((56096,))
[2019-05-22 11:16:15,131] [51597/MainThread] [INFO] [root] END - Preprocessing train set
Traceback (most recent call last):
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/server.py", line 47, in serve
ret = api_command(arg)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux
return api(**kwargs)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/commands.py", line 259, in train_prediction_models_nosave
preproc_handler.save_data()
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/preprocessing_handler.py", line 165, in save_data
self._save_resource(resource_name)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/preprocessing_handler.py", line 104, in _save_resource
with open(self._resource_filepath(resource_name, type), "wb") as resource_file:
IOError: [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
[2019/05/22-11:16:15.137] [MRT-2523415] [INFO] [dku.block.link.interaction] - Check result for nullity exceptionIfNull=true result=null
[2019/05/22-11:16:15.365] [wrapper-stderr-2523439] [INFO] [dku.utils] - 2019-05-22 11:16:15,358 51573 INFO [Child] Process 51597 exited with exit=0 signal=0
[2019/05/22-11:16:15.365] [wrapper-stderr-2523439] [INFO] [dku.utils] - 2019-05-22 11:16:15,359 51573 INFO Full child code: 0
[2019/05/22-11:16:15.382] [KNL-python-single-command-kernel-monitor-2523444] [INFO] [dku.kernels] - Process done with code 0
[2019/05/22-11:16:15.383] [KNL-python-single-command-kernel-monitor-2523444] [INFO] [dip.tickets] - Destroying API ticket for analysis-ml-EUROPA-Wx3BdSd on behalf of gpaille
[2019/05/22-11:16:15.383] [MRT-2523415] [INFO] [dku.kernels] - Getting kernel tail
[2019/05/22-11:16:15.425] [MRT-2523415] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl' from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@704518a7 process=null pid=?? retcode=0
[2019/05/22-11:16:15.426] [MRT-2523415] [WARN] [dku.analysis.ml.python] - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:298)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:208)
at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:75)
at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:130)
[2019/05/22-11:16:15.436] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Processing thread joined ...
[2019/05/22-11:16:15.436] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Joining processing thread ...
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Processing thread joined ...
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Train done
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Train done
[2019/05/22-11:16:15.442] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Publishing mltask-train-done reflected event
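One possible reading of the traceback: the failing path ends in `countvec_Customer Verbatim / Issue Detail.pkl`, and the `/` in the column name is treated by `open()` as a directory separator. Python would then look for a directory named `countvec_Customer Verbatim ` that was never created, which gives `[Errno 2]`. The sketch below reproduces that behaviour outside DSS in a temporary folder. It is an assumption about the cause, not a confirmed diagnosis, and `resource_path` and `safe_column_name` are hypothetical helpers, not Dataiku APIs.

```python
import errno
import os
import tempfile

# Column name copied from the log above.
COLUMN = "Customer Verbatim / Issue Detail"


def resource_path(folder, column):
    # Hypothetical stand-in for how the resource file name appears to be built
    # ("countvec_" + column name + ".pkl"), judging only from the error message.
    return os.path.join(folder, "countvec_" + column + ".pkl")


def safe_column_name(column):
    # Hypothetical workaround: replace path separators in the column name.
    return column.replace("/", "_").replace("\\", "_")


def try_save(folder, column):
    """Try to write the resource file; return None on success, else the errno."""
    try:
        with open(resource_path(folder, column), "wb") as handle:
            handle.write(b"placeholder")
        return None
    except (IOError, OSError) as exc:
        return exc.errno


folder = tempfile.mkdtemp()

# The "/" makes open() look for a missing sub-directory, so this fails.
original_result = try_save(folder, COLUMN)

# With the separator replaced, the same write succeeds.
renamed_result = try_save(folder, safe_column_name(COLUMN))

print(original_result == errno.ENOENT, renamed_result is None)
```

If this is what is happening, renaming the column so that it no longer contains `/` (for example in a preparation step before training) would be the first thing to try.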