I have a column where every entry is a user problem description that I would like to analyse with NLP machine learning. Training keeps failing with this error:
Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s9/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
Here are the logs (I couldn't copy the full thing):
[2019-05-22 11:16:13,333] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RemapValueToOutput
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:13,346] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(num_flagonly)
[2019-05-22 11:16:13,347] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Category)
[2019-05-22 11:16:13,364] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 16) nnz=56095
[2019-05-22 11:16:13,365] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Customer Type)
[2019-05-22 11:16:13,383] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 4) nnz=55798
[2019-05-22 11:16:13,384] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Potential Regulatory Theme)
[2019-05-22 11:16:13,402] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 102) nnz=55682
[2019-05-22 11:16:13,402] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FastSparseDummifyProcessor (Method of Contact)
[2019-05-22 11:16:13,420] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] Dummifier: Append a sparse block shape=(56096, 11) nnz=56095
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(cat_flagpresence)
[2019-05-22 11:16:13,421] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH <class 'dataiku.doctor.preprocessing.dataframe_preprocessing.TextCountVectorizerProcessor'> (Customer Verbatim / Issue Detail)
[2019-05-22 11:16:13,423] [51597/MainThread] [INFO] [root] Using vectorizer: CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=0.7, max_features=None, min_df=0.001,
ngram_range=(3, 1), preprocessor=None,
stop_words=['m', 's', 'r', 've', 'd', 'tt', 'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', '...e', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too'],
strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
tokenizer=None, vocabulary=None)
[2019-05-22 11:16:15,064] [51597/MainThread] [INFO] [root] Produced a matrix of size (56096, 1778)
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {}
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(interaction)
[2019-05-22 11:16:15,070] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RealignTarget
[2019-05-22 11:16:15,070] [51597/MainThread] [INFO] [root] Realign target series = (56096,)
[2019-05-22 11:16:15,074] [51597/MainThread] [INFO] [root] After realign target: (56096,)
[2019-05-22 11:16:15,074] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DropRowsWhereNoTarget
[2019-05-22 11:16:15,075] [51597/MainThread] [INFO] [root] Deleting 0 rows because no target
[2019-05-22 11:16:15,075] [51597/MainThread] [INFO] [root] MF before = (56096, 1911) target before = (56096,)
[2019-05-22 11:16:15,080] [51597/MainThread] [INFO] [root] MultiFrame, dropping rows: []
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] After DRWNT input_df=(56096, 21)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] MF after = (56096, 1911) target after = (56096,)
[2019-05-22 11:16:15,129] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DumpPipelineState
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] ********* Pipieline state (Before feature selection)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] input_df= (56096, 21)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] current_mf=(56096, 1911)
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] PPR:
[2019-05-22 11:16:15,129] [51597/MainThread] [INFO] [root] target = <class 'pandas.core.series.Series'> ((56096,))
[2019-05-22 11:16:15,129] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:EmitCurrentMFAsResult
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] Set MF index len 56096
[2019-05-22 11:16:15,130] [51597/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:DumpPipelineState
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] ********* Pipieline state (At end)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] input_df= (56096, 21)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] current_mf=(0, 0)
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] PPR:
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] UNPROCESSED = <class 'pandas.core.frame.DataFrame'> ((56096, 21))
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] TRAIN = <class 'dataiku.doctor.multiframe.MultiFrame'> ((56096, 1911))
[2019-05-22 11:16:15,130] [51597/MainThread] [INFO] [root] target = <class 'pandas.core.series.Series'> ((56096,))
[2019-05-22 11:16:15,131] [51597/MainThread] [INFO] [root] END - Preprocessing train set
Traceback (most recent call last):
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/server.py", line 47, in serve
ret = api_command(arg)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux
return api(**kwargs)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/commands.py", line 259, in train_prediction_models_nosave
preproc_handler.save_data()
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/preprocessing_handler.py", line 165, in save_data
self._save_resource(resource_name)
File "/apps/hadoop/data01/dataiku/dataiku-dss-5.0.2/python/dataiku/doctor/preprocessing_handler.py", line 104, in _save_resource
with open(self._resource_filepath(resource_name, type), "wb") as resource_file:
IOError: [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
[2019/05/22-11:16:15.137] [MRT-2523415] [INFO] [dku.block.link.interaction] - Check result for nullity exceptionIfNull=true result=null
[2019/05/22-11:16:15.365] [wrapper-stderr-2523439] [INFO] [dku.utils] - 2019-05-22 11:16:15,358 51573 INFO [Child] Process 51597 exited with exit=0 signal=0
[2019/05/22-11:16:15.365] [wrapper-stderr-2523439] [INFO] [dku.utils] - 2019-05-22 11:16:15,359 51573 INFO Full child code: 0
[2019/05/22-11:16:15.382] [KNL-python-single-command-kernel-monitor-2523444] [INFO] [dku.kernels] - Process done with code 0
[2019/05/22-11:16:15.383] [KNL-python-single-command-kernel-monitor-2523444] [INFO] [dip.tickets] - Destroying API ticket for analysis-ml-EUROPA-Wx3BdSd on behalf of gpaille
[2019/05/22-11:16:15.383] [MRT-2523415] [INFO] [dku.kernels] - Getting kernel tail
[2019/05/22-11:16:15.425] [MRT-2523415] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl' from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@704518a7 process=null pid=?? retcode=0
[2019/05/22-11:16:15.426] [MRT-2523415] [WARN] [dku.analysis.ml.python] - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <type 'exceptions.IOError'> : [Errno 2] No such file or directory: u'/apps/hadoop/data01/dataiku/data_dir/analysis-data/EUROPA/FQ9JmSC7/cnt5kb4d/sessions/s10/pp1/countvec_Customer Verbatim / Issue Detail.pkl'
at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:298)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:208)
at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:75)
at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:130)
[2019/05/22-11:16:15.436] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Processing thread joined ...
[2019/05/22-11:16:15.436] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Joining processing thread ...
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.ml.python] T-cnt5kb4d - Processing thread joined ...
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Train done
[2019/05/22-11:16:15.437] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Train done
[2019/05/22-11:16:15.442] [FT-TrainWorkThread-9rsHhxAF-2523414] [INFO] [dku.analysis.prediction] T-cnt5kb4d - Publishing mltask-train-done reflected event
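One plausible reading of the traceback: the column is named "Customer Verbatim / Issue Detail", and the resource file name is built from that column name, so the "/" is treated by the operating system as a directory separator. The open() call then looks for a directory called "countvec_Customer Verbatim " that does not exist, which gives Errno 2. The sketch below reproduces that behaviour outside DSS. It is written for Python 3, and resource_path, safe_column_name and try_write are hypothetical helpers made up for illustration, not part of the Dataiku API.

```python
import errno
import os
import re
import tempfile


def resource_path(folder, column_name):
    # Hypothetical stand-in for a per-column resource file name
    # (mirrors the "countvec_<column>.pkl" pattern seen in the error).
    return os.path.join(folder, "countvec_" + column_name + ".pkl")


def safe_column_name(column_name):
    # Replace path separators (and any spaces around them) with an underscore.
    return re.sub(r"\s*[/\\]\s*", "_", column_name)


def try_write(path):
    # Return None on success, or the errno of the failure.
    try:
        with open(path, "wb") as resource_file:
            resource_file.write(b"")
        return None
    except (IOError, OSError) as exc:
        return exc.errno


column = "Customer Verbatim / Issue Detail"

with tempfile.TemporaryDirectory() as folder:
    # The "/" splits the name into a missing directory plus a file name.
    bad = try_write(resource_path(folder, column))
    # With the separator removed, the file is created directly in the folder.
    good = try_write(resource_path(folder, safe_column_name(column)))

print(bad == errno.ENOENT, good)
```

If this is indeed the cause, renaming the column so that it contains no "/" before training (for example to "Customer Verbatim - Issue Detail") may be enough to avoid the error.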