[2020/10/19-12:49:26.514] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] - ****************************************** [2020/10/19-12:49:26.517] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] - ** Start train session s1 [2020/10/19-12:49:26.518] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] - ****************************************** [2020/10/19-12:49:26.637] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.shaker.data] T-Jq2My3Ar - [ct: 124] Need to compute sampleId before checking memory cache [2020/10/19-12:49:26.640] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dip.shaker.runner] T-Jq2My3Ar - [ct: 127] Script settings sampleMax=104857600 processedMax=-1 [2020/10/19-12:49:26.641] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dip.shaker.runner] T-Jq2My3Ar - [ct: 128] Processing with sampleMax=104857600 processedMax=524288000 [2020/10/19-12:49:26.645] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dip.shaker.runner] T-Jq2My3Ar - [ct: 132] Computed required sample id : 4a128065123f1e8bb1249371936fffec-NA-b9689059e8435a4efb7ef594c85e26650--d751713988987e9331980363e24189ce [2020/10/19-12:49:26.647] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dku.shaker.cache] T-Jq2My3Ar - Shaker MemoryCache get on dataset DKU_TUTORIAL_NLP_VISUAL.IMDB_train_prepared key=ds=c64b36ca92d69952fcb51deeb20330ed--scr=fe0aded6b7560b54e0ca67f3477ef431--samp=4a128065123f1e8bb1249371936fffec-NA-b9689059e8435a4efb7ef594c85e26650--d751713988987e9331980363e24189ce: hit [2020/10/19-12:49:26.673] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.shaker.schema] T-Jq2My3Ar - [ct: 160] Column text meaning=FreeText fail=0 [2020/10/19-12:49:26.674] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.shaker.schema] T-Jq2My3Ar - [ct: 161] Column length meaning=LongMeaning fail=0 [2020/10/19-12:49:26.674] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.shaker.schema] T-Jq2My3Ar - [ct: 161] Column polarity meaning=LongMeaning fail=0 [2020/10/19-12:49:27.037] 
[FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.datasets.file] T-Jq2My3Ar - [ct: 524] Building Filesystem handler config: {"connection":"filesystem_managed","path":"DKU_TUTORIAL_NLP_VISUAL/IMDB_train_prepared","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2020/10/19-12:49:27.177] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.datasets.ftplike] T-Jq2My3Ar - Enumerating Filesystem dataset prefix= [2020/10/19-12:49:27.196] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dku.fs.local] T-Jq2My3Ar - [ct: 683] Enumerating local filesystem prefix=/ [2020/10/19-12:49:27.204] [FT-TrainWorkThread-8wQayRe8-168] [DEBUG] [dku.fs.local] T-Jq2My3Ar - [ct: 691] Enumeration done nb_paths=1 size=7599792 [2020/10/19-12:49:27.205] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.input.push] T-Jq2My3Ar - USTP: push selection.method=HEAD_SEQUENTIAL records=100000 ratio=0.02 col=null [2020/10/19-12:49:27.209] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.format] T-Jq2My3Ar - [ct: 696] Extractor run: limit={"maxBytes":-1,"maxRecords":100000,"ordering":{"enabled":false,"rules":[]}} totalRecords=0 [2020/10/19-12:49:27.240] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku] T-Jq2My3Ar - getCompression filename=out-s0.csv.gz [2020/10/19-12:49:27.243] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku] T-Jq2My3Ar - getCompression filename=out-s0.csv.gz [2020/10/19-12:49:27.244] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.format] T-Jq2My3Ar - [ct: 731] Start compressed [GZIP] stream: /home/dataiku/dss/managed_datasets/DKU_TUTORIAL_NLP_VISUAL/IMDB_train_prepared/out-s0.csv.gz / totalRecsBefore=0 [2020/10/19-12:49:27.244] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku] T-Jq2My3Ar - getCompression filename=out-s0.csv.gz [2020/10/19-12:49:27.250] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku] T-Jq2My3Ar - getCompression filename=out-s0.csv.gz [2020/10/19-12:49:37.413] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.format] T-Jq2My3Ar 
- [ct: 10900] after stream totalComp=7599792 totalUncomp=18980290 totalRec=25000 [2020/10/19-12:49:37.413] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.format] T-Jq2My3Ar - [ct: 10900] Extractor run done, totalCompressed=7599792 totalRecords=25000 [2020/10/19-12:49:37.421] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.splits] T-Jq2My3Ar - [ct: 10908] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=IMDB_train_prepared,sel=(method=head-s,records=100000),r=0.8,s=1337, instance id: dfce45b82657289905253de72a3023b4-0 [2020/10/19-12:49:37.422] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.splits] T-Jq2My3Ar - [ct: 10909] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=IMDB_train_prepared,sel=(method=head-s,records=100000),r=0.8,s=1337 i=dfce45b82657289905253de72a3023b4-0 [2020/10/19-12:49:37.424] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.splits] T-Jq2My3Ar - [ct: 10911] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=IMDB_train_prepared,sel=(method=head-s,records=100000),r=0.8,s=1337 i=dfce45b82657289905253de72a3023b4-0 [2020/10/19-12:49:37.482] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.ml.python] T-Jq2My3Ar - Joining processing thread ... 
[2020/10/19-12:49:37.487] [MRT-169] [INFO] [dku.analysis.ml.python] - Running a preprocessing set: pp1 in /home/dataiku/dss/analysis-data/DKU_TUTORIAL_NLP_VISUAL/9pSKHeiU/Jq2My3Ar/sessions/s1/pp1 [2020/10/19-12:49:37.499] [MRT-169] [INFO] [dku.block.link] - Started a socket on port 33193 [2020/10/19-12:49:37.511] [MRT-169] [INFO] [dku.ml.kernel] - Writing output of python-single-command-kernel to /home/dataiku/dss/analysis-data/DKU_TUTORIAL_NLP_VISUAL/9pSKHeiU/Jq2My3Ar/sessions/s1/pp1/train.log [2020/10/19-12:49:37.533] [MRT-169] [INFO] [dku.code.envs.resolution] - Executing Python activity in builtin env [2020/10/19-12:49:37.535] [MRT-169] [WARN] [dku.code.projectLibs] - External libraries file not found: /home/dataiku/dss/config/projects/DKU_TUTORIAL_NLP_VISUAL/lib/external-libraries.json [2020/10/19-12:49:37.536] [MRT-169] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM DKU_TUTORIAL_NLP_VISUAL is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]} [2020/10/19-12:49:37.537] [MRT-169] [INFO] [dku.code.projectLibs] - chunkFolder is /home/dataiku/dss/config/projects/DKU_TUTORIAL_NLP_VISUAL/lib/R [2020/10/19-12:49:37.538] [MRT-169] [INFO] [dku.python.single_command.kernel] - Starting Python process for kernel python-single-command-kernel [2020/10/19-12:49:37.538] [MRT-169] [INFO] [dip.tickets] - Creating API ticket for analysis-ml-DKU_TUTORIAL_NLP_VISUAL-D0qL1RE on behalf of admin id=analysis-ml-DKU_TUTORIAL_NLP_VISUAL-D0qL1RE_58RCwblvW20V [2020/10/19-12:49:37.539] [MRT-169] [INFO] [dku.security.process] - Starting process (regular) [2020/10/19-12:49:37.648] [MRT-169] [INFO] [dku.security.process] - Process started with pid=1955 [2020/10/19-12:49:37.651] [MRT-169] [INFO] [dku.processes.cgroups] - Will use cgroups [] [2020/10/19-12:49:37.652] [MRT-169] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: [] [2020/10/19-12:49:37.768] [KNL-python-single-command-kernel-monitor-176] [INFO] [dku.resourceusage] - 
Reporting start of CRU:{"context":{"type":"ANALYSIS_ML_TRAIN","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_NLP_VISUAL","analysisId":"9pSKHeiU","mlTaskId":"Jq2My3Ar","sessionId":"s1"},"type":"LOCAL_PROCESS","id":"M8qpvT3aItV6PMmG","startTime":1603111777765,"localProcess":{"cpuCurrent":0.0}} [2020/10/19-12:49:37.950] [process-resource-monitor-1955-179] [DEBUG] [dku.resource] - Process stats for pid 1955: {"pid":1955,"commandName":"/home/dataiku/dss/bin/python","cpuUserTimeMS":0,"cpuSystemTimeMS":110,"cpuChildrenUserTimeMS":0,"cpuChildrenSystemTimeMS":0,"cpuTotalMS":110,"cpuCurrent":0.0,"vmSizeMB":89,"vmRSSMB":8,"vmHWMMB":8,"vmRSSAnonMB":4,"vmDataMB":4,"vmSizePeakMB":89,"vmRSSPeakMB":8,"vmRSSTotalMBS":0,"majorFaults":0,"childrenMajorFaults":0} Installing debugging signal handler [2020-10-19 12:49:42,615] [1955/MainThread] [INFO] [root] Connecting to parent at port 33193 [2020/10/19-12:49:42.618] [MRT-169] [INFO] [dku.link.secret_protected] - Connected to kernel [2020/10/19-12:49:42.625] [MRT-169] [INFO] [dku.block.link.interaction] - Execute link command respClazz=true respTypeToken=false respIsString=false is=false asyncInputStream=false os=false [2020-10-19 12:49:42,618] [1955/MainThread] [INFO] [root] Connected to parent at port 33193 [2020-10-19 12:49:46,431] [1955/MainThread] [INFO] [root] Running analysis command: train_prediction_models_nosave [2020-10-19 12:49:46,432] [1955/MainThread] [INFO] [dataiku.doctor.commands] PPS is {u'preprocessingFitSampleSeed': 1337, u'feature_selection_params': {u'custom_params': {u'code': u'# type your code here'}, u'pca_params': {u'variance_proportion': 0.9, u'n_features': 25}, u'random_forest_params': {u'depth': 10, u'n_features': 25, u'n_trees': 30}, u'lasso_params': {u'alpha': [0.01, 0.1, 1.0, 10.0, 100.0], u'cross_validate': True}, u'method': u'NONE', u'correlation_params': {u'max_abs_correlation': 1.0, u'n_features': 25, u'min_abs_correlation': 0.0}}, u'preprocessingFitSampleRatio': 1.0, u'reduce': {u'enabled': 
False, u'kept_variance': 0.0}, u'skipPreprocessing': False, u'target_remapping': [{u'mappedValue': 0, u'sourceValue': u'0', u'sampleFreq': 4934}, {u'mappedValue': 1, u'sourceValue': u'1', u'sampleFreq': 5066}], u'per_feature': {u'polarity': {u'generate_derivative': False, u'customHandlingCode': u'', u'customProcessorWantsMatrix': False, u'sendToInput': u'main', u'binarize_threshold_mode': u'MEDIAN', u'state': {u'userModified': False, u'recordedMeaning': u'LongMeaning', u'autoModifiedByDSS': False}, u'role': u'TARGET', u'binarize_constant_threshold': 0.0, u'quantile_bin_nb_bins': 4, u'type': u'NUMERIC', u'impute_constant_value': 0.0}, u'text': {u'autoReason': u'REJECT_DEFAULT_TEXT_HANDLING', u'useCustomVectorizer': False, u'customHandlingCode': u'', u'hashSVDSVDLimit': 50000, u'customProcessorWantsMatrix': False, u'hashSVDSVDComponents': 100, u'sendToInput': u'main', u'maxWords': 0, u'state': {u'userModified': False, u'recordedMeaning': u'FreeText', u'autoModifiedByDSS': False}, u'hashSize': 200000, u'ngramMaxSize': 1, u'text_handling': u'TOKENIZE_HASHING_SVD', u'ngramMinSize': 1, u'stopWordsMode': u'NONE', u'minRowsRatio': 0.001, u'role': u'INPUT', u'type': u'TEXT', u'maxRowsRatio': 0.8, u'name': u'text'}, u'length': {u'generate_derivative': False, u'sendToInput': u'main', u'rescaling': u'AVGSTD', u'role': u'INPUT', u'customHandlingCode': u'', u'customProcessorWantsMatrix': False, u'numerical_handling': u'REGULAR', u'binarize_threshold_mode': u'MEDIAN', u'state': {u'userModified': False, u'recordedMeaning': u'LongMeaning', u'autoModifiedByDSS': False}, u'missing_handling': u'IMPUTE', u'binarize_constant_threshold': 0.0, u'quantile_bin_nb_bins': 4, u'missing_impute_with': u'MEAN', u'type': u'NUMERIC', u'impute_constant_value': 0.0}}, u'feature_generation': {u'manual_interactions': {u'interactions': []}, u'pairwise_linear': {u'behavior': u'DISABLED'}, u'categoricals_count_transformer': {u'input_features': [], u'all_features': False, u'behavior': u'DISABLED'}, 
u'polynomial_combinations': {u'behavior': u'DISABLED'}, u'numericals_clustering': {u'k': 0, u'input_features': [], u'all_features': False, u'behavior': u'DISABLED'}}} [2020-10-19 12:49:46,432] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] START - Loading train set [2020-10-19 12:49:46,433] [1955/MainThread] [INFO] [root] Reading with dtypes: None [2020-10-19 12:49:46,434] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for text: str (schema_type=string feature_type=TEXT feature_role=INPUT) [2020-10-19 12:49:46,434] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for length: (schema_type=bigint feature_type=NUMERIC feature_role=INPUT) [2020-10-19 12:49:46,434] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for polarity: (schema_type=bigint feature_type=NUMERIC feature_role=TARGET) [2020-10-19 12:49:46,434] [1955/MainThread] [INFO] [root] Reading with FIXED dtypes: {u'polarity': , u'text': 'str', u'length': } [2020-10-19 12:49:47,808] [1955/MainThread] [INFO] [root] Loaded table [2020-10-19 12:49:47,808] [1955/MainThread] [INFO] [dataiku.doctor.utils] Coercion done [2020-10-19 12:49:47,808] [1955/MainThread] [INFO] [dataiku.doctor.commands] Loaded train df: shape=(20009,3) [2020-10-19 12:49:47,808] [1955/MainThread] [INFO] [dataiku.doctor.commands] Train col : text (object) [2020-10-19 12:49:47,808] [1955/MainThread] [INFO] [dataiku.doctor.commands] Train col : length (float64) [2020-10-19 12:49:47,857] [1955/MainThread] [INFO] [dataiku.doctor.commands] Train col : polarity (object) [2020-10-19 12:49:47,857] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] END - Loading train set [2020-10-19 12:49:47,857] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] START - Loading test set [2020-10-19 12:49:47,859] [1955/MainThread] [INFO] [root] Reading with dtypes: None [2020-10-19 12:49:47,859] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for text: str (schema_type=string 
feature_type=TEXT feature_role=INPUT) [2020-10-19 12:49:47,859] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for length: (schema_type=bigint feature_type=NUMERIC feature_role=INPUT) [2020-10-19 12:49:47,859] [1955/MainThread] [INFO] [dataiku.doctor.utils] Computed dtype for polarity: (schema_type=bigint feature_type=NUMERIC feature_role=TARGET) [2020-10-19 12:49:47,859] [1955/MainThread] [INFO] [root] Reading with FIXED dtypes: {u'polarity': , u'text': 'str', u'length': } [2020-10-19 12:49:47,995] [1955/MainThread] [INFO] [root] Loaded table [2020-10-19 12:49:47,995] [1955/MainThread] [INFO] [dataiku.doctor.utils] Coercion done [2020-10-19 12:49:47,995] [1955/MainThread] [INFO] [dataiku.doctor.commands] Loaded test df: shape=(4991,3) [2020-10-19 12:49:47,995] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] END - Loading test set [2020-10-19 12:49:47,995] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] START - Collecting statistics [2020-10-19 12:49:48,022] [1955/MainThread] [INFO] [dataiku.doctor.preprocessing_collector] Looking at polarity... (type=NUMERIC) [2020-10-19 12:49:48,022] [1955/MainThread] [INFO] [dataiku.doctor.preprocessing_collector] Looking at text... (type=TEXT) [2020-10-19 12:49:48,022] [1955/MainThread] [INFO] [dataiku.doctor.preprocessing_collector] Looking at length... 
(type=NUMERIC) [2020-10-19 12:49:48,023] [1955/MainThread] [INFO] [dataiku.doctor.preprocessing_collector] Checking series of type: float64 (isM8=False) [2020-10-19 12:49:48,025] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] END - Collecting statistics [2020-10-19 12:49:48,026] [1955/MainThread] [INFO] [dataiku.doctor.multiframe] generating interactions [2020-10-19 12:49:48,026] [1955/MainThread] [INFO] [dataiku.doctor.multiframe] {u'preprocessingFitSampleSeed': 1337, u'feature_selection_params': {u'custom_params': {u'code': u'# type your code here'}, u'pca_params': {u'variance_proportion': 0.9, u'n_features': 25}, u'random_forest_params': {u'depth': 10, u'n_features': 25, u'n_trees': 30}, u'lasso_params': {u'alpha': [0.01, 0.1, 1.0, 10.0, 100.0], u'cross_validate': True}, u'method': u'NONE', u'correlation_params': {u'max_abs_correlation': 1.0, u'n_features': 25, u'min_abs_correlation': 0.0}}, u'preprocessingFitSampleRatio': 1.0, u'reduce': {u'enabled': False, u'kept_variance': 0.0}, u'skipPreprocessing': False, u'target_remapping': [{u'mappedValue': 0, u'sourceValue': u'0', u'sampleFreq': 4934}, {u'mappedValue': 1, u'sourceValue': u'1', u'sampleFreq': 5066}], u'per_feature': {u'polarity': {u'generate_derivative': False, u'customHandlingCode': u'', u'customProcessorWantsMatrix': False, u'sendToInput': u'main', u'binarize_threshold_mode': u'MEDIAN', u'state': {u'userModified': False, u'recordedMeaning': u'LongMeaning', u'autoModifiedByDSS': False}, u'role': u'TARGET', u'binarize_constant_threshold': 0.0, u'quantile_bin_nb_bins': 4, u'type': u'NUMERIC', u'impute_constant_value': 0.0}, u'text': {u'autoReason': u'REJECT_DEFAULT_TEXT_HANDLING', u'useCustomVectorizer': False, u'customHandlingCode': u'', u'hashSVDSVDLimit': 50000, u'customProcessorWantsMatrix': False, u'hashSVDSVDComponents': 100, u'sendToInput': u'main', u'maxWords': 0, u'state': {u'userModified': False, u'recordedMeaning': u'FreeText', u'autoModifiedByDSS': False}, u'hashSize': 200000, 
u'ngramMaxSize': 1, u'text_handling': u'TOKENIZE_HASHING_SVD', u'ngramMinSize': 1, u'stopWordsMode': u'NONE', u'minRowsRatio': 0.001, u'role': u'INPUT', u'type': u'TEXT', u'maxRowsRatio': 0.8, u'name': u'text'}, u'length': {u'generate_derivative': False, u'sendToInput': u'main', u'rescaling': u'AVGSTD', u'role': u'INPUT', u'customHandlingCode': u'', u'customProcessorWantsMatrix': False, u'numerical_handling': u'REGULAR', u'binarize_threshold_mode': u'MEDIAN', u'state': {u'userModified': False, u'recordedMeaning': u'LongMeaning', u'autoModifiedByDSS': False}, u'missing_handling': u'IMPUTE', u'binarize_constant_threshold': 0.0, u'quantile_bin_nb_bins': 4, u'missing_impute_with': u'MEAN', u'type': u'NUMERIC', u'impute_constant_value': 0.0}}, u'feature_generation': {u'manual_interactions': {u'interactions': []}, u'pairwise_linear': {u'behavior': u'DISABLED'}, u'categoricals_count_transformer': {u'input_features': [], u'all_features': False, u'behavior': u'DISABLED'}, u'polynomial_combinations': {u'behavior': u'DISABLED'}, u'numericals_clustering': {u'k': 0, u'input_features': [], u'all_features': False, u'behavior': u'DISABLED'}}} [2020-10-19 12:49:48,026] [1955/MainThread] [INFO] [dataiku.doctor.multiframe] No feature selection to perform [2020-10-19 12:49:48,026] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] START - Preprocessing train set [2020-10-19 12:49:48,028] [1955/MainThread] [INFO] [dataiku.doctor.multiframe] Set MF index len 20009 [2020-10-19 12:49:48,028] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RemapValueToOutput [2020-10-19 12:49:48,036] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput [2020-10-19 12:49:48,036] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {u'length': 749.0573741816182} [2020-10-19 12:49:48,037] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:RescalingProcessor2 (length) [2020-10-19 
12:49:48,038] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] Rescale length (avg=749.057374182 std=576.884411349 shift=749.057374182 inv_scale=0.00173344950969) [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] Rescaled length (avg=6.64668979558e-17 std=1.0) nulls=0 [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(num_flagonly) [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:MultipleImputeMissingFromInput [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] MIMIFI: Imputing with map {} [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:FlushDFBuilder(cat_flagpresence) [2020-10-19 12:49:48,060] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] FIT/PROCESS WITH Step:TextHashingVectorizerWithSVDProcessor (text hs=200000 sl=50000 sc=100) [2020-10-19 12:49:48,101] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] Processing serie 0 60 background rehears record sympathi devil cl... 1 tess storm countri mari pickford vehicl intend... 2 man hilari comedi stupid made realiz pile stan... 3 ban video nasti uk unhing natur gain quit bit ... 4 given movi put togeth less year might explain ... 5 laugh whatsoev yes watch entir train wreck t l... 6 like earli masterpiec eleph man lynch prove de... 7 fine martino outing spirit enjoy giallo fine p... 8 without doubt worst movi ever seen funni inter... 9 nice pleasant funni earth shatter good job sho... 10 justif happen movi term distributor secondari ... 11 been never curious film ever play sport wonder... 12 veri disappoint version lorna doon mani miss c... 13 movi start scroll text take near five minut gi... 14 say bsg fan t exact sure think show saw big sc... 15 ok rent clown like chainsaw massacr esquir fil... 16 lucki enough seen whim film festiv smack hard ... 17 teenag go trip camper van mani clich guarante ... 
18 murder number one movi expect made tv t consid... 19 one drama bad almost hit point veri funni scri... 20 high expect follow beauti laundrett bend like ... 21 bore movi bore town 50 anyon think classic pro... 22 one disturb tragic period american histori beg... 23 t much say movi wonder tour de forc peter sell... 24 one movi t requir brain think veri funni time ... 25 think micheal ironsid act career must star sor... 26 wendi wu homecom warrior veri good strong plot... 27 got lure titl expect insight intrigu journey a... 28 movi use cinematographi fantast depth camera c... 29 comment eliz7212 1 hit proverbi nail head turk... ... 19979 anyon visit drive in 1950s 60s 70s must seen f... 19980 movi seen far best one made primarili u naval ... 19981 realli way beat around bush say ladi death mot... 19982 ripoff dozen better film particular steven mar... 19983 maxim manipul anabel sim betsi drake set trap ... 19984 wow t believ first one post comment great movi... 19985 troubl suspend disbelief t consid woman alread... 19986 hrm think line old movi poster br br dumb movi... 19987 wonder movi never saw light day time releas aw... 19988 recent shop saw box set americian gothic thoug... 19989 just butcher wonder stori edwin torr movi t fo... 19990 found good movi pass time chanc histor valu po... 19991 ella excel franchot unavoid top play similar p... 19992 far wors aw laurel hardi cartoon 60s terribl l... 19993 found veri veri difficulti watch initi 5 minut... 19994 movi anoth christian propaganda film line omeg... 19995 disgrac check hope undiscov jame garner gem st... 19996 sad imdb allow rate judg lower 1 shame ghast m... 19997 paul reiser step away standup comedi spotlight... 19998 like realli pleas t think idiot admit enjoy fi... 19999 alway knew ann desalvo great charact actor now... 20000 excel movi canadian grew rural lifestyl much f... 20001 one movi made feel strong need make movi gener... 20002 like lot fact see plan just may love ll echo r... 
20003 poor imag profession polic offic display telev... 20004 feel bless known worst steven seagal movi ever... 20005 love cult 70 sci fi way like movi repo man buc... 20006 amaz combin love psych two young peopl present... 20007 avoid one unless want watch expens bad made mo... 20008 gospel lou major disappoint receiv e mail thea... Name: text, Length: 20009, dtype: object [2020-10-19 12:49:54,650] [1955/MainThread] [DEBUG] [dku.ml.preprocessing] Got matrix: (20009, 200000) [2020-10-19 12:50:15,890] [1955/MainThread] [INFO] [dataiku.doctor.utils.listener] END - Preprocessing train set Traceback (most recent call last): File "/home/dataiku/dataiku-dss-8.0.2/python/dataiku/doctor/server.py", line 46, in serve ret = api_command(arg) File "/home/dataiku/dataiku-dss-8.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux return api(**kwargs) File "/home/dataiku/dataiku-dss-8.0.2/python/dataiku/doctor/commands.py", line 313, in train_prediction_models_nosave transformed_train = pipeline.fit_and_process(train_df) File "/home/dataiku/dataiku-dss-8.0.2/python/dataiku/doctor/preprocessing/dataframe_preprocessing.py", line 1942, in fit_and_process new_mf = step.fit_and_process(input_df, cur_mf, result, self.generated_features_mapping) File "/home/dataiku/dataiku-dss-8.0.2/python/dataiku/doctor/preprocessing/dataframe_preprocessing.py", line 1292, in fit_and_process self.svd_res["svd"].fit(matrix) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/sklearn/decomposition/truncated_svd.py", line 141, in fit self.fit_transform(X) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/sklearn/decomposition/truncated_svd.py", line 177, in fit_transform random_state=random_state) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/sklearn/utils/extmath.py", line 365, in randomized_svd power_iteration_normalizer, random_state) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/sklearn/utils/extmath.py", line 250, in randomized_range_finder Q, _ = linalg.lu(safe_sparse_dot(A.T, 
Q), permute_l=True) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/scipy/linalg/decomp_lu.py", line 211, in lu a1 = asarray_chkfinite(a) File "/home/dataiku/dataiku-dss-8.0.2/python.packages/numpy/lib/function_base.py", line 461, in asarray_chkfinite "array must not contain infs or NaNs") ValueError: array must not contain infs or NaNs [2020/10/19-12:50:15.897] [MRT-169] [INFO] [dku.block.link.interaction] - Check result for nullity exceptionIfNull=true result=null [2020/10/19-12:50:16.028] [KNL-python-single-command-kernel-monitor-176] [INFO] [dku.kernels] - Process done with code 0 [2020/10/19-12:50:16.028] [KNL-python-single-command-kernel-monitor-176] [INFO] [dip.tickets] - Destroying API ticket for analysis-ml-DKU_TUTORIAL_NLP_VISUAL-D0qL1RE on behalf of admin [2020/10/19-12:50:16.033] [KNL-python-single-command-kernel-monitor-176] [WARN] [dku.resource] - stat file for pid 1955 does not exist. Process died? [2020/10/19-12:50:16.034] [KNL-python-single-command-kernel-monitor-176] [INFO] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"ANALYSIS_ML_TRAIN","authIdentifier":"admin","projectKey":"DKU_TUTORIAL_NLP_VISUAL","analysisId":"9pSKHeiU","mlTaskId":"Jq2My3Ar","sessionId":"s1"},"type":"LOCAL_PROCESS","id":"M8qpvT3aItV6PMmG","startTime":1603111777765,"localProcess":{"pid":1955,"commandName":"/home/dataiku/dss/bin/python","cpuUserTimeMS":990,"cpuSystemTimeMS":14110,"cpuChildrenUserTimeMS":0,"cpuChildrenSystemTimeMS":20,"cpuTotalMS":15120,"cpuCurrent":0.4316546762589928,"vmSizeMB":1291,"vmRSSMB":496,"vmHWMMB":671,"vmRSSAnonMB":476,"vmDataMB":622,"vmSizePeakMB":1426,"vmRSSPeakMB":671,"vmRSSTotalMBS":11540,"majorFaults":79,"childrenMajorFaults":0}} [2020/10/19-12:50:16.034] [MRT-169] [INFO] [dku.kernels] - Getting kernel tail [2020/10/19-12:50:16.035] [MRT-169] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : : array must not contain infs or NaNs from kernel 
com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@4e260275 process=null pid=?? retcode=0 [2020/10/19-12:50:16.035] [MRT-169] [WARN] [dku.analysis.ml.python] - Training failed com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : : array must not contain infs or NaNs at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302) at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215) at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190) at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:208) at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:74) at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:143) [2020/10/19-12:50:16.037] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.ml.python] T-Jq2My3Ar - Processing thread joined ... [2020/10/19-12:50:16.038] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.ml.python] T-Jq2My3Ar - Joining processing thread ... [2020/10/19-12:50:16.038] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.ml.python] T-Jq2My3Ar - Processing thread joined ... [2020/10/19-12:50:16.039] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] T-Jq2My3Ar - Train done [2020/10/19-12:50:16.039] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] T-Jq2My3Ar - Train done [2020/10/19-12:50:16.049] [FT-TrainWorkThread-8wQayRe8-168] [INFO] [dku.analysis.prediction] T-Jq2My3Ar - Publishing mltask-train-done reflected event