KeyError: 'weight' in ML scoring
mattmagic
Hey,
I recently upgraded Dataiku DSS to 4.2.0, and now scoring my ML models fails with the following error:
[07:52:46] [INFO] [dku.utils] - 2018-10-30 07:52:46,007 INFO Will do preparation, output schema: {u'userModified': False, u'columns': [{u'timestampNoTzAsDate': False, u'type': u'string', u'name': u'Domain', u'maxLength': -1}, {u'timestampNoTzAsDate': False, u'type': u'string', u'name': u'DescriptionProcessed', u'maxLength': -1}, {u'timestampNoTzAsDate': False, u'type': u'double', u'name': u'AgTech & New Food', u'maxLength': -1}]}
[07:52:46] [INFO] [dku.utils] - Traceback (most recent call last):
[07:52:46] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
[07:52:46] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
[07:52:46] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
[07:52:46] [INFO] [dku.utils] - exec code in run_globals
[07:52:46] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/prediction/reg_evaluation_recipe.py", line 302, in <module>
[07:52:46] [INFO] [dku.utils] - dkujson.load_from_filepath(sys.argv[8]))
[07:52:46] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/prediction/reg_evaluation_recipe.py", line 45, in main
[07:52:46] [INFO] [dku.utils] - pipeline = preprocessing_handler.build_preprocessing_pipeline(with_target=True)
[07:52:46] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 170, in build_preprocessing_pipeline
[07:52:46] [INFO] [dku.utils] - pipeline = PreprocessingPipeline(steps=list(self.preprocessing_steps(*args, **kwargs)))
[07:52:46] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 728, in preprocessing_steps
[07:52:46] [INFO] [dku.utils] - if with_target and self.sample_weight_variable is not None:
[07:52:46] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 711, in sample_weight_variable
[07:52:46] [INFO] [dku.utils] - return self.core_params["weight"].get("sampleWeightVariable", None)
[07:52:46] [INFO] [dku.utils] - KeyError: 'weight'
[07:52:46] [INFO] [dku.flow.activity] - Run thread failed for activity evaluate_on_CB_descriptions_AgTechFood_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:40)
at com.dataiku.dip.analysis.ml.prediction.flow.EvaluationRecipeRunner.run(EvaluationRecipeRunner.java:174)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - activity is finished
[07:52:46] [ERROR] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:40)
at com.dataiku.dip.analysis.ml.prediction.flow.EvaluationRecipeRunner.run(EvaluationRecipeRunner.java:174)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Executing default post-activity lifecycle hook
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Removing samples for CLUSTERREPORTCLUSTERINGNEW.CB_desc_AgTechFood_m
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Removing samples for CLUSTERREPORTCLUSTERINGNEW.CB_desc_AgTechFood_s
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Done post-activity tasks
Does anyone know what the exact issue is?
Cheers,
Matthew
Answers
Hi Matt,
Dataiku DSS 4.2 is a major release, so models trained with earlier versions of DSS should be retrained after upgrading to 4.2.
The release notes have more details: https://doc.dataiku.com/dss/latest/release_notes/4.2.html#limitations-and-warnings
Retraining your deployed model in the Flow should fix your issue.
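To illustrate what the traceback seems to be saying: the last Python frame looks up `core_params["weight"]`, so the parameters stored with your model presumably have no "weight" entry. Here is a minimal sketch of that failure in plain Python. It is not DSS code: the two dictionaries and the `target_variable` key are hypothetical stand-ins, and the assumption that a retrained model stores a "weight" entry is inferred from the traceback only.

```python
# Hypothetical stand-ins for the parameters stored with a model.
# The real DSS structures are larger; only the "weight" key matters here.
old_core_params = {"target_variable": "AgTech & New Food"}  # assumed: trained before 4.2
new_core_params = {
    "target_variable": "AgTech & New Food",
    "weight": {"sampleWeightVariable": None},  # assumed: written when retrained on 4.2
}


def sample_weight_variable(core_params):
    # Same lookup as the line shown in the traceback.
    return core_params["weight"].get("sampleWeightVariable", None)


def lookup_fails(core_params):
    # True if the lookup raises KeyError, as it does in the job log.
    try:
        sample_weight_variable(core_params)
    except KeyError:
        return True
    return False


print(lookup_fails(old_core_params))
print(lookup_fails(new_core_params))
```

If that reading is right, retraining rewrites the stored parameters in the format 4.2 expects, which is why the error should go away.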
Hope it helps,
Alex