No rows in train dataframe after target remap. Target empty? Type mismatch?

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
edited July 16 in Using Dataiku

Hi,

I get this error message when training a classification model with MLlib:


[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - java.lang.IllegalArgumentException: No rows in train dataframe after target remap. Target empty? Type mismatch?
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply$mcV$sp(MLLibPredictionJob.scala:216)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.ProgressListener.push(ProgressListener.scala:46)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$class.prepare(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.prepare(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$delayedInit$body.apply(MLLibPredictionDoctorJob.scala:72)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.SuicidalApp$$anonfun$delayedInit$1.apply$mcV$sp(package.scala:402)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.immutable.List.foreach(List.scala:318)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$class.main(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.main(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob.main(MLLibPredictionDoctorJob.scala)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:497)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:710)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What is the problem?

Answers

  • Clément_Stenac
    Dataiker, Dataiku DSS Core Designer, Registered Posts: 753
    Hi,

    Assuming that your target column is indeed properly filled, the most probable cause is a "boolean normalization mismatch".

    If your target column has the "boolean" storage type (beware: this is the storage type, not the meaning; see https://doc.dataiku.com/dss/4.0/schemas/), then for MLlib to work properly it MUST contain "true" and "false" as values.

    In other words, for an MLlib target with boolean storage type, values like "0" or "1" are not supported.

    When reading CSV files, DSS accepts more than just "true" and "false" as booleans: it also recognizes values like 0, 1, yes, no, and so on. MLlib, however, does not. You can force DSS to convert all such "non-real-boolean" values to real "true"/"false" values by checking the "Normalize booleans" checkbox in the dataset format settings.
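    To make the failure mode concrete: if the target remap only accepts the literal strings "true" and "false", a target column filled with "0"/"1" remaps to nothing, which is exactly the "No rows in train dataframe after target remap" symptom. Below is a minimal, hypothetical sketch of what a "normalize booleans" step does; this is illustrative Python, not DSS's or MLlib's actual implementation, and the recognized spellings are assumptions:

    ```python
    # Map common boolean-like spellings to the canonical "true"/"false"
    # strings that a boolean-typed MLlib target requires.
    TRUTHY = {"true", "1", "yes", "y", "t"}
    FALSY = {"false", "0", "no", "n", "f"}

    def normalize_boolean(value):
        """Return 'true' or 'false' for recognized spellings, else None."""
        v = str(value).strip().lower()
        if v in TRUTHY:
            return "true"
        if v in FALSY:
            return "false"
        return None  # unrecognized values survive as nulls / dropped rows

    # A raw target column mixing spellings, as read from a CSV:
    raw_target = ["1", "0", "Yes", "no", "TRUE", "maybe"]
    normalized = [normalize_boolean(v) for v in raw_target]
    print(normalized)  # ['true', 'false', 'true', 'false', 'true', None]

    # Without normalization, a strict remap that keeps only "true"/"false"
    # would discard every "0"/"1" row, leaving an empty train dataframe:
    strict_keep = [v for v in raw_target if v in ("true", "false")]
    print(len(strict_keep))  # 0
    ```

    The second print shows why training fails before normalization: none of the raw values match the strict remap, so zero rows remain.
    
    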