No rows in train dataframe after target remap. Target empty? Type mismatch?

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
edited July 16 in Using Dataiku

Hi,

I get this error message when training a classification model with MLlib:


[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - java.lang.IllegalArgumentException: No rows in train dataframe after target remap. Target empty? Type mismatch?
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply$mcV$sp(MLLibPredictionJob.scala:216)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.ProgressListener.push(ProgressListener.scala:46)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$class.prepare(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.prepare(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$delayedInit$body.apply(MLLibPredictionDoctorJob.scala:72)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.SuicidalApp$$anonfun$delayedInit$1.apply$mcV$sp(package.scala:402)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.immutable.List.foreach(List.scala:318)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$class.main(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.main(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob.main(MLLibPredictionDoctorJob.scala)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:497)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:710)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What is the problem?

Answers

  • Clément_Stenac
    Dataiker, Dataiku DSS Core Designer, Registered Posts: 753
    Hi,

    Assuming that your target column is indeed properly filled, the most probable cause is a "boolean normalization mismatch".

    If your target column has the "boolean" storage type (beware: this is the storage type, not the meaning; see https://doc.dataiku.com/dss/4.0/schemas/), then for MLlib to work properly it MUST contain "true" and "false" as values.

    In other words, for an MLlib target with boolean storage type, values like "0" or "1" are not supported.

    When reading CSV files, DSS accepts more than just "true" and "false" as booleans: it also recognizes values like 0, 1, yes, no, and so on. MLlib, however, does not. You can force DSS to convert all such "non-real-boolean" values to real "true"/"false" values by checking the "Normalize booleans" checkbox in the dataset format settings.
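    To make the failure mode concrete: if the target remap only accepts the literal strings "true" and "false", a target column filled with "0"/"1" remaps to nothing, which is exactly the "No rows in train dataframe after target remap" symptom. Below is a minimal, hypothetical sketch of what a "normalize booleans" step does; this is illustrative Python, not DSS's or MLlib's actual implementation, and the recognized spellings are assumptions:

    ```python
    # Map common boolean-like spellings to the canonical "true"/"false"
    # strings that a boolean-typed MLlib target requires.
    TRUTHY = {"true", "1", "yes", "y", "t"}
    FALSY = {"false", "0", "no", "n", "f"}

    def normalize_boolean(value):
        """Return 'true' or 'false' for recognized spellings, else None."""
        v = str(value).strip().lower()
        if v in TRUTHY:
            return "true"
        if v in FALSY:
            return "false"
        return None  # unrecognized values survive as nulls / dropped rows

    # A raw target column mixing spellings, as read from a CSV:
    raw_target = ["1", "0", "Yes", "no", "TRUE", "maybe"]
    normalized = [normalize_boolean(v) for v in raw_target]
    print(normalized)  # ['true', 'false', 'true', 'false', 'true', None]

    # Without normalization, a strict remap that keeps only "true"/"false"
    # would discard every "0"/"1" row, leaving an empty train dataframe:
    strict_keep = [v for v in raw_target if v in ("true", "false")]
    print(len(strict_keep))  # 0
    ```

    The second print shows why training fails before normalization: none of the raw values match the strict remap, so zero rows remain.
    
    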