Training MLlib algorithm with yarn-cluster

UserBird · 01-20-2017

Hi,

I am trying to build an MLlib model with yarn-cluster set as the master, but the execution fails for both Random Forest and Logistic Regression. The input data is the iris dataset on HDFS.

Submitting a PySpark script in yarn-cluster mode works, and building the model with master=local also works.

I've only set the master, executor-memory and executor-instances in the Spark config.

The relevant part of the log:


Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
	at scala.Predef$.require(Predef.scala:221)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$8$$anonfun$apply$5.apply(Client.scala:501)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$8$$anonfun$apply$5.apply(Client.scala:499)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

Clément_Stenac · 01-20-2017

Hi,

DSS does not support the yarn-cluster mode. When you train models, DSS needs to write the results into the DSS datadir, which is much more difficult in yarn-cluster mode because the Spark driver then runs on a cluster node rather than on the DSS host.

To run a DSS Spark job on your YARN cluster, use the yarn-client mode instead.
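To make the suggested change concrete, here is a minimal sketch of the three settings mentioned in the question, with the master switched to yarn-client. The property names (spark.master, spark.executor.memory, spark.executor.instances) are standard Spark properties; the helper name build_spark_conf and the default values are hypothetical and only for illustration, and this is not how DSS itself reads its configuration.

```python
# Hypothetical helper: builds the Spark key/value settings discussed above
# and refuses the yarn-cluster master, which DSS does not support.
def build_spark_conf(master="yarn-client", executor_memory="2g", executor_instances=2):
    if master == "yarn-cluster":
        # Assumption for illustration: fail early instead of at submission time.
        raise ValueError("yarn-cluster is not supported by DSS; use yarn-client")
    return {
        "spark.master": master,
        "spark.executor.memory": executor_memory,
        "spark.executor.instances": str(executor_instances),
    }


if __name__ == "__main__":
    # Print the settings as key=value lines, ready to copy into a Spark config.
    for key, value in sorted(build_spark_conf().items()):
        print("%s=%s" % (key, value))
```

The resulting key/value pairs are what you would enter in place of the yarn-cluster setting; the memory and instance values should be whatever you were already using.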

Labels

Advanced ML

Machine Learning

MLLib

Spark
