Initial job has not accepted any resources (Recipes on Kubernetes)

bashayr Registered Posts: 3 ✭✭✭✭

I use DSS to push execution of visual recipes to containerized execution on a Kubernetes (k8s) cluster, using Spark as the execution engine. I pushed two images to the registry: dku-exec-base and dku-spark-base. However, when I run the recipe it runs forever (creating and deleting pods in k8s), and I found this line in the job logs:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Here are the pod logs in k8s:

++ id -u
+ myuid=998
++ id -g
+ mygid=996
+ set +e
++ getent passwd 998
+ uidentry=dataiku:x:998:996::/home/dataiku:/bin/bash
+ set -e
+ '[' -z dataiku:x:998:996::/home/dataiku:/bin/bash ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ case "$1" in
+ shift 1
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP)
+ /bin/java -Dspark.driver.port=37802 -Xms1g -Xmx1g -cp ':/opt/spark/jars/*:' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@… --executor-id 105 --cores 1 --app-id spark-application-1643539966991 --hostname …
Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/01/30 11:04:56 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 14@dsspredictscorewebnewcustomers-b2jyc7ki-exec-105
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for TERM
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for HUP
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for INT
22/01/30 11:04:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/30 11:04:56 INFO SecurityManager: Changing view acls to: dataiku
22/01/30 11:04:56 INFO SecurityManager: Changing modify acls to: dataiku
22/01/30 11:04:56 INFO SecurityManager: Changing view acls groups to:
22/01/30 11:04:56 INFO SecurityManager: Changing modify acls groups to:
22/01/30 11:04:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dataiku); groups with view permissions: Set(); users with modify permissions: Set(dataiku); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:283)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:272)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$3(CoarseGrainedExecutorBackend.scala:303)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(…)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$1(CoarseGrainedExecutorBackend.scala:301)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$…
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$…
    at …(Native Method)
    ... 4 more
Caused by: …: Failed to connect to …
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$…
    at org.apache.spark.rpc.netty.Outbox$$anon$…
    at java.util.concurrent.ThreadPoolExecutor.runWorker(…)
    at java.util.concurrent.ThreadPoolExecutor$…
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: /…
Caused by: java.net.NoRouteToHostException: No route to host
    at …(Native Method)
    at …$AbstractNioUnsafe.finishConnect(…)
    at io.netty.util.concurrent.SingleThreadEventExecutor$…
    at io.netty.util.internal.ThreadExecutorMap$…

Connectivity between DSS and the cluster works, but what does this error mean? What am I missing?

(k8s on DigitalOcean)

Operating system used: CentOS 7



  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker


    Spark requires that the executors can connect to the driver; almost all communication goes in the direction executor -> driver. If the address in the error is the IP of the DSS machine, you need to ensure that the cluster nodes (and the pods) can reach it. This error means that your cluster setup is blocking that communication, so you need to add security groups or firewall rules, depending on the cloud your cluster runs on.
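
    A quick way to confirm this is to test executor -> driver reachability from inside the cluster. Below is a minimal sketch (not part of DSS or Spark): `check_driver` is a hypothetical helper, the driver address must be substituted by hand, and 37802 is the `spark.driver.port` value visible in the pod log above.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port on the Spark driver is reachable.
# Run it from a throwaway pod inside the cluster, e.g.:
#   kubectl run nettest --rm -it --image=bash -- bash

check_driver() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device attempts a raw TCP connection;
  # timeout guards against a silently dropped SYN (firewalled port)
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Substitute the address from the "Failed to connect to ..." error:
# check_driver "$DRIVER_HOST" 37802
```

    If this prints `unreachable` from a pod but the same check succeeds from the DSS host itself, the problem is in the network path (firewall rules or security groups) rather than in Spark.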

  • bashayr
    bashayr Registered Posts: 3 ✭✭✭✭

    Hi @fchataigner2,
    thanks for your reply.

    This IP is not the IP of the DSS machine, and I disabled the firewall. However, I added a setting into bin/


    After adding this to the Spark config, it works!
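
    The exact line added above was lost from the post. As an illustration only, the driver-address settings that usually matter for this executor -> driver handshake are the ones below; the property names are real Spark configuration keys, but every value here is a placeholder to adapt, not the poster's actual config.

```shell
# Hypothetical example: pin the address and ports the driver advertises,
# so executors in the pods connect to something reachable and a firewall
# rule can be scoped to fixed ports.
spark-submit \
  --conf spark.driver.host=203.0.113.10 \      # address reachable from the pods (placeholder)
  --conf spark.driver.port=37802 \             # fixed RPC port (value from the log above)
  --conf spark.driver.bindAddress=0.0.0.0 \    # interface the driver listens on
  --conf spark.blockManager.port=37803 \       # fixed block-manager port (placeholder)
  ...
```

    In DSS these would typically go into the Spark configuration of the recipe or instance rather than a manual `spark-submit`, but the keys are the same.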
