Initial job has not accepted any resources (Recipes on Kubernetes)

bashayr
bashayr Registered Posts: 3 ✭✭✭✭

I use DSS to push the execution of visual recipes to containerized execution on a Kubernetes (k8s) cluster, using Spark as the execution engine. I pushed two images to the registry: dku-exec-base and dku-spark-base. However, when I run the recipe, it runs forever (creating and deleting pods in k8s). I found this line in the job logs:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Here are the pod logs in k8s:

++ id -u
+ myuid=998
++ id -g
+ mygid=996
+ set +e
++ getent passwd 998
+ uidentry=dataiku:x:998:996::/home/dataiku:/bin/bash
+ set -e
+ '[' -z dataiku:x:998:996::/home/dataiku:/bin/bash ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ case "$1" in
+ shift 1
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP)
+ /bin/java -Dspark.driver.port=37802 -Xms1g -Xmx1g -cp ':/opt/spark/jars/*:' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.10.0.6:37802 --executor-id 105 --cores 1 --app-id spark-application-1643539966991 --hostname 10.244.0.102
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/01/30 11:04:56 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 14@dsspredictscorewebnewcustomers-b2jyc7ki-exec-105
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for TERM
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for HUP
22/01/30 11:04:56 INFO SignalUtils: Registered signal handler for INT
22/01/30 11:04:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/30 11:04:56 INFO SecurityManager: Changing view acls to: dataiku
22/01/30 11:04:56 INFO SecurityManager: Changing modify acls to: dataiku
22/01/30 11:04:56 INFO SecurityManager: Changing view acls groups to:
22/01/30 11:04:56 INFO SecurityManager: Changing modify acls groups to:
22/01/30 11:04:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(dataiku); groups with view permissions: Set(); users  with modify permissions: Set(dataiku); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1780)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:283)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:272)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$3(CoarseGrainedExecutorBackend.scala:303)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$1(CoarseGrainedExecutorBackend.scala:301)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to /10.10.0.6:37802
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: /10.10.0.6:37802
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)

Connectivity between DSS and the cluster works, so what does this error mean? What am I missing?

(k8s on DigitalOcean)


Operating system used: CentOS 7


Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    Spark requires that the executors can connect to the driver, and almost all communication goes in the direction executor -> driver. If 10.10.0.6 is the IP of the DSS machine, you need to ensure that the cluster nodes (and the pods) can reach it. The "No route to host" error in your pod log means that your cluster setup is blocking this communication, so you need to add security groups or firewall rules, depending on the cloud your cluster is running on.

  • bashayr
    bashayr Registered Posts: 3 ✭✭✭✭

    Hi @fchataigner2,
    Thanks for your reply.

    This IP is not the IP of the DSS machine, and I have disabled the firewall. However, I added the following to bin/env-site.sh:

    export DKU_BACKEND_EXT_HOST=DSS_IP 

    After adding this and setting spark.driver.host in the Spark config, it works!
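
The failure discussed in this thread is an executor pod that cannot open a TCP connection to the driver address it was given (10.10.0.6:37802 in the pod log). As a rough way to confirm this kind of problem, the sketch below checks whether a host and port accept TCP connections. It assumes Python is available wherever you run it (for example, inside a pod on the cluster); the helper name can_reach is hypothetical and not part of DSS or Spark.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; not part of DSS or Spark.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, timeouts and DNS errors.
        return False
```

For instance, can_reach("10.10.0.6", 37802) run from a pod would test the address seen in the log, while the job's driver is still listening. The port in that log comes from spark.driver.port and may differ from one job to the next, so take the host and port from the --driver-url of the run you are debugging.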
