Using spark-submit with scala into kubernetes cluster

DrissiReda
DrissiReda Registered Posts: 57 ✭✭✭✭✭
edited July 16 in Using Dataiku

I'm using Dataiku version 9.0 in my Kubernetes cluster. I managed to spark-submit a jar from a shell, but I can't figure out how to submit a Scala job to the cluster. This is my Spark configuration:

Screenshot from 2021-03-05 11-18-48.png

These are the errors from the recipe:

[10:17:33] [INFO] [dku.utils]  - Exception in thread "main" java.lang.NoClassDefFoundError: scala/App$class
[10:17:33] [INFO] [dku.utils]  -    at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<init>(SparkScalaRecipeEntryPoint.scala:15)
[10:17:33] [INFO] [dku.utils]  -    at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<clinit>(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils]  -    at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint.main(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils]  -    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:17:33] [INFO] [dku.utils]  -    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[10:17:33] [INFO] [dku.utils]  -    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[10:17:33] [INFO] [dku.utils]  -    at java.lang.reflect.Method.invoke(Method.java:498)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
[10:17:33] [INFO] [dku.utils]  -    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[10:17:33] [INFO] [dku.utils]  - Caused by: java.lang.ClassNotFoundException: scala.App$class
[10:17:33] [INFO] [dku.utils]  -    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
[10:17:33] [INFO] [dku.utils]  -    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[10:17:33] [INFO] [dku.utils]  -    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

And

com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[10:17:33] [INFO] [dku.flow.activity] running compute_aa_NP - activity is finished
[10:17:33] [ERROR] [dku.flow.activity] running compute_aa_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)

And this error is from the backend log in the Maintenance tab:

[2021/03/15-10:17:01.625] [KNL-SCALA-0NUU3zmo-monitor-3467] [ERROR] [dku.kernels]  - KernelMonitorThread done:  Closing: com.dataiku.dip.cluster.ClusterDependentKernelHandle$1@79145fc6
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent]  - scala kernel process died before start
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent]  - Could not retrieve Spark training error details
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.recipes.interactioncontroller]  - Failed to compute recipe status
java.lang.Exception: scala kernel process died before start
    at com.dataiku.dip.cluster.ClusterDependentKernelHandle.waitForPort(ClusterDependentKernelHandle.java:155)
    at com.dataiku.dip.cluster.ClusterDependentKernelHandle.start(ClusterDependentKernelHandle.java:137)
    at com.dataiku.dip.cluster.ClusterDependentKernelsManager.newKernel(ClusterDependentKernelsManager.java:111)
    at com.dataiku.dip.cluster.ClusterDependentKernelsManager.acquireKernel(ClusterDependentKernelsManager.java:154)
    at com.dataiku.dip.recipes.code.scala.ScalaService.getKernel(ScalaService.java:67)
    at com.dataiku.dip.recipes.code.scala.ScalaService.checkSyntax(ScalaService.java:75)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeStatusComputer.getFullStatus_NT(SparkScalaRecipeStatusComputer.java:61)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController.getStatus(GenericRecipeInteractionController.java:158)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$FastClassBySpringCGLIB$$637d4c8.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:701)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
    at com.dataiku.dip.server.controllers.CallTracingAspect.doCall(CallTracingAspect.java:78)
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$EnhancerBySpringCGLIB$$fcb4d19.getStatus(<generated>)
    at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
    at com.dataiku.dip.server.controllers.DKUDispatcherServlet.doDispatch(DKUDispatcherServlet.java:50)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder$NotAsyncServlet.service(ServletHolder.java:1411)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1651)
    at com.dataiku.dip.shaker.server.ResourceFilter.doFilter(ResourceFilter.java:33)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1630)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:501)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Kernel process return code is 1
    at com.dataiku.dip.kernels.DSSKernelBase$KernelMonitorThread.run(DSSKernelBase.java:139)

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    edited July 17

    Hi,

    the error

    java.lang.NoClassDefFoundError: scala/App$class

    is usually a telltale sign that the Scala version is incorrect. Classes named like scala.App$class only exist in Scala 2.11 and earlier, so this particular error typically means that code compiled against Scala 2.11 (or 2.10) is running with Scala 2.12 binaries, or some similar mismatch. The Scala version is tied to the Spark version (in the default builds: up to Spark 1.6 => Scala 2.10; Spark 2 => Scala 2.11; Spark 3 => Scala 2.12).

    You need to keep the different Spark pieces consistent with one another:

    - the Spark home passed to dssadmin install-spark-integration. If that command was run and the setup has not changed since, the ./bin/env-spark.sh file in the DSS data dir should contain the version you expect

    - the Spark version used in the custom image
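
    One way to check which Scala runtime a given Spark home or image actually provides is a small probe like the sketch below. It is only an illustration, not part of DSS: the object name ScalaRuntimeProbe and its helpers are made up for this example. It prints the Scala version of the running JVM and checks whether the class named in the error is on the classpath.

    ```scala
    // Hypothetical helper, not part of DSS or Spark.
    object ScalaRuntimeProbe {
      // "2.12.10" -> "2.12": the binary-compatibility version of a Scala release
      def binaryVersion(full: String): String =
        full.split('.').take(2).mkString(".")

      // True if the named class can be loaded from the current classpath
      def classPresent(name: String): Boolean =
        try { Class.forName(name); true }
        catch { case _: ClassNotFoundException => false }

      def main(args: Array[String]): Unit = {
        val runtime = scala.util.Properties.versionNumberString
        println(s"Scala runtime: $runtime (binary ${binaryVersion(runtime)})")

        // scala.App$class ships with scala-library 2.11 and earlier only
        val probe = "scala.App$class"
        println(s"$probe present: ${classPresent(probe)}")
      }
    }
    ```

    For instance, you could paste it into the spark-shell of the Spark home that DSS is integrated with, and into the one inside the custom image, then call ScalaRuntimeProbe.main(Array()) in each. If the two report different binary versions, that would point to the kind of mismatch described above.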

  • DrissiReda
    DrissiReda Registered Posts: 57 ✭✭✭✭✭

    The DKU_SPARK_VERSION variable is set to 3.0.2, while the Spark cluster on Kubernetes and the custom image I'm using both run Spark 3.0.0.
