Using spark-submit with Scala on a Kubernetes cluster
I'm using Dataiku DSS 9.0 in my Kubernetes cluster. I managed to spark-submit a JAR from a shell, but I can't figure out how to submit a Scala job to my Kubernetes cluster. This is my Spark configuration:
These are the errors from the recipe:
[10:17:33] [INFO] [dku.utils] - Exception in thread "main" java.lang.NoClassDefFoundError: scala/App$class
[10:17:33] [INFO] [dku.utils] - at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<init>(SparkScalaRecipeEntryPoint.scala:15)
[10:17:33] [INFO] [dku.utils] - at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<clinit>(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils] - at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint.main(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:17:33] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[10:17:33] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[10:17:33] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:498)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
[10:17:33] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[10:17:33] [INFO] [dku.utils] - Caused by: java.lang.ClassNotFoundException: scala.App$class
[10:17:33] [INFO] [dku.utils] - at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
[10:17:33] [INFO] [dku.utils] - at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[10:17:33] [INFO] [dku.utils] - at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
And
com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[10:17:33] [INFO] [dku.flow.activity] running compute_aa_NP - activity is finished
[10:17:33] [ERROR] [dku.flow.activity] running compute_aa_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
    at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
And this error is from the backend log (Maintenance tab):
[2021/03/15-10:17:01.625] [KNL-SCALA-0NUU3zmo-monitor-3467] [ERROR] [dku.kernels] - KernelMonitorThread done: Closing: com.dataiku.dip.cluster.ClusterDependentKernelHandle$1@79145fc6
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent] - scala kernel process died before start
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent] - Could not retrieve Spark training error details
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.recipes.interactioncontroller] - Failed to compute recipe status
java.lang.Exception: scala kernel process died before start
    at com.dataiku.dip.cluster.ClusterDependentKernelHandle.waitForPort(ClusterDependentKernelHandle.java:155)
    at com.dataiku.dip.cluster.ClusterDependentKernelHandle.start(ClusterDependentKernelHandle.java:137)
    at com.dataiku.dip.cluster.ClusterDependentKernelsManager.newKernel(ClusterDependentKernelsManager.java:111)
    at com.dataiku.dip.cluster.ClusterDependentKernelsManager.acquireKernel(ClusterDependentKernelsManager.java:154)
    at com.dataiku.dip.recipes.code.scala.ScalaService.getKernel(ScalaService.java:67)
    at com.dataiku.dip.recipes.code.scala.ScalaService.checkSyntax(ScalaService.java:75)
    at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeStatusComputer.getFullStatus_NT(SparkScalaRecipeStatusComputer.java:61)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController.getStatus(GenericRecipeInteractionController.java:158)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$FastClassBySpringCGLIB$$637d4c8.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:701)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
    at com.dataiku.dip.server.controllers.CallTracingAspect.doCall(CallTracingAspect.java:78)
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
    at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$EnhancerBySpringCGLIB$$fcb4d19.getStatus(<generated>)
    at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
    at com.dataiku.dip.server.controllers.DKUDispatcherServlet.doDispatch(DKUDispatcherServlet.java:50)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder$NotAsyncServlet.service(ServletHolder.java:1411)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1651)
    at com.dataiku.dip.shaker.server.ResourceFilter.doFilter(ResourceFilter.java:33)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1630)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:501)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Kernel process return code is 1
    at com.dataiku.dip.kernels.DSSKernelBase$KernelMonitorThread.run(DSSKernelBase.java:139)
Answers
-
Hi,
The error
java.lang.NoClassDefFoundError: scala/App$class
is usually a telltale sign that the Scala version is wrong, i.e. that Spark is trying to run code compiled against Scala 2.11 (or 2.10) binaries on Scala 2.12 binaries, or some similar mismatch. The Scala version is tied to the Spark version (by default: up to Spark 1.6 => Scala 2.10; Spark 2 => Scala 2.11; Spark 3 => Scala 2.12).
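Some background on why this class in particular points at a Scala mismatch: Scala 2.11 and earlier compile the concrete methods of a trait such as scala.App into a separate implementation class named App$class, while Scala 2.12 compiles traits to Java interfaces with default methods and no longer emits that class. A minimal diagnostic sketch (the class name ScalaRuntimeProbe is hypothetical, not part of DSS or Spark) that reports which scala-library is visible on a given classpath and whether scala.App$class can be loaded from it:

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic class: not part of DSS or Spark.
final class ScalaRuntimeProbe {

    /** True if the Scala 2.11-style implementation class of trait scala.App can be loaded. */
    static boolean hasTraitImplClass() {
        return isLoadable("scala.App$class");
    }

    /** Returns the scala-library version string, or null if no Scala library is on the classpath. */
    static String scalaLibraryVersion() {
        try {
            // scala.util.Properties is a Scala object: its singleton instance is the MODULE$ field.
            Class<?> module = Class.forName("scala.util.Properties$");
            Object instance = module.getField("MODULE$").get(null);
            Method versionNumber = module.getMethod("versionNumberString");
            return (String) versionNumber.invoke(instance);
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    /** Tries to load a class without initializing it; false if it is missing or cannot be linked. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ScalaRuntimeProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String version = scalaLibraryVersion();
        System.out.println("scala-library version: " + (version == null ? "not found" : version));
        System.out.println("scala.App$class loadable: " + hasTraitImplClass());
    }
}
```

For example, after compiling it, run java -cp "$SPARK_HOME/jars/*:." ScalaRuntimeProbe against the Spark distribution in question; a Spark 3.0 distribution bundles a Scala 2.12 library, so it should report a 2.12.x version and false for scala.App$class.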
You need to keep the different Spark components consistent with one another:
- the Spark home passed to dssadmin install-spark-integration. If that command was run and the setup hasn't changed since, the ./bin/env-spark.sh file in the DSS data dir should contain the version you expect
- the Spark version used in the custom image
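To check the first point without opening the file by hand, here is a minimal sketch that reads DKU_SPARK_VERSION from bin/env-spark.sh in the DSS data directory. It assumes the variable is assigned on its own line, e.g. export DKU_SPARK_VERSION=3.0.2 (the exact syntax of that file is an assumption), and the class name and command-line argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads DKU_SPARK_VERSION from DSS's env-spark.sh.
final class EnvSparkVersion {

    // Matches `export DKU_SPARK_VERSION=3.0.2` or `DKU_SPARK_VERSION="3.0.2"`.
    // Assumed syntax; commented-out lines (starting with #) are ignored.
    private static final Pattern LINE =
        Pattern.compile("^\\s*(?:export\\s+)?DKU_SPARK_VERSION=[\"']?([^\"'\\s]+)[\"']?\\s*$");

    /** Returns the first DKU_SPARK_VERSION value found, if any. */
    static Optional<String> parse(List<String> lines) {
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the DSS data directory (hypothetical invocation).
        Path envFile = Paths.get(args[0], "bin", "env-spark.sh");
        System.out.println("DKU_SPARK_VERSION = "
            + parse(Files.readAllLines(envFile)).orElse("not set"));
    }
}
```

The value it prints can then be compared with the Spark version reported by the cluster and by the custom image.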
-
The DKU_SPARK_VERSION variable is set to 3.0.2, while the Spark cluster on Kubernetes and the custom image I'm using both run Spark 3.0.0.
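As a quick sanity check on these versions, a minimal sketch (class and method names are hypothetical) that applies the default Spark-to-Scala mapping given in the answer and compares two Spark versions. It only knows the default builds; some Spark 2.4.x and later 3.x releases also ship builds for other Scala versions, so treat the result as a default rather than a guarantee.

```java
// Hypothetical helper: maps a Spark version to its default Scala binary version.
final class SparkScalaCompat {

    /** Default Scala binary version for a Spark version string such as "3.0.2". */
    static String defaultScalaBinary(String sparkVersion) {
        String[] parts = sparkVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]); // NumberFormatException is an IllegalArgumentException
        switch (major) {
            case 1: return "2.10"; // Spark up to 1.6
            case 2: return "2.11"; // Spark 2.x default build
            case 3: return "2.12"; // Spark 3.x default build
            default:
                throw new IllegalArgumentException("No default mapping for Spark " + sparkVersion);
        }
    }

    /** True if both Spark versions default to the same Scala binary version. */
    static boolean sameScalaBinary(String sparkA, String sparkB) {
        return defaultScalaBinary(sparkA).equals(defaultScalaBinary(sparkB));
    }

    public static void main(String[] args) {
        String dss = "3.0.2";     // DKU_SPARK_VERSION reported in the thread
        String cluster = "3.0.0"; // Spark on the cluster and in the custom image
        System.out.println(dss + " -> Scala " + defaultScalaBinary(dss));
        System.out.println(cluster + " -> Scala " + defaultScalaBinary(cluster));
        System.out.println("Same Scala binary: " + sameScalaBinary(dss, cluster));
    }
}
```

Both 3.0.2 and 3.0.0 map to Scala 2.12, so the patch-level difference alone is unlikely to explain a missing scala.App$class, which only exists in Scala 2.11 and earlier. In the first trace the class that references it is DSS's own com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint, which suggests that the code loaded for the Scala recipe was built against Scala 2.11; that is the component worth checking first.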