Using spark-submit with scala into kubernetes cluster

DrissiReda
Level 4

I'm using Dataiku version 9.0 in my Kubernetes cluster. I managed to spark-submit a jar from a shell, but I can't figure out how to submit a Scala job to my Kubernetes cluster. This is my Spark configuration:

Screenshot from 2021-03-05 11-18-48.png

These are the errors from the recipe:

[10:17:33] [INFO] [dku.utils]  - Exception in thread "main" java.lang.NoClassDefFoundError: scala/App$class
[10:17:33] [INFO] [dku.utils]  - 	at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<init>(SparkScalaRecipeEntryPoint.scala:15)
[10:17:33] [INFO] [dku.utils]  - 	at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint$.<clinit>(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils]  - 	at com.dataiku.dip.spark.recipe.SparkScalaRecipeEntryPoint.main(SparkScalaRecipeEntryPoint.scala)
[10:17:33] [INFO] [dku.utils]  - 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:17:33] [INFO] [dku.utils]  - 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[10:17:33] [INFO] [dku.utils]  - 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[10:17:33] [INFO] [dku.utils]  - 	at java.lang.reflect.Method.invoke(Method.java:498)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
[10:17:33] [INFO] [dku.utils]  - 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[10:17:33] [INFO] [dku.utils]  - Caused by: java.lang.ClassNotFoundException: scala.App$class
[10:17:33] [INFO] [dku.utils]  - 	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
[10:17:33] [INFO] [dku.utils]  - 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[10:17:33] [INFO] [dku.utils]  - 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

And

com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
	at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[10:17:33] [INFO] [dku.flow.activity] running compute_aa_NP - activity is finished
[10:17:33] [ERROR] [dku.flow.activity] running compute_aa_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Scala process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:71)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runUsingSparkSubmit(AbstractSparkBasedRecipeRunner.java:218)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.doRunSpark(AbstractSparkBasedRecipeRunner.java:128)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:97)
	at com.dataiku.dip.dataflow.exec.AbstractSparkBasedRecipeRunner.runSpark(AbstractSparkBasedRecipeRunner.java:82)
	at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeRunner.run(SparkScalaRecipeRunner.java:43)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)

And this error is from the backend log in the Maintenance tab:

[2021/03/15-10:17:01.625] [KNL-SCALA-0NUU3zmo-monitor-3467] [ERROR] [dku.kernels]  - KernelMonitorThread done:  Closing: com.dataiku.dip.cluster.ClusterDependentKernelHandle$1@79145fc6
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent]  - scala kernel process died before start
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.kernels.cluster_dependent]  - Could not retrieve Spark training error details
[2021/03/15-10:17:01.714] [qtp771105389-3456] [ERROR] [dku.recipes.interactioncontroller]  - Failed to compute recipe status
java.lang.Exception: scala kernel process died before start
	at com.dataiku.dip.cluster.ClusterDependentKernelHandle.waitForPort(ClusterDependentKernelHandle.java:155)
	at com.dataiku.dip.cluster.ClusterDependentKernelHandle.start(ClusterDependentKernelHandle.java:137)
	at com.dataiku.dip.cluster.ClusterDependentKernelsManager.newKernel(ClusterDependentKernelsManager.java:111)
	at com.dataiku.dip.cluster.ClusterDependentKernelsManager.acquireKernel(ClusterDependentKernelsManager.java:154)
	at com.dataiku.dip.recipes.code.scala.ScalaService.getKernel(ScalaService.java:67)
	at com.dataiku.dip.recipes.code.scala.ScalaService.checkSyntax(ScalaService.java:75)
	at com.dataiku.dip.recipes.code.scala.SparkScalaRecipeStatusComputer.getFullStatus_NT(SparkScalaRecipeStatusComputer.java:61)
	at com.dataiku.dip.server.recipes.GenericRecipeInteractionController.getStatus(GenericRecipeInteractionController.java:158)
	at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$FastClassBySpringCGLIB$$637d4c8.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:701)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
	at com.dataiku.dip.server.controllers.CallTracingAspect.doCall(CallTracingAspect.java:78)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
	at com.dataiku.dip.server.recipes.GenericRecipeInteractionController$$EnhancerBySpringCGLIB$$fcb4d19.getStatus(<generated>)
	at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
	at com.dataiku.dip.server.controllers.DKUDispatcherServlet.doDispatch(DKUDispatcherServlet.java:50)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder$NotAsyncServlet.service(ServletHolder.java:1411)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1651)
	at com.dataiku.dip.shaker.server.ResourceFilter.doFilter(ResourceFilter.java:33)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1630)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:501)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Kernel process return code is 1
	at com.dataiku.dip.kernels.DSSKernelBase$KernelMonitorThread.run(DSSKernelBase.java:139)
2 Replies
fchataigner2
Dataiker

Hi,

The error

java.lang.NoClassDefFoundError: scala/App$class

is usually a telltale sign that the Scala version is incorrect (i.e. that Spark is trying to run code compiled against Scala 2.12 or 2.11 binaries with 2.10 binaries, or some similar mismatch). The Scala version is tied to the Spark version (up to Spark 1.6 => Scala 2.10; Spark 2 => Scala 2.11; Spark 3 => Scala 2.12).
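
To see which Scala library a given Spark installation actually loads, a small check can be run with that installation (for example pasted into its spark-shell, or packaged and submitted as a jar). This is only a minimal sketch: the object name `ScalaRuntimeCheck` is made up for illustration and is not part of DSS or Spark. It relies on the fact that `scala.App$class` exists only in Scala 2.11 and earlier, where traits with concrete members were compiled to a separate implementation class, so code built for 2.11 that extends `App` fails with the error above on a 2.12 runtime.

```scala
import scala.util.Try

// Hypothetical diagnostic object; the name is an illustration, not a DSS or Spark API.
object ScalaRuntimeCheck {

  // Version of the scala-library jar found on the classpath, e.g. "2.12.10".
  def scalaLibraryVersion: String = scala.util.Properties.versionNumberString

  // True when the pre-2.12 trait implementation class can be loaded.
  def hasAppImplClass: Boolean = Try(Class.forName("scala.App$class")).isSuccess

  def main(args: Array[String]): Unit = {
    println(s"Scala library version: $scalaLibraryVersion")
    println(s"scala.App$$class present: $hasAppImplClass")
  }
}
```

If the reported version starts with 2.12 while the jar being submitted was built for 2.11 (or the reverse), the two sides do not match.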

You need to keep the different Spark components consistent with one another:

- the Spark home passed to dssadmin install-spark-integration. If that command was run and the setup hasn't changed since, the ./bin/env-spark.sh file in the DSS data dir should contain the version you expect

- the Spark used in the custom image
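
The rule of thumb above can be written down as a small helper to compare the Spark version on the DSS side with the one in the image. This is a hedged sketch: `SparkScalaCompat` and its methods are hypothetical names, and the mapping only encodes the defaults listed in this reply. A particular Spark distribution may be built for a different Scala version, so the scala-library jar it ships remains the final word.

```scala
import scala.util.Try

// Hypothetical helper; not a DSS or Spark API.
object SparkScalaCompat {

  // Default Scala binary version per Spark major version, following the rule of thumb above.
  // Returns None for versions that cannot be parsed or are not covered by the rule.
  def expectedScalaBinary(sparkVersion: String): Option[String] =
    sparkVersion.trim.split('.').headOption
      .flatMap(major => Try(major.toInt).toOption)
      .flatMap {
        case 1 => Some("2.10")
        case 2 => Some("2.11")
        case 3 => Some("2.12")
        case _ => None
      }

  // Two Spark versions are treated as consistent when both map to the same Scala binary version.
  def sameScalaBinary(a: String, b: String): Boolean =
    (expectedScalaBinary(a), expectedScalaBinary(b)) match {
      case (Some(x), Some(y)) => x == y
      case _                  => false
    }
}
```

For example, `SparkScalaCompat.sameScalaBinary("2.4.5", "3.0.0")` returns `false`, since the first maps to Scala 2.11 and the second to Scala 2.12 (both version strings are illustrative).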

DrissiReda
Level 4
Author

The DKU_SPARK_VERSION variable is set to 3.0.2. The Spark cluster on Kubernetes and the custom image I'm using both run Spark 3.0.0.
