Announcing the winners & finalists of the Dataiku Frontrunner Awards 2021! Read their inspiring stories

Launching visual recipes with Spark yields error : local class incompatible

DrissiReda
Level 3
Launching visual recipes with Spark yields error : local class incompatible

Trying to use the spark engine, configured to a spark cluster in kubernetes.

Using scala or spark-submit works well.

Some visual recipes like "Analyze" work well too.

Other visual recipes like "GroupBy" or "Pivot" give me the same error. Does anybody know what it means? or what it is caused by? I'm using dataiku 9.0 and spark 3.0.0.

Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 8, 10.244.5.167, executor 2): java.io.InvalidClassException: org.apache.spark.sql.execution.exchange.ShuffleExchangeExec; local class incompatible: stream classdesc serialVersionUID = 2362153657855391074, local class serialVersionUID = 8799643067332311612 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2235) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace:, caused by: InvalidClassException: org.apache.spark.sql.execution.exchange.ShuffleExchangeExec; local class incompatible: stream classdesc serialVersionUID = 2362153657855391074, local class serialVersionUID = 8799643067332311612

0 Kudos
2 Replies
fchataigner2
Dataiker
Dataiker

Hi,

the error implies that the jars on the images used by the kubernetes pods and the jars in the spark-submit called by DSS are different. You should make sure the setup is consistent:

- rerun dssadmin install-spark-integration ...

- then rebuild the spark image with dssadmin build-base-image --type spark ...

- and push the base images to the cloud repository with the "push base images" button on the Administration > Settings > Spark tab.

If the problem persists, you should open a support ticket with a full diagnostic of the failed job.

0 Kudos
DrissiReda
Level 3
Author

Thank you, I used my own spark image and it worked properly for a while. I'll rebuild the image and tell you how that goes.

0 Kudos
A banner prompting to get Dataiku DSS