Example of spark broadcast
bkostya
Registered Posts: 4 ✭✭✭
Hello!
Please help to build a working example of using broadcast on dataiku scala spark. I'm getting NullPointerException. Running from recipe. The goal is to get dfA available per RDD or per partition.
DKU_SPARK_VERSION 2.3.0.2.6.5.106
val dfA = sqlContext.createDataFrame(Seq( (1L,501L), (2L,502L) )).toDF("uid","data") val dfB = sqlContext.createDataFrame(Seq( (10L,1001L), (20L,1002L), (30L,1003L), (40L,1004L) )).toDF("uid","data") val broadcastA = sparkContext.broadcast(dfA.collect) dfB .repartition(2) .rdd .foreach(rdd => { import sqlContext.implicits._ import scala.collection.JavaConverters._ val schema5 = new StructType() .add(StructField("uid", LongType, false)) .add(StructField("data", LongType, false)) val whoami = sqlContext.getClass.toString // OK, can access sqlContext val data5 = broadcastA.value.toList.asJava // OK // Fail here: NullPointerException val df5 = sqlContext.createDataFrame(data5,schema5)//.toDF("id","data") })