Example of spark broadcast
bkostya
Registered Posts: 4 ✭✭✭
Hello!
Please help to build a working example of using broadcast on dataiku scala spark. I'm getting NullPointerException. Running from recipe. The goal is to get dfA available per RDD or per partition.
DKU_SPARK_VERSION 2.3.0.2.6.5.106
val dfA = sqlContext.createDataFrame(Seq(
(1L,501L),
(2L,502L)
)).toDF("uid","data")
val dfB = sqlContext.createDataFrame(Seq(
(10L,1001L),
(20L,1002L),
(30L,1003L),
(40L,1004L)
)).toDF("uid","data")
val broadcastA = sparkContext.broadcast(dfA.collect)
dfB
.repartition(2)
.rdd
.foreach(rdd => {
import sqlContext.implicits._
import scala.collection.JavaConverters._
val schema5 = new StructType()
.add(StructField("uid", LongType, false))
.add(StructField("data", LongType, false))
val whoami = sqlContext.getClass.toString // OK, can access sqlContext
val data5 = broadcastA.value.toList.asJava // OK
// Fail here: NullPointerException
val df5 = sqlContext.createDataFrame(data5,schema5)//.toDF("id","data")
})