Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hello!
Please help to build a working example of using broadcast on dataiku scala spark. I'm getting NullPointerException. Running from recipe. The goal is to get dfA available per RDD or per partition.
DKU_SPARK_VERSION 2.3.0.2.6.5.106
val dfA = sqlContext.createDataFrame(Seq(
(1L,501L),
(2L,502L)
)).toDF("uid","data")
val dfB = sqlContext.createDataFrame(Seq(
(10L,1001L),
(20L,1002L),
(30L,1003L),
(40L,1004L)
)).toDF("uid","data")
val broadcastA = sparkContext.broadcast(dfA.collect)
dfB
.repartition(2)
.rdd
.foreach(rdd => {
import sqlContext.implicits._
import scala.collection.JavaConverters._
val schema5 = new StructType()
.add(StructField("uid", LongType, false))
.add(StructField("data", LongType, false))
val whoami = sqlContext.getClass.toString // OK, can access sqlContext
val data5 = broadcastA.value.toList.asJava // OK
// Fail here: NullPointerException
val df5 = sqlContext.createDataFrame(data5,schema5)//.toDF("id","data")
})