Read as Spark dataframe
I installed Spark in a notebook environment. On creating a new PySpark notebook I get the following starter code:
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

dataset = dataiku.Dataset("name_of_the_dataset")
df = dkuspark.get_dataframe(sqlContext, dataset)
The issue is that I have Spark version 3.2.1, and since Spark 2.0 SparkSession has been the entry point for programming with DataFrames and Datasets. So I create a Spark session as follows:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("SparkByExamples.com").getOrCreate()  # local master; replace with the cluster master URL if needed
Therefore, running the following line gives me an error:
df = dkuspark.get_dataframe(sqlContext, dataset)
Error:
Py4JJavaError: An error occurred while calling o32.classForName.
: java.lang.ClassNotFoundException: com.dataiku.dip.spark.StdDataikuSparkContext
Best Answer
-
Hi,
The spark-submit arguments aren't passing the needed Dataiku jars to Spark, which means you probably haven't completed the integration of Spark with DSS (see https://doc.dataiku.com/dss/latest/spark/installation.html ). On a related note, make sure you don't install pyspark as a package in your code env, since that is handled by the install-spark-integration script.
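As a quick sanity check, here is a minimal sketch (standard PySpark/py4j calls, using the class name from your stack trace) that asks the driver JVM whether the Dataiku class is on the classpath:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Look up the class named in the ClassNotFoundException above.
# If this raises, the Dataiku Spark integration jars were not passed to Spark
# and install-spark-integration needs to be (re)run.
try:
    sc._jvm.java.lang.Class.forName("com.dataiku.dip.spark.StdDataikuSparkContext")
    print("Dataiku Spark integration jars are on the classpath.")
except Exception as exc:
    print("Dataiku jars missing; redo the Spark integration:", exc)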
Answers
-
Hi,
I did the Spark integration with DSS. I am creating a Spark session as mentioned above. I need the updated DSS code to import data as a Spark dataframe. I've read the documentation, but I can't seem to find the answer.
-
Once you have your Spark SQLContext object, you can simply:
import dataiku
import dataiku.spark as dkuspark

# Example: Read the descriptor of a Dataiku dataset
mydataset = dataiku.Dataset("mydataset")

# And read it as a Spark dataframe
df = dkuspark.get_dataframe(sqlContext, mydataset)
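If you are creating your own SparkSession (as in your snippet), a minimal sketch of one way to bridge the two APIs, assuming Spark >= 2.0 and a completed DSS integration ("mydataset" is a placeholder name):

import dataiku
import dataiku.spark as dkuspark
from pyspark.sql import SparkSession, SQLContext

# Reuse the session configured by DSS / spark-submit instead of forcing local[1]
spark = SparkSession.builder.getOrCreate()

# Legacy wrapper around the same SparkContext; deprecated in Spark 3.x but still functional
sqlContext = SQLContext(spark.sparkContext)

mydataset = dataiku.Dataset("mydataset")  # placeholder dataset name
df = dkuspark.get_dataframe(sqlContext, mydataset)
df.printSchema()

Calling getOrCreate() without .master(...) lets the notebook pick up the master URL and jars that the DSS Spark integration configures; building a fresh local[1] session bypasses those jars, which is exactly what produces the ClassNotFoundException above.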