
PySpark Recipes persist DataFrame

Level 2


I'm using PySpark Recipes. To reduce execution time and memory usage, I would like to persist my DataFrame:



But I get this error message: 'Job failed: Pyspark code failed: At line 186: <type 'exceptions.AttributeError'>: 'SparkSession' object has no attribute '_getJavaStorageLevel''

Any ideas? Thank you for your help!

4 Replies


Are you using a SparkSession or a SQLContext to create your dataframes? Whichever you are using, can you please try the other one?

Level 2

This is part of my code:

import dataiku
from dataiku import spark as dkuspark
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession, SQLContext
import pyspark
from pyspark import StorageLevel

config = pyspark.SparkConf().setAll([
    ('spark.executor.memory', '64g'),
    ('spark.executor.cores', '8'),
    ('spark.cores.max', '8')])

spark = SparkSession.builder.config(conf=config).getOrCreate()
sc = SQLContext(spark)

dataset = dataiku.Dataset("my_dataset")
df = dkuspark.get_dataframe(sc, dataset)


=> I get the error when calling persist on this dataframe.

Again thank you for your help. 


It seems that Spark does not like mixing old- and new-style APIs (a SQLContext created from a SparkSession instead of a SparkContext). Could you please try creating a SparkContext instead of a SparkSession?

from pyspark import SparkContext

sc = SparkContext(conf=config)
sqlContext = SQLContext(sc)
df = dkuspark.get_dataframe(sc, dataset)
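For readers wondering why the error is an AttributeError on _getJavaStorageLevel: when a DataFrame is persisted, old-style PySpark asks the context the DataFrame was built with to translate the Python StorageLevel into its Java counterpart, and a SparkContext has that method while a SparkSession does not. The toy classes below (plain Python, no Spark required; all "Fake" names are stand-ins for illustration, not the real pyspark classes) sketch that mechanism:

```python
# Stand-in for SparkContext: it knows how to translate a storage level.
class FakeSparkContext:
    def _getJavaStorageLevel(self, level):
        return "java:" + level

# Stand-in for SparkSession: no _getJavaStorageLevel method here.
class FakeSparkSession:
    pass

class FakeDataFrame:
    def __init__(self, ctx):
        # Whatever context object the DataFrame was created with.
        self._sc = ctx

    def persist(self, level="MEMORY_AND_DISK"):
        # persist() blindly delegates to the stored context, which is
        # why passing the wrong kind of context surfaces only here.
        return self._sc._getJavaStorageLevel(level)

# Built from a (fake) SparkContext: persist works.
FakeDataFrame(FakeSparkContext()).persist()

# Built from a (fake) SparkSession: persist raises AttributeError,
# mirroring the error message from the original post.
try:
    FakeDataFrame(FakeSparkSession()).persist()
except AttributeError as e:
    print(e)
```

This is why the fix above works: building everything from a SparkContext ensures persist() finds the method it expects.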

Level 2

Hi Clément,

Ok, it works great! Just for future readers of this post: when you're creating your dataframe, use sqlContext

df = dkuspark.get_dataframe(sqlContext, dataset)

Thank you Clément, it's nice to have the help of the CTO of DSS. It's not always easy to deal with the old and new versions of the Spark API between notebooks and recipes.

Best regards! (See you soon)
