Hi,
I'm using PySpark recipes. To reduce execution time and memory usage, I would like to use the functions:
DataFrame.persist()
DataFrame.unpersist()
But I get this error message: 'Job failed: Pyspark code failed: At line 186: <type 'exceptions.AttributeError'>: 'SparkSession' object has no attribute '_getJavaStorageLevel'
Any ideas? Thank you for your help!
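For context, the pattern I'm after is roughly the following (a minimal sketch in plain PySpark, assuming an existing spark session and an illustrative input path):
from pyspark import StorageLevel
df = spark.read.parquet("/illustrative/path")  # hypothetical input
df.persist(StorageLevel.MEMORY_AND_DISK)  # cache across several actions
df.count()   # first action materializes the cache
df.show(5)   # subsequent actions reuse the cached data
df.unpersist()  # release memory/disk once done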
Hi,
Are you using a SparkSession or a SQLContext to create your dataframes? Whichever you are using, can you please try with the other one?
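For reference, the two entry points look roughly like this (a minimal sketch; names are illustrative):
from pyspark import SparkContext
from pyspark.sql import SparkSession, SQLContext
# New-style entry point (Spark 2.x)
spark = SparkSession.builder.getOrCreate()
# Old-style entry point (Spark 1.x): a SparkContext wrapped in a SQLContext
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)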
This is part of my code:
import dataiku
from dataiku import spark as dkuspark
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession, SQLContext
import pyspark
from pyspark import StorageLevel
config = pyspark.SparkConf().setAll([
    ('spark.executor.memory', '64g'),
    ('spark.executor.cores', '8'),
    ('spark.cores.max', '8'),
    ('spark.driver.memory', '64g'),
])
spark = SparkSession.builder.config(conf=config).getOrCreate()
sc = SQLContext(spark)
dataset = dataiku.Dataset("my_dataset")
df = dkuspark.get_dataframe(sc, dataset)
df.persist(StorageLevel.MEMORY_AND_DISK)
=> I get an error on the persist() function.
Again, thank you for your help.
It seems that Spark does not like mixing old- and new-style APIs (a SQLContext created from a SparkSession instead of from a SparkContext). Could you please try again, but instead of creating a SparkSession, create a SparkContext?
from pyspark import SparkContext  # use a SparkContext instead of a SparkSession
sc = SparkContext(conf=config)
sqlContext = SQLContext(sc)
df = dkuspark.get_dataframe(sc, dataset)
Hi Clément,
OK, it works great! Just for future readers of this post: when creating your dataframe, use sqlContext:
df = dkuspark.get_dataframe(sqlContext, dataset)
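Putting the whole fix together for those readers, the corrected recipe looks roughly like this (an untested sketch, using the same Spark config and dataset name as in my question):
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext, StorageLevel
from pyspark.conf import SparkConf
from pyspark.sql import SQLContext

config = SparkConf().setAll([
    ('spark.executor.memory', '64g'),
    ('spark.executor.cores', '8'),
    ('spark.cores.max', '8'),
    ('spark.driver.memory', '64g'),
])

# Old-style entry point throughout: SparkContext first, then SQLContext.
sc = SparkContext(conf=config)
sqlContext = SQLContext(sc)

dataset = dataiku.Dataset("my_dataset")
df = dkuspark.get_dataframe(sqlContext, dataset)

# persist() now resolves the storage level through the SparkContext.
df.persist(StorageLevel.MEMORY_AND_DISK)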
Thank you Clément, it's nice to have the help of the CTO of DSS. It's not always easy to deal with the old and new versions of the Spark API across notebooks and recipes.
Best regards! (See you soon!)