Only Python 3 for Pyspark in 10.0.4?

JazzminnNo
Level 2

Hello community,

After upgrading Dataiku to 10.0.4, a PySpark recipe suddenly stopped working on the default Python 2.7 code env, failing with the syntax error below:

File "/appl/dataiku/dataiku-dss-10.0.4/spark-standalone-home/python/pyspark/find_spark_home.py", line 68
    print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
                                                                               ^
SyntaxError: invalid syntax

Our guess is that the underlying PySpark scripts have been updated to Python 3 code and that Python 2 support has been deprecated or removed. Is this correct? If so, does that mean PySpark can no longer be used with a Python 2 code env?
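The traceback above is the classic Python 2 vs. 3 print incompatibility: in Python 2, `print` is a statement, so a keyword argument like `file=sys.stderr` is a syntax error at parse time. A minimal sketch of the failing line (the `paths` value here is a made-up placeholder, not the real search result) shows how the same code runs cleanly once the print function is available:

```python
# In Python 2, "print(... , file=sys.stderr)" is a SyntaxError unless
# the print function is imported from __future__. On Python 3 this
# import is a no-op, so the line below works either way.
from __future__ import print_function
import sys

paths = ["/opt/spark", "/usr/local/spark"]  # hypothetical search paths
print("Could not find valid SPARK_HOME while searching {0}".format(paths),
      file=sys.stderr)
```

Since the failing file ships inside the DSS Spark distribution, the practical takeaway is that the bundled PySpark now assumes a Python 3 interpreter, not that the recipe code itself is wrong.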

Before the upgrade we were on 9.0.3, and the script worked fine with Python 2.

I hope someone can help me solve this problem.

Thank you in advance,

Nofit Kartoredjo


Operating system used: RedHat



3 Replies
AlexT
Dataiker

Spark is no longer compatible with Python 2 in recent DSS releases.

JazzminnNo
Level 2
Author

Hi Alex,

Thanks for the suggestion. The first approach is what we are looking for: using Python 2.7 for one specific notebook. The problem is that the error already occurs at the import level (see picture) when we use a code env with Python 2 and pyspark installed. So even if I added your code, it wouldn't make a difference, because the error starts at this line:

from dataiku import spark as dkuspark

The scripts in this module probably use Python 3. Is there any way to adjust this?
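For what it's worth, since the SyntaxError is raised while Python 2 merely parses the module, there is nothing to catch at runtime; the only adjustment possible on the recipe side is to fail early with a readable message. A minimal sketch (the `require_python3` helper is hypothetical, not a Dataiku API):

```python
import sys

# Hypothetical guard: fail fast with a clear message instead of the
# opaque SyntaxError Python 2 raises while parsing the Python 3-only
# dataiku.spark module.
def require_python3():
    if sys.version_info < (3,):
        raise RuntimeError(
            "This recipe needs a Python 3 code env; "
            "dataiku.spark cannot be imported under Python 2")
    return True

require_python3()
# from dataiku import spark as dkuspark  # import only after the guard passes
```

This does not make the import work on Python 2; it just turns the failure into an actionable error for whoever runs the recipe.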

Thanks again!


JazzminnNo
Level 2
Author

Hi Alex,

Thank you for the help. You can close the ticket: I told the user to switch to Python 3.6, since Python 2 now triggers deprecation warnings anyway.