Only Python 3 for Pyspark in 10.0.4?
Hello community,
After upgrading Dataiku to 10.0.4, a PySpark recipe suddenly stopped working on the default Python 2.7 code env, with the syntax error below:
File "/appl/dataiku/dataiku-dss-10.0.4/spark-standalone-home/python/pyspark/find_spark_home.py", line 68
    print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
                                                                                   ^
SyntaxError: invalid syntax
Our guess is that the underlying PySpark scripts have been updated to Python 3 code and that Python 2 support has been deprecated or removed. Is this correct? If so, can PySpark no longer be used with a Python 2 code env?
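(For anyone hitting the same traceback: this guess matches the error. In Python 2, `print` is a statement, so a keyword argument like `file=sys.stderr` is a parse-time SyntaxError, which is why the failure happens as soon as the module is loaded. The snippet below is only an illustration of that difference, not the actual find_spark_home.py source.)

```python
import sys

# The same style of line that fails in find_spark_home.py under Python 2.
# Python 2 parses `print` as a statement, so `file=...` is invalid syntax;
# Python 3's print() is an ordinary function that accepts a `file` keyword.
snippet = 'print("Could not find valid SPARK_HOME", file=sys.stderr)\n'

# Under Python 3 this compiles without error; under Python 2 the same
# call raises SyntaxError before any code runs.
compile(snippet, "<find_spark_home>", "exec")
print("Python %d.%d parses the print(..., file=...) form" % sys.version_info[:2])
```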
Before the upgrade we had 9.0.3 and the script just worked fine with Python 2.
Hope someone can help me solving this problem.
Thank you in advance,
Nofit Kartoredjo
Operating system used: RedHat
Answers
-
Alexandru (Dataiker)
Spark is no longer compatible with Python 2 in recent DSS releases.
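(One practical consequence: a recipe that must run on a Python 3 code env can fail fast with a clear message instead of a confusing SyntaxError deep inside PySpark. This is only a sketch, not a Dataiku feature; the guard simply checks the interpreter version before any Python-3-only import.)

```python
import sys

# Hypothetical guard for the top of a recipe: refuse to run under
# Python 2 with an explicit message, before importing pyspark or
# dataiku.spark, whose modules use Python-3-only syntax.
if sys.version_info[0] < 3:
    raise RuntimeError(
        "This recipe requires a Python 3 code env; "
        "found Python %d.%d" % sys.version_info[:2]
    )

print("Python %d.%d OK" % sys.version_info[:2])
```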
-
Hi Alex,
Thanks for the suggestion. The first approach is what we are looking for: using 2.7 for one specific notebook. The problem is that the error already occurs at the import stage (see picture) when we use a code env with Python 2 and pyspark installed. So even if I added your code, it wouldn't make a difference, because the error starts at this line:
from dataiku import spark as dkuspark
The scripts in this module probably use Python 3. Is there any way to adjust this?
Thanks again!
-
Hi Alex,
Thank you for the help. You can close the ticket: I told the user to just use Python 3.6, since Python 2 gives a deprecation warning.