Hello community,
After upgrading Dataiku to 10.0.4, a PySpark recipe suddenly stopped working on the default Python 2.7 code env, failing with the syntax error below:
File "/appl/dataiku/dataiku-dss-10.0.4/spark-standalone-home/python/pyspark/find_spark_home.py", line 68
print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr) ^ SyntaxError: invalid syntax
Our guess is that the underlying PySpark scripts have been updated to Python 3 code and that Python 2 support has been deprecated/removed. Is this correct? If so, does that mean PySpark can no longer be used with a Python 2 code env?
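For reference, print(..., file=...) is Python 3 syntax: in Python 2, print is a statement, so that line only parses if the print function is imported first, e.g.:

# Valid in both Python 2 and Python 3 once the print function is imported
from __future__ import print_function
import sys
print("Could not find valid SPARK_HOME", file=sys.stderr)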
Before the upgrade we were on 9.0.3 and the script worked fine with Python 2.
Hope someone can help me solve this problem.
Thank you in advance,
Nofit Kartoredjo
Operating system used: RedHat
The base Python was likely upgraded to Python 3 as part of your upgrade.
1) If you want to override this setting for a particular notebook, you can set these properties on the SparkConf object that is passed to the SparkContext constructor (or its getOrCreate method):
from pyspark import SparkConf, SparkContext

# Point PySpark at the Python 2.7 interpreter for this notebook only
myconf = SparkConf()
myconf.set("spark.pyspark.python", "python2.7")
sc = SparkContext.getOrCreate(conf=myconf)
2) To use Python 2 globally for PySpark notebooks, you will need to add a new Spark configuration with spark.pyspark.python set to python2.7. After saving the new config, perform a hard refresh (CTRL + SHIFT + R), and note that you do need to restart DSS after making this change.
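A quick way to verify that either override took effect is to read back the effective configuration from a notebook (a minimal sketch; the "not set" fallback string is just illustrative):

from pyspark import SparkContext

# Read back the effective Spark config to confirm which Python the workers will use
sc = SparkContext.getOrCreate()
print(sc.getConf().get("spark.pyspark.python", "not set"))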
Let me know if that helps!
Hi Alex,
Thanks for the suggestion. The first approach is what we are looking for: using 2.7 for one specific notebook. The problem is that the error already occurs at the import level (see picture) when we use a code env with Python 2 and pyspark installed. So even if I added your code, it wouldn't make a difference, because the error starts at this line:
from dataiku import spark as dkuspark
The scripts in this module probably use Python 3. Is there any way to adjust this?
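In case it helps, this is a small diagnostic one could run in the recipe or notebook to confirm which interpreter the code env actually uses:

import sys

# Confirm the interpreter backing this code env
print(sys.version)      # e.g. 2.7.x for a Python 2 code env
print(sys.executable)   # path to the code env's Python binary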
Thanks again!
Hi,
Could you confirm whether the Spark integration was re-run after upgrading?
./bin/dssadmin install-spark-integration -standaloneArchive /PATH/TO/dataiku-dss-spark-standalone -forK8S
The Spark standalone archive can be found at
https://doc.dataiku.com/dss/latest/containers/setup-k8s.html#optional-setup-spark
If that still doesn't shed light on the problem, I would suggest raising a support ticket with the instance diagnostic attached.
Hi Alex,
Thank you for the help. You can close the ticket: I told the user to use Python 3.6 instead, since Python 2 triggers a deprecation warning.