PYSPARK_PYTHON environment variable issue in PySpark
Hi,
I am facing the issue below with a PySpark recipe.
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
I have set the environment variables using os.environ, pointing them at the Python binary path. Somehow this approach only works when I run the code from a Jupyter notebook; it does not work when the same code runs as a recipe.
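Roughly, this is what I have at the top of the recipe (the path below is a placeholder for the actual Python 3.6 binary on the cluster nodes):

    import os

    # Placeholder path; the real value points at the Python 3.6 install on the nodes
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.6"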
So I did some digging and found the community post below:
PySpark Python executables — Dataiku Community
Based on that, can this issue be resolved by setting the Python binary path in the code environment settings section?
(Unfortunately, I can't upload a picture, so I will write out what it looks like.)
Spark
Yarn Python executable: [ empty ]
Help text: "Python binary on the Yarn nodes for PySpark (save, remove then re-install Jupyter support to update in notebooks)"
Operating system used: Windows 10