PYSPARK_PYTHON environment variable issue in PySpark

Farhan
Farhan Registered Posts: 27 ✭✭✭✭
edited November 2024 in Setup & Configuration

Hi

I am facing the issue below with a PySpark recipe.

Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I have set the environment variables using os.environ() to point at the Python binary path, but this approach only works when I run the code in a Jupyter notebook; it does not work when the same code runs via a recipe.
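
For reference, this is roughly what I am doing (a minimal sketch; the binary path is only an example and should point to the Python actually installed on your setup):

    import os

    # Example path -- replace with the actual Python 3.6 binary on your setup.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.6"

    # These must be set before the SparkSession is created, otherwise the
    # workers may still be launched with the old interpreter.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()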

So I did some digging and found the community post below:

PySpark Python executables — Dataiku Community

Based on the above, can this issue be resolved by setting the Python binary path in the code environment settings section?

(Unfortunately, I can't upload a picture, so let me describe what it looks like.)

Spark

Yarn Python executable: [empty field], with help text "Python binary on the Yarn nodes for Pyspark (save, remove then re-install jupyter support to update in notebooks)"

Operating system used: Windows 10

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,288 Dataiker

    Hi,

    Could you please open a support ticket with the job diagnostics for the PySpark recipe?
    Are you using Yarn or Spark on K8s?

    If you are using Yarn, you can set the Yarn Python executable path to a python3.6 binary so that the workers match the driver.

    [Screenshot: Spark settings showing the Yarn Python executable field]
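
    For reference, something like the following Spark configuration should be equivalent (a minimal sketch; the /usr/bin/python3.6 path is an assumption and must exist on every Yarn node):

        from pyspark.sql import SparkSession

        # Assumed path -- must exist on every Yarn node and match the
        # driver's Python minor version (3.6 here).
        py36 = "/usr/bin/python3.6"

        spark = (
            SparkSession.builder
            # Interpreter used by the executors on the Yarn nodes
            .config("spark.executorEnv.PYSPARK_PYTHON", py36)
            # Interpreter used by the application master in yarn-cluster mode
            .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", py36)
            .getOrCreate()
        )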


    Thanks
