I built a dedicated code environment for the plugin and included pyspark in the plugin's code-env/python/spec/requirements.txt file, but I still get the error below when I try to run `import pyspark`.
Installing a recent version of pandas in a plugin's code environment already requires a kludge: you have to include `"corePackagesSet": "PANDAS13"` in the desc.json file, or the whole code-environment build fails.
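For reference, here is roughly what that looks like in the code environment's desc.json (a minimal sketch; the surrounding fields shown are illustrative, only the `corePackagesSet` entry is the part described above):

```json
{
  "installCorePackages": true,
  "corePackagesSet": "PANDAS13"
}
```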
Do you have to implement some similar kludge to get the pyspark package installed in the plugin's code environment?
```
Traceback (most recent call last):
  File "check_spark_run.py", line 4, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
```
Operating system used: CentOS
There is a similar kludge: you have to include `"kind": "PYSPARK"` in the plugin's recipe.json file.
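A minimal sketch of a recipe.json carrying that setting (the `meta` fields are illustrative placeholders; the `kind` entry is the part that matters here):

```json
{
  "kind": "PYSPARK",
  "meta": {
    "label": "My Spark recipe",
    "description": "Example plugin recipe that runs with PySpark"
  }
}
```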
I was not able to find this documented anywhere.
Also, I was hoping to start Spark from the shell with spark-submit, or from a Python script file, but doing this causes a number of other errors.
Regarding the ModuleNotFoundError, I fixed it by executing the Python interpreter in the plugin code environment's /bin/ directory. Calling `python` directly runs the Design node's own Python, which may not have the libraries you built into the plugin's code environment.
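Something along these lines (the paths and environment name are assumptions for illustration; adjust them to your own DSS data directory and code-env layout):

```shell
# Assumed paths -- replace with your actual DSS data directory and code-env name.
DATADIR=/path/to/dss_data        # assumption: your DSS data directory
ENV_NAME=my-plugin-env           # assumption: the plugin's code-env name

# Running the code env's own interpreter picks up the packages installed in
# that environment (e.g. pyspark), unlike the Design node's system `python`.
"$DATADIR/code-envs/python/$ENV_NAME/bin/python" check_spark_run.py
```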
Thank you for sharing your feedback and solution with the rest of the community @clayms!