Plugin cannot find pyspark
 
I built a dedicated code environment for the plugin and included pyspark in the plugin's code-env/python/spec/requirements.txt file, but I still get the error below when I try to run import pyspark.
Installing a recent version of pandas in a plugin's code environment requires a kludge: you have to include "corePackagesSet": "PANDAS13" in the desc.json file, or the whole code-environment build fails.
Do you have to implement a similar kludge to get the pyspark package installed in the plugin's code environment?
Traceback (most recent call last):
  File "check_spark_run.py", line 4, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
Operating system used: CentOS
Answers
- 
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,149 ✭✭✭✭✭✭✭✭✭
Hi @clayms, support on this issue will be facilitated directly through our support portal. Once a solution is provided, if relevant, we'll post it here as well for the purpose of knowledge sharing.
- 
There is a similar kludge: you have to include "kind": "PYSPARK" in the plugin's recipe.json file.
I was not able to find this documented anywhere.
Also, I was hoping to start Spark from the shell with spark-submit, or from a Python script file, but doing this causes a bunch of other errors.
Regarding the ModuleNotFoundError, I fixed that by running the script with the python executable that is in the plugin's /bin/ directory. Calling python directly runs the Design Node's Python, which may not have the libraries you built into the plugin's code environment.
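
For anyone hitting the same error, here is a rough sketch of how you might confirm which interpreter a script is running under and whether a given interpreter can see pyspark. It uses only the Python standard library. The function names and the python_path argument are my own placeholders, not part of Dataiku; python_path stands in for the python executable in the plugin's /bin/ directory, whose real location depends on your install.

```python
import importlib.util
import subprocess
import sys


def module_available(module_name):
    """Return True if the interpreter running this script can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


def check_with_interpreter(python_path, module_name):
    """Ask a different interpreter whether it can import module_name.

    python_path is a placeholder (assumption): point it at the python
    executable of the plugin's code environment on your own instance.
    """
    result = subprocess.run(
        [python_path, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    # A zero exit code means the import succeeded in that interpreter.
    return result.returncode == 0


if __name__ == "__main__":
    # Shows which python is actually executing the script.
    print("Interpreter in use:", sys.executable)
    print("pyspark visible here:", module_available("pyspark"))
```

If the first check reports that pyspark is not visible but check_with_interpreter succeeds when given the plugin environment's python, the script is most likely being launched by the wrong interpreter rather than the package being missing.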