Plugin cannot find pyspark

clayms
Level 3

I built the dedicated code environment for the plugin and included pyspark in the plugin's code-env/python/spec/requirements.txt file, but I still get the error below when I try to run import pyspark.

Installing a recent version of pandas in a plugin's code environment is a kludge: you have to include "corePackagesSet": "PANDAS13" in the desc.json file or the whole code-environment build fails.
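For readers who want to see the setting concretely, here is a minimal sketch of adding it to the contents of desc.json. Only the "corePackagesSet" key and the "PANDAS13" value come from this post; the helper name with_core_packages_set is hypothetical, and the sketch assumes the file content can be treated as a plain JSON object whose other keys should be left untouched.

```python
import json


def with_core_packages_set(desc: dict, value: str = "PANDAS13") -> dict:
    """Return a copy of a desc.json mapping with corePackagesSet set.

    Hypothetical helper: every other key already in the mapping is kept as-is.
    """
    updated = dict(desc)
    updated["corePackagesSet"] = value
    return updated


# An empty mapping stands in for whatever desc.json already contains.
print(json.dumps(with_core_packages_set({}), indent=2))
```

The function returns a new mapping rather than editing in place, so the original content can be compared against the result before anything is written back to disk.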

Do you have to implement some similar kludge to get the pyspark package installed in the plugin's code environment?

 

Traceback (most recent call last):
  File "check_spark_run.py", line 4, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'

Operating system used: centos

 

CoreyS
Dataiker Alumni

Hi @clayms, support on this issue will be facilitated directly through our support portal. Once a solution is provided, if relevant, we'll post it here as well for the purpose of knowledge sharing.

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
clayms
Level 3
Author

There is a similar kludge: you have to include "kind": "PYSPARK" in the plugin's recipe.json file.

I was not able to find this documented anywhere.  
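As a quick way to confirm which kind a recipe.json declares, something like the following sketch may help. The "kind" key and the "PYSPARK" value are from this thread; the function name declared_kind is hypothetical. It uses a text search rather than a JSON parser on the assumption that the file might contain comments that a strict parser would reject.

```python
import re

# Matches a "kind": "<value>" pair anywhere in the file text.
KIND_PATTERN = re.compile(r'"kind"\s*:\s*"([^"]+)"')


def declared_kind(recipe_json_text: str):
    """Return the first "kind" value found in the text, or None if there is none."""
    match = KIND_PATTERN.search(recipe_json_text)
    return match.group(1) if match else None


# Illustrative fragment only; a real recipe.json has more fields than this.
sample = '{\n  "kind": "PYSPARK"\n}'
print(declared_kind(sample))
```

To check a real plugin, read the recipe.json file into a string and pass it to declared_kind; anything other than PYSPARK would be worth looking at first when import pyspark fails.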

Also, I was hoping to start Spark from the shell with spark-submit, or from a Python script file, but doing so causes a number of other errors.

Regarding the ModuleNotFoundError, I fixed that by running the python executable that is in the plugin's /bin/ directory. Calling python directly only runs the Design Node's Python, which may not have the libraries you built into the plugin's code environment.
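A small diagnostic along these lines can show which interpreter is actually running and whether it can see pyspark. This is only a sketch using the Python standard library; the function names describe_interpreter and run_with_interpreter are hypothetical, and the path to the code environment's python executable is not shown because it depends on the installation.

```python
import importlib.util
import subprocess
import sys


def describe_interpreter() -> dict:
    """Report which Python is running and whether it can find pyspark."""
    return {
        "executable": sys.executable,
        "pyspark_found": importlib.util.find_spec("pyspark") is not None,
    }


def run_with_interpreter(python_path: str, *args: str) -> int:
    """Run an explicit interpreter with the given arguments and return its exit code.

    Passing the full path avoids relying on whatever `python` resolves to on PATH.
    """
    return subprocess.run([python_path, *args]).returncode


print(describe_interpreter())
```

If "executable" points at the Design Node's Python rather than the plugin's code environment, that would explain the ModuleNotFoundError; calling run_with_interpreter with the code environment's python path and the script name runs the script under the intended interpreter.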

CoreyS
Dataiker Alumni

Thank you for sharing your feedback and solution with the rest of the community, @clayms.
