Scenario error using a notebook step
I have a scenario that uses a Python notebook in one of its steps. The notebook runs perfectly by itself, but throws a "module not found" error when it's added to a scenario step: "No module named 'pyspark'". Any suggestions on how to fix this?
Operating system used: RHEL 7.9
Best Answer
Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker
Hi @VickeyC
I see that you have also opened a support ticket, so I am copying the answer from it:
We confirm that it is not possible to run PySpark notebooks through the Execute scenario step.
You will need to use a PySpark recipe instead, together with a "Build" scenario step to build the recipe's output. Note that you can use a "dummy" output in your recipe, i.e. create an output dataset but never actually write meaningful data to it.
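For reference, a minimal sketch of what such a PySpark recipe body could look like. The dataset names "input_data" and "dummy_output" are placeholders, and the calls follow the standard DSS PySpark recipe template (`dataiku.spark.get_dataframe` / `write_with_schema`):

```python
# Sketch of a DSS PySpark recipe body (hypothetical dataset names).
# The Spark/Dataiku imports live inside the function so the file can be
# read without a Spark environment; in DSS the body runs at recipe
# execution time, where the proper Spark integration is available.
def run_recipe():
    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # Read the input dataset as a Spark DataFrame
    input_ds = dataiku.Dataset("input_data")
    df = dkuspark.get_dataframe(sql_context, input_ds)

    # ... the transformations from the original notebook go here ...

    # "Dummy" output: the recipe must declare an output dataset so that
    # a "Build" scenario step can target it, but writing an empty
    # DataFrame is enough.
    output_ds = dataiku.Dataset("dummy_output")
    dkuspark.write_with_schema(output_ds, df.limit(0))
```

A "Build" scenario step pointing at the dummy output dataset will then trigger this recipe instead of the notebook.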
Answers
-
Sergey (Dataiker)
Hi @VickeyC
This could be an issue with missing Spark JARs. Have you run spark-integration on this DSS instance?
-
@sergeyd, our Dataiku environment runs on a Hadoop edge node. We don't use Docker, Elastic AI, or Kubernetes.
-
@sergeyd
Yes, I believe that we ran that when we installed Dataiku. We have a Spark tab in our admin settings.
-
Yes, thanks for your help!