I have a scenario which is using a Python notebook to execute a step. The notebook runs perfectly by itself, but throws a "module not found" error when it's added to a scenario step: "No module named 'pyspark'". Any suggestions on how to fix this?
Operating system used: RHEL 7.9
Hi @VickeyC
I see that you have also opened a support ticket so I am copying the answer from it:
We confirm that it is not possible to run Pyspark notebooks through the Execute scenario step.
You will need to use a Pyspark recipe instead and a "Build" scenario step to build the output. Note that you can use a "dummy" output in your recipe, i.e. creating a dataset but not actually writing to it.
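As a rough illustration of the workaround above, a PySpark recipe in DSS typically follows the pattern below. This is only a sketch: the dataset names (`input_dataset`, `dummy_output`) are placeholders, it relies on Dataiku's `dataiku.spark` helper module, and it can only run inside a DSS PySpark recipe, not as a standalone script.

```python
# Sketch of a DSS PySpark recipe (runs only inside Dataiku DSS).
# Dataset names below are placeholders -- replace with your own.
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read the recipe's input dataset as a Spark DataFrame
input_ds = dataiku.Dataset("input_dataset")
df = dkuspark.get_dataframe(sqlContext, input_ds)

# ... port your notebook logic here ...

# Write to the "dummy" output so a "Build" scenario step can target it
output_ds = dataiku.Dataset("dummy_output")
dkuspark.write_with_schema(output_ds, df)
```

In the scenario, you would then replace the "Execute notebook" step with a "Build" step on `dummy_output`.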
Hi @VickeyC
This could be an issue with missing spark jars. Have you run spark-integration on this DSS instance?
@sergeyd, our Dataiku environment runs on a Hadoop edge node. We don't use Docker, Elastic AI, or Kubernetes.
Hi @VickeyC
Thanks for the details. So have you run spark-integration?
@sergeyd Yes, I believe that we ran that when we installed Dataiku. We have a Spark tab in our admin settings.
Yes, thanks for your help!