Scenario error using a notebook step

Solved!
VickeyC
Level 3

I have a scenario that runs a Python notebook as a step. The notebook runs perfectly on its own, but throws a "module not found" error when it's executed as a scenario step: "No module named 'pyspark'". Any suggestions on how to fix this?


Operating system used: RHEL 7.9

6 Replies
sergeyd
Dataiker

Hi @VickeyC 

This could be an issue with missing Spark JARs. Have you run spark-integration on this DSS instance?

VickeyC
Level 3
Author

@sergeyd Our Dataiku environment runs on a Hadoop edge node. We don't use Docker, Elastic AI, or Kubernetes.

sergeyd
Dataiker

Hi @VickeyC 

Thanks for the details. So have you run spark-integration?

VickeyC
Level 3
Author

@sergeyd Yes, I believe that we ran that when we installed Dataiku. We have a Spark tab in our admin settings.

sergeyd
Dataiker
Accepted solution

Hi @VickeyC

I see that you have also opened a support ticket, so I am copying the answer from it:

We confirm that it is not possible to run PySpark notebooks through the Execute scenario step.

You will need to use a PySpark recipe instead and a "Build" scenario step to build the output. Note that you can use a "dummy" output in your recipe, i.e., creating a dataset but not actually writing to it.
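For reference, here is a minimal sketch of what such a PySpark recipe could look like, roughly following the default code DSS generates for a PySpark recipe. The dataset names "my_input" and "dummy_output" are placeholders for your own input and output datasets:

```python
# -*- coding: utf-8 -*-
# Minimal PySpark recipe sketch (roughly the default DSS PySpark recipe template).
# "my_input" and "dummy_output" are placeholder dataset names.
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read the recipe input as a Spark DataFrame
input_dataset = dataiku.Dataset("my_input")
df = dkuspark.get_dataframe(sqlContext, input_dataset)

# ... put the logic from your notebook here ...

# Write the recipe output so the "Build" scenario step has something to build.
# As noted above, this output can be a "dummy" dataset.
output_dataset = dataiku.Dataset("dummy_output")
dkuspark.write_with_schema(output_dataset, df)
```

A "Build" scenario step pointing at the recipe's output dataset will then run this recipe as part of the scenario.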
VickeyC
Level 3
Author

Yes, thanks for your help!


Labels: Setup info