Scenario error using a notebook step

VickeyC

I have a scenario that uses a Python notebook in one of its steps. The notebook runs fine on its own, but when it runs as a scenario step it fails with a "module not found" error: "No module named 'pyspark'". Any suggestions on how to fix this?

Operating system used: RHEL 7.9


Best Answer

  • Sergey (Dataiker)

    Hi @VickeyC

    I see that you have also opened a support ticket, so I am copying the answer from there:

    We can confirm that it is not possible to run PySpark notebooks through the Execute scenario step.

    You will need to use a PySpark recipe instead, together with a "Build" scenario step that builds the recipe's output. Note that this output can be a "dummy" one, i.e. a dataset that is created but never actually written to.
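    Before moving the code into a recipe, it can help to confirm that the notebook and the scenario step really run in different environments. The sketch below uses only the Python standard library; describe_environment is a hypothetical helper, not part of the Dataiku API. Running it once in the notebook and once from the context that fails shows which interpreter each uses and whether it can find pyspark at all:

```python
import importlib.util
import sys


def describe_environment(module_name: str = "pyspark") -> dict:
    """Report the running interpreter and whether `module_name` is importable.

    Hypothetical diagnostic helper: run it in both contexts and compare output.
    """
    # find_spec returns None for a top-level module that cannot be found,
    # without actually importing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,           # path of the interpreter in use
        "version": sys.version.split()[0],  # e.g. "3.9.18"
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

    If the two runs report different interpreters, or "found: False" in the failing context, the problem is the environment rather than the notebook code itself. That is consistent with moving the logic into a PySpark recipe as suggested above.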

