Spark pipeline in scenario

julesbertrand (Partner)

Hi all,

I am trying to use a Spark pipeline between two datasets, which we will call Raw and Clean. I have multiple Spark SQL recipes to build Clean from Raw, and I want to execute all of them in a scenario.

In order to save time and memory, I want to pipeline them, but I can't find a way to do it in a scenario. To be clear, I want to execute all recipes from Raw to Clean in a single Spark pipeline and be able to schedule this job. Is there a way to do so?

Thank you

Jules

Answers

  • Alexandru (Dataiker)

    @julesbertrand,

    This happens automatically if Spark pipelines are enabled and DSS is able to create a pipeline.

    https://doc.dataiku.com/dss/latest/spark/pipelines.html

    You can check whether a Spark pipeline has been created by looking at the job launched by the scenario.
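    To make the fusion possible, the scenario should have a single build step targeting only the final dataset, rather than one step per dataset. As a minimal sketch, the same thing can be done with a custom "Execute Python code" scenario step using the `dataiku.scenario` API; the dataset name `Clean` and the choice of build mode are assumptions for illustration, and this only runs inside a DSS scenario, not as a standalone script:

    ```python
    # Sketch of a custom Python scenario step (runs inside DSS only).
    # Building only the final dataset lets DSS group the chain of
    # consecutive Spark SQL recipes from Raw to Clean into one Spark
    # pipeline, provided Spark pipelines are enabled for the project.
    from dataiku.scenario import Scenario

    scenario = Scenario()

    # Recursively build "Clean": upstream recipes are included in the
    # same job, where DSS can merge them into a single Spark pipeline.
    scenario.build_dataset("Clean", build_mode="RECURSIVE_BUILD")
    ```

    The key point is that the pipeline is formed per job: splitting the flow across several build steps produces several jobs, and DSS cannot fuse recipes across job boundaries.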

    Thanks,

  • julesbertrand (Partner)

    Hi Alex,

    If the only step in my scenario is "build the last dataset", it does create a Spark pipeline as you said; however, all prior datasets in the flow are rebuilt. I want to choose from which dataset to which dataset my scenario will build. And if I add multiple "build" steps to the scenario (one for each dataset or so), I don't get the Spark pipeline...

    Do you have any advice?

    Thank you

    Jules
