Can you please help with the documentation on Spark vs DSS engine

vaibhav25 Registered Posts: 2

Hi Team,

We would appreciate your urgent help finding the official documentation on the Spark vs DSS engine, ideally with real scenarios.

Primarily, we want to know when to use which engine.

Which recipes are a good fit for the Spark engine? Secondly, if a dataset is small, why does the Spark engine take more time than the DSS engine?

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,321 Neuron
    edited March 12

    There is no magic logic to decide which compute engine will be better for each recipe. Generally speaking, Spark should be better for larger datasets, but it depends on many other factors. With a visual recipe, the Spark translation is sometimes not very efficient and the DSS engine is better. Sometimes custom Spark code can improve on what DSS does. There is also an overhead to using Spark, so on small datasets it might not be worth it.

    The best way to determine the best engine is to run the recipe on both engines and see which one is faster. You can multi-select recipes in the flow by holding the Ctrl key and change the engine for all of them at once. You can also use the Recipe Engines flow view to see where each recipe is running. In v13 you can even combine these two views, for example to show the last build duration for all DSS engine recipes.

    You could then flip all of these to Spark, re-run them, and compare the two views to figure out which engine runs faster for each recipe.
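Once you have build durations for the same recipes on both engines, the comparison can be scripted. The following is only a minimal sketch in plain Python: it assumes you have copied the durations (in seconds) out of the flow view yourself, and the function name, the recipe names, the "DSS"/"SPARK" labels, and the numbers are all hypothetical placeholders, not Dataiku API identifiers or real measurements.

```python
from typing import Dict, Tuple


def faster_engine_per_recipe(
    dss_seconds: Dict[str, float],
    spark_seconds: Dict[str, float],
) -> Dict[str, Tuple[str, float]]:
    """Pick the faster engine for every recipe timed on both engines.

    Returns {recipe_name: (engine_label, speedup)}, where speedup is
    slower_duration / faster_duration. Recipes timed on only one engine
    are skipped. Ties go to "DSS" (an arbitrary choice in this sketch).
    """
    result: Dict[str, Tuple[str, float]] = {}
    # Only recipes present in both dictionaries can be compared.
    for recipe in sorted(dss_seconds.keys() & spark_seconds.keys()):
        dss = dss_seconds[recipe]
        spark = spark_seconds[recipe]
        if dss <= 0 or spark <= 0:
            raise ValueError(f"durations must be positive for {recipe!r}")
        if spark < dss:
            result[recipe] = ("SPARK", dss / spark)
        else:
            result[recipe] = ("DSS", spark / dss)
    return result


if __name__ == "__main__":
    # Placeholder durations in seconds, NOT real measurements.
    dss_runs = {"prepare_orders": 40.0, "join_customers": 900.0}
    spark_runs = {"prepare_orders": 120.0, "join_customers": 300.0}
    for name, (engine, speedup) in faster_engine_per_recipe(dss_runs, spark_runs).items():
        print(f"{name}: {engine} is {speedup:.1f}x faster")
```

With placeholder inputs like these, a small recipe can come out faster on the DSS engine and a large one faster on Spark, which is the overhead effect described above. A single run per engine can be noisy, so it may be worth repeating the builds before settling on an engine.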
