Optimization for runtime
Mankaran119
Registered Posts: 2 ✭✭✭
I have a flow with a lot of recipes (Python, SQL, Spark) and Snowflake datasets, with multiple scenarios created and the optimized engine selected for each recipe. Still, my flow takes 3 hours to run, and I am looking for ways to reduce this runtime. Are there any tips or methods anyone can suggest?
Operating system used: Windows
Answers
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
It is hard to suggest actions without looking at the specific flow, but here are some generic suggestions:
- Are you already using partitions? They let you process incremental chunks of data instead of the entire dataset every time, which reduces runtime. See https://academy.dataiku.com/advanced-partitioning and the first sketch after this list.
- Are you already using pipelines? They execute consecutive recipes within the same engine in one go, reducing round trips to DSS. See https://doc.dataiku.com/dss/latest/spark/pipelines.html for Spark and https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html for SQL.
- For recipes with multiple input datasets, are all the inputs on the same connection? That lets DSS push the computation down to the underlying engine (for example Snowflake) instead of moving data between systems, which can improve performance significantly; see the second sketch after this list.
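To make the partitioning point concrete, here is a minimal sketch of a scenario Python step that rebuilds only the most recent partition of a daily time-partitioned dataset, rather than the whole flow. The dataset name transactions_prepared and the YYYY-MM-DD partition format are assumptions for illustration; Scenario.build_dataset with a partitions argument is the documented way to build specific partitions from a scenario step.

```python
# Scenario Python step: rebuild only yesterday's partition of a
# time-partitioned dataset instead of recomputing the full history.
# "transactions_prepared" is a hypothetical dataset name; adapt the
# partition identifier to your own partitioning scheme.
from datetime import date, timedelta

from dataiku.scenario import Scenario

scenario = Scenario()

# Partition identifier for a daily time-partitioned dataset (YYYY-MM-DD)
yesterday = (date.today() - timedelta(days=1)).strftime("%Y-%m-%d")

# Build just that partition; partition dependencies restrict the
# upstream partitioned recipes to the matching chunks as well.
scenario.build_dataset("transactions_prepared", partitions=yesterday)
```

This only pays off once the datasets and recipes along the path are partitioned with consistent dependencies, which the Academy course above walks through.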
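On the connection point, the same pushdown logic applies inside Python recipes: rather than loading full Snowflake tables into pandas and joining or aggregating in memory, you can run the SQL in-database and fetch only the result. Below is a minimal sketch using SQLExecutor2 from the dataiku package; the dataset names orders and orders_by_customer and the underlying table name "ORDERS" are hypothetical.

```python
# Python recipe: push an aggregation down to Snowflake instead of
# pulling the whole table into pandas. "orders" / "orders_by_customer"
# are hypothetical dataset names on a Snowflake connection.
import dataiku
from dataiku import SQLExecutor2

orders = dataiku.Dataset("orders")

# The query runs in Snowflake; only the aggregated rows come back.
# "ORDERS" stands in for the dataset's underlying table name.
executor = SQLExecutor2(dataset=orders)
df = executor.query_to_df("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM "ORDERS"
    GROUP BY customer_id
""")

# Write the small aggregated result to the output dataset as usual.
output = dataiku.Dataset("orders_by_customer")
output.write_with_schema(df)
```

If the whole transformation can be expressed in SQL, converting the Python recipe into a SQL recipe on the Snowflake connection goes one step further and removes the Python process from the path entirely.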
Beyond this, contact your Dataiku Customer Success Manager; they can help you engage Dataiku services to optimise your flow.
I hope this helps.
Best regards