Optimization for runtime
Mankaran119
Registered Posts: 2 ✭✭✭
I have a flow with a lot of recipes (Python, SQL, Spark) and Snowflake datasets, with multiple scenarios created and the optimized engine selected for each recipe. Still, my flow takes 3 hours to run, and I am looking for ways to reduce this runtime. Are there any tips or methods anyone can suggest?
Operating system used: Windows
Answers
Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
Hi,
It is hard to suggest actions without looking at the specific flow, but here are some generic suggestions:
- Are you already using partitions? They let you process incremental chunks of data instead of the entire dataset every time, which reduces runtime. See https://academy.dataiku.com/advanced-partitioning and the first sketch after this list.
- Are you already using pipelines? They execute consecutive recipes within the same engine in one go, reducing round trips to DSS. See https://doc.dataiku.com/dss/latest/spark/pipelines.html for Spark and https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html for SQL.
- For recipes with multiple input datasets, are all the inputs on the same connection? That lets DSS push the computation down to the underlying engine (for example Snowflake) instead of moving data between systems, which can improve performance significantly; see the second sketch after this list.
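To make the partitioning point concrete, here is a minimal sketch of a scenario Python step that rebuilds only the most recent partition of a daily time-partitioned dataset, rather than the whole flow. The dataset name transactions_prepared and the YYYY-MM-DD partition format are assumptions for illustration; Scenario.build_dataset with a partitions argument is the documented way to build specific partitions from a scenario step.

```python
# Scenario Python step: rebuild only yesterday's partition of a
# time-partitioned dataset instead of recomputing the full history.
# "transactions_prepared" is a hypothetical dataset name; adapt the
# partition identifier to your own partitioning scheme.
from datetime import date, timedelta

from dataiku.scenario import Scenario

scenario = Scenario()

# Partition identifier for a daily time-partitioned dataset (YYYY-MM-DD)
yesterday = (date.today() - timedelta(days=1)).strftime("%Y-%m-%d")

# Build just that partition; partition dependencies restrict the
# upstream partitioned recipes to the matching chunks as well.
scenario.build_dataset("transactions_prepared", partitions=yesterday)
```

This only pays off once the datasets and recipes along the path are partitioned with consistent dependencies, which the Academy course above walks through.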
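On the connection point, the same pushdown logic applies inside Python recipes: rather than loading full Snowflake tables into pandas and joining or aggregating in memory, you can run the SQL in-database and fetch only the result. Below is a minimal sketch using SQLExecutor2 from the dataiku package; the dataset names orders and orders_by_customer and the underlying table name "ORDERS" are hypothetical.

```python
# Python recipe: push an aggregation down to Snowflake instead of
# pulling the whole table into pandas. "orders" / "orders_by_customer"
# are hypothetical dataset names on a Snowflake connection.
import dataiku
from dataiku import SQLExecutor2

orders = dataiku.Dataset("orders")

# The query runs in Snowflake; only the aggregated rows come back.
# "ORDERS" stands in for the dataset's underlying table name.
executor = SQLExecutor2(dataset=orders)
df = executor.query_to_df("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM "ORDERS"
    GROUP BY customer_id
""")

# Write the small aggregated result to the output dataset as usual.
output = dataiku.Dataset("orders_by_customer")
output.write_with_schema(df)
```

If the whole transformation can be expressed in SQL, converting the Python recipe into a SQL recipe on the Snowflake connection goes one step further and removes the Python process from the path entirely.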
Beyond this, contact your Dataiku Customer Success Manager; they can help you engage Dataiku services to optimise your flow.
I hope this helps.
Best regards