Estimate progress from explain plan

Currently, recipes have no progress indicator unless they are partitioned. Many SQL databases report an estimated number of result rows for a query in its explain plan. If Dataiku extracted this estimate, it could show a progress bar by comparing the number of rows processed so far against the estimated total. This would be helpful for understanding where long-running flows stand. The logs do a good job of telling me how many rows have been processed, but it's difficult to mentally translate this into a percent complete or a time estimate:

[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB

If Dataiku could provide a progress indicator and/or ETA for recipe operations, it'd help a lot with knowing what to expect when planning data pipelines against large datasets.
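To make the idea concrete, here is a rough sketch of the calculation, not a description of how Dataiku works internally. It assumes a PostgreSQL-style plan as returned by `EXPLAIN (FORMAT JSON)`, where the top-level node carries a `Plan Rows` estimate; other databases expose the estimate differently. The sample plan values, the elapsed time, and the function names are all made up for illustration, and the row count is scraped from a log line shaped like the one above.

```python
import json
import re

# Illustrative plan in the shape PostgreSQL returns for EXPLAIN (FORMAT JSON).
# The numbers here are invented, not taken from a real query.
SAMPLE_EXPLAIN_JSON = (
    '[{"Plan": {"Node Type": "Seq Scan", "Plan Rows": 2000000000, "Plan Width": 64}}]'
)

# Matches log lines such as "... - Read 1596680000 records from DB".
LOG_PATTERN = re.compile(r"Read (\d+) records from DB")


def estimated_rows(explain_json):
    """Return the planner's row estimate for the top-level plan node."""
    plan = json.loads(explain_json)[0]["Plan"]
    return int(plan["Plan Rows"])


def rows_read(log_line):
    """Extract the row count from a log line, or None if it does not match."""
    match = LOG_PATTERN.search(log_line)
    return int(match.group(1)) if match else None


def progress(rows_done, rows_estimated, elapsed_seconds):
    """Return (fraction_complete, seconds_remaining).

    The fraction is clamped to 1.0 because planner estimates can be too low.
    seconds_remaining is None when no rate can be computed yet.
    """
    if rows_estimated <= 0:
        return 0.0, None
    fraction = min(rows_done / rows_estimated, 1.0)
    if rows_done <= 0 or elapsed_seconds <= 0:
        return fraction, None
    rate = rows_done / elapsed_seconds  # rows per second so far
    remaining = max(rows_estimated - rows_done, 0) / rate
    return fraction, remaining


if __name__ == "__main__":
    line = "[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB"
    total = estimated_rows(SAMPLE_EXPLAIN_JSON)
    done = rows_read(line)
    # Assumed: the recipe has been running for one hour.
    fraction, eta = progress(done, total, elapsed_seconds=3600)
    print(f"{fraction:.1%} complete, about {eta / 60:.0f} minutes remaining")
```

Planner estimates can be far off when table statistics are stale, so any indicator built this way would be approximate and should probably be labeled as an estimate.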

Related: it'd be great to know the current queue depth for a recipe to understand whether the bottleneck is on the input or output.