Estimate progress from explain plan

Currently, there's no progress indicator for recipes (unless those recipes are partitioned). Many SQL databases return the estimated number of result rows for a query via their explain plans. If Dataiku were to extract this estimated number of rows, a progress bar could be created showing current execution progress based on the current number of rows processed. This would be helpful for understanding the current progress on long-running flows. The logs do a good job of telling me how many rows have been inserted, but it's difficult to mentally translate this into a percent complete or a time estimate.

[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB

If Dataiku could provide a progress indicator and/or ETA for recipe operations, it'd help a lot with knowing what to expect when planning data pipelines against large datasets.
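
To make the arithmetic concrete, here is a rough sketch in Python, not a description of how DSS works internally. It assumes the row estimate comes from PostgreSQL's `EXPLAIN (FORMAT JSON)` output (the `Plan Rows` field) and that progress is read from log lines shaped like the one above. The function names, the start time, and the 2-billion-row estimate are hypothetical. Planner estimates can be well off, so the fraction is clamped at 100%.

```python
import json
import re
from datetime import datetime, timedelta

# Matches log lines shaped like: [20:30:09] [INFO] [dku.datasets.sql] - Read N records from DB
LOG_PATTERN = re.compile(r"\[(\d{2}:\d{2}:\d{2})\].*Read (\d+) records from DB")


def estimated_rows_from_explain(explain_json):
    """Return the planner's row estimate from PostgreSQL EXPLAIN (FORMAT JSON) text."""
    plans = json.loads(explain_json)
    return int(plans[0]["Plan"]["Plan Rows"])


def parse_log_line(line):
    """Return (timestamp, rows_read) for a matching log line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # The log carries only a time of day, so runs crossing midnight are not handled.
    timestamp = datetime.strptime(match.group(1), "%H:%M:%S")
    return timestamp, int(match.group(2))


def progress(start, now, rows_done, rows_estimated):
    """Return (fraction_complete, eta); eta is None when no rate can be computed."""
    if rows_estimated <= 0:
        return 0.0, None
    fraction = min(rows_done / rows_estimated, 1.0)  # clamp: the estimate may be too low
    elapsed = (now - start).total_seconds()
    if rows_done <= 0 or elapsed <= 0:
        return fraction, None
    remaining_rows = max(rows_estimated - rows_done, 0)
    # Assume the average throughput so far holds for the remaining rows.
    eta = timedelta(seconds=elapsed * remaining_rows / rows_done)
    return fraction, eta


if __name__ == "__main__":
    # Hypothetical planner output and start time, for illustration only.
    explain_output = '[{"Plan": {"Node Type": "Seq Scan", "Plan Rows": 2000000000}}]'
    estimate = estimated_rows_from_explain(explain_output)
    start = datetime.strptime("20:00:09", "%H:%M:%S")
    now, rows = parse_log_line(
        "[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB"
    )
    fraction, eta = progress(start, now, rows, estimate)
    print(f"{fraction:.1%} complete, about {eta} remaining")
```

The ETA here is a plain linear extrapolation from average throughput so far. A real implementation would probably want a moving average, since read rates tend to vary over a long job.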

Related: it'd be great to know the current queue depth for a recipe to understand whether the bottleneck is on the input or output.

1 Comment
AshleyW
Dataiker

Thanks for your idea, @natejgardner 

Your idea meets the criteria for submission. We'll reach out should we require more information.

If you're reading this and think this would be a great capability to add to DSS, be sure to kudos the original post or leave a comment!

Take care,

Ashley

Status changed to: In the Backlog
