Estimate progress from explain plan
Currently, there's no progress indicator for recipes unless they are partitioned. Many SQL databases report the estimated number of result rows for a query in their explain plans. If Dataiku extracted this estimate, it could show a progress bar comparing the rows processed so far against the expected total. This would make it much easier to follow the progress of long-running flows. The logs do a good job of telling me how many rows have been processed, but it's difficult to mentally translate a raw count like this one into a percent complete or a time estimate:
[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB
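To make the idea concrete, here is a minimal sketch of the percent-complete calculation, assuming the row estimate comes from PostgreSQL's `EXPLAIN (FORMAT JSON)` output (top-level `"Plan Rows"` field) and the rows processed come from a log line shaped like the one above. The explain output below is a made-up sample, and the helper names (`estimated_rows`, `rows_read`, `percent_complete`) are hypothetical; this is not an existing Dataiku API.

```python
import json
import re
from typing import Optional

# Made-up sample of PostgreSQL EXPLAIN (FORMAT JSON) output; the numbers are illustrative only.
EXPLAIN_JSON = """
[{"Plan": {"Node Type": "Seq Scan", "Relation Name": "events",
           "Startup Cost": 0.00, "Total Cost": 35000000.00,
           "Plan Rows": 2000000000, "Plan Width": 64}}]
"""

# Matches log lines like "... [dku.datasets.sql] - Read 1596680000 records from DB".
LOG_PATTERN = re.compile(r"Read (\d+) records from DB")


def estimated_rows(explain_json: str) -> int:
    """Return the planner's row estimate for the top-level plan node."""
    plan = json.loads(explain_json)[0]["Plan"]
    return int(plan["Plan Rows"])


def rows_read(log_line: str) -> Optional[int]:
    """Extract the processed-row count from a log line, or None if it doesn't match."""
    match = LOG_PATTERN.search(log_line)
    return int(match.group(1)) if match else None


def percent_complete(done: int, estimate: int) -> float:
    """Progress as a percentage, capped at 99.9% since planner estimates can be too low."""
    if estimate <= 0:
        return 0.0
    return min(100.0 * done / estimate, 99.9)


if __name__ == "__main__":
    est = estimated_rows(EXPLAIN_JSON)
    done = rows_read("[20:30:09] [INFO] [dku.datasets.sql] - Read 1596680000 records from DB")
    if done is not None:
        print(f"{percent_complete(done, est):.1f}% complete")  # prints "79.8% complete"
```

The cap below 100% is a design choice: explain-plan row counts are estimates, so the bar should never claim completion before the query actually finishes.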
If Dataiku could provide a progress indicator and/or ETA for recipe operations, it'd help a lot with knowing what to expect when planning data pipelines against large datasets.
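An ETA could be derived from the same two numbers plus the elapsed time. The sketch below uses simple linear extrapolation of the average throughput so far, which is an assumption on my part (real queries rarely process rows at a constant rate), and `estimate_eta` is a hypothetical helper, not part of Dataiku.

```python
from datetime import timedelta
from typing import Optional


def estimate_eta(rows_done: int, rows_estimated: int, elapsed_seconds: float) -> Optional[timedelta]:
    """Linearly extrapolate the remaining time from the average rows-per-second rate so far.

    Returns None when there is not enough information yet (no rows or no elapsed time).
    Assumes a roughly constant throughput, which is only an approximation.
    """
    if rows_done <= 0 or elapsed_seconds <= 0:
        return None
    rate = rows_done / elapsed_seconds  # average rows per second so far
    remaining = max(rows_estimated - rows_done, 0)  # never negative if the estimate was low
    return timedelta(seconds=remaining / rate)


if __name__ == "__main__":
    # e.g. 1,596,680,000 of an estimated 2,000,000,000 rows read after one hour
    eta = estimate_eta(1596680000, 2000000000, 3600)
    if eta is not None:
        print(f"ETA: ~{round(eta.total_seconds() / 60)} min")  # roughly 15 minutes
```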
Related: it'd be great to know the current queue depth for a recipe to understand whether the bottleneck is on the input or output.
Comments
Ashley (Dataiker, Product Ideas Manager)
Thanks for your idea, @natejgardner!
Your idea meets the criteria for submission; we'll reach out should we require more information.
If you’re reading this and think this would be a great capability to add to DSS, be sure to kudos the original post or leave a comment!
Take care,
Ashley