Improve check behaviour on a partitioned table at the dataset level (≠ partition level)

Tanguy · October 2022

Checks help make sure that the flow's pipeline is behaving as expected. They also accelerate the development process by:

early-stopping a job
pinpointing which table did not pass a check (e.g. « Checks on the outputs produced 1 errors »).

However, I noticed that these two features do not always work when a check is implemented at the whole-dataset level (rather than at the partition level) on a partitioned table. In particular:

a failing check does not stop the recursive build job: the error is only raised at the end of the entire job (so if the job is time-consuming, the user can wait a long time before being notified of a problem)
the message is not explicit: it does not say which table failed the check or what type of error occurred.
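
To make the request concrete, here is a minimal sketch of the behaviour I would expect. It is plain Python and does not use the DSS API: `recursive_build`, `CheckFailed` and the step tuples are hypothetical names made up for this illustration. The idea is simply that checks run right after each dataset is built, the job stops at the first failure, and the error names both the dataset and the check.

```python
from typing import Callable, Iterable, List, Tuple


class CheckFailed(Exception):
    """Hypothetical error raised as soon as a dataset-level check fails."""


# One step = (dataset name, build function, [(check name, check function), ...]).
# All of these are illustrative stand-ins, not DSS objects.
Check = Tuple[str, Callable[[], bool]]
Step = Tuple[str, Callable[[], None], List[Check]]


def recursive_build(steps: Iterable[Step]) -> List[str]:
    """Build datasets in order and stop at the first failing check."""
    built: List[str] = []
    for dataset, build, checks in steps:
        build()
        built.append(dataset)
        # Run the checks immediately after the build, not at the end of the job.
        for check_name, check in checks:
            if not check():
                # Explicit message: which check failed, and on which dataset.
                raise CheckFailed(
                    f"Check '{check_name}' failed on dataset '{dataset}'; "
                    f"build stopped after: {built}"
                )
    return built
```

With this shape, a failing check on an upstream dataset would prevent the downstream datasets from being built at all, and the user would see immediately which table and which check caused the stop.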

Here is an illustration of this issue:

The check that fails on a partitioned table (note that the check is computed at each build):

The check behaviour when re-building that partitioned table in a recursive build:

This improvement could be added to this one.

cc: @Turribeach

Marlan · October 2022

I recently raised the issue of the uninformative check failure message in direct feedback to someone at Dataiku. This happens on all datasets, not just partitioned ones.

In general, metrics and checks are a really nice feature but there are some improvements that could be made both in how they operate and in the UI (too many clicks).

Marlan

Tanguy · October 2022

Thanks for the info @Marlan. For non-partitioned (NP) datasets, I haven't encountered any problems. Do you have a simple example to reproduce?

Elie · November 2022

Thanks for your idea, @tanguy.
Your idea meets the criteria for submission; we'll reach out should we require more information.

If you’re reading this and think this would be a great capability to add to DSS, be sure to kudos the original post!

Take care

In the Backlog · Last Updated October 2022
