Improve check behaviour on a partitioned table at the dataset level (≠ partition level)

Checks help make sure that the flow’s pipeline is behaving as expected. They also accelerate the development process by:

  1. early-stopping a job
  2. pinpointing which table did not pass a check (e.g. « Checks on the outputs produced 1 errors »).

However, I noticed that these two features do not always work when a check is defined on the whole dataset (rather than on a single partition) of a partitioned table. In particular:

  1. a failing check does not stop the recursive build job: the error is only raised at the end of the entire job (so if the job is time-consuming, the user can wait a long time before being notified of a problem)
  2. the message is not explicit: it states neither which table failed the check nor the type of error.
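To make the expected behaviour concrete, here is a minimal Python sketch of a fail-fast recursive build. It does not use the DSS API: `Step`, `CheckFailed` and `run_recursive_build` are hypothetical names introduced only for illustration, and the dataset names and metrics are made up. It only shows the two properties requested above: the build stops at the first failing dataset-level check, and the error names the dataset and the check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class CheckFailed(Exception):
    """Raised as soon as a dataset-level check fails (hypothetical)."""


@dataclass
class Step:
    # Hypothetical stand-in for one dataset in a recursive build.
    dataset: str
    build: Callable[[], Dict[str, float]]  # builds the dataset, returns its metrics
    checks: Dict[str, Callable[[Dict[str, float]], bool]]  # check name -> predicate


def run_recursive_build(steps: List[Step]) -> List[str]:
    """Build steps in order and stop at the first failing check."""
    built = []
    for step in steps:
        metrics = step.build()
        for name, check in step.checks.items():
            if not check(metrics):
                # Explicit message: which check, on which dataset, with which metrics.
                raise CheckFailed(
                    f"Check '{name}' failed on dataset '{step.dataset}' "
                    f"(metrics: {metrics})"
                )
        built.append(step.dataset)
    return built


# Made-up example: the first (partitioned) dataset is empty, so its check fails
# and the downstream dataset is never built.
steps = [
    Step("orders_partitioned", lambda: {"row_count": 0},
         {"row_count > 0": lambda m: m["row_count"] > 0}),
    Step("orders_enriched", lambda: {"row_count": 10}, {}),
]

try:
    run_recursive_build(steps)
except CheckFailed as err:
    print(err)
```

In this sketch the checks run right after each dataset is built, which is what allows the job to stop before any downstream work starts.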

Here is an illustration of this issue:

  1. The check that fails on a partitioned table (note that the check is computed at each build):

tanguy_0-1666191344764.jpeg

  2. The check behaviour when rebuilding that partitioned table in a recursive build:

tanguy_1-1666191344774.jpeg

This improvement could be added to this one.

cc: @Turribeach 

 

 

3 Comments

I recently raised the issue of the uninformative check failure message in direct feedback to someone at Dataiku. This happens on all datasets, not just partitioned ones.

In general, metrics and checks are a really nice feature but there are some improvements that could be made both in how they operate and in the UI (too many clicks).

Marlan

Thanks for the info, @Marlan. For non-partitioned (NP) datasets, I haven't encountered any problems. Do you have a simple example to reproduce the issue?

ElieA
Dataiker

Thanks for your idea, @tanguy. Your idea meets the criteria for submission; we'll reach out should we require more information.

If you’re reading this and think this would be a great capability to add to DSS, be sure to kudos the original post!

Take care

Status changed to: In the Backlog
