Improve check behaviour on a partitioned table at the dataset level (≠ partition level)

Tanguy
Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023 Posts: 124 Neuron

Checks help to make sure that the flow’s pipeline is behaving as expected. They also accelerate the development process by:

  1. early-stopping a job
  2. pinpointing which table did not pass a check (e.g. « Checks on the outputs produced 1 errors »).

However, I noticed that these two features do not always work when a check is implemented on a partitioned table at the level of the entire dataset (not at the partition level). In particular:

  1. a failing check does not stop the recursive build job: the error is only raised at the end of the entire job (so if the job is time-consuming, the user may wait a long time before being notified of a problem)
  2. the message is not explicit: it indicates neither which table failed the check nor the error type.
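
To make the requested behaviour concrete, here is a rough sketch in plain Python. It is not DSS code and does not use the DSS API: `Step`, `CheckFailure` and `run_recursive_build` are hypothetical names made up for illustration. It only shows the two properties asked for, namely stopping at the first failing dataset-level check and naming the dataset and the check in the error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class CheckFailure(Exception):
    """Hypothetical error raised as soon as a dataset-level check fails."""


@dataclass
class Step:
    dataset: str                                      # dataset produced by this step
    build: Callable[[], List[dict]]                   # builds the dataset, returns its rows
    checks: Dict[str, Callable[[List[dict]], bool]]   # check name -> predicate on the whole dataset


def run_recursive_build(steps: List[Step]) -> List[str]:
    """Build the steps in order and stop at the first failing check."""
    built = []
    for step in steps:
        rows = step.build()
        for name, predicate in step.checks.items():
            if not predicate(rows):
                # Fail immediately, with the dataset and the check spelled out.
                raise CheckFailure(
                    f"Check '{name}' failed on dataset '{step.dataset}' "
                    f"(evaluated on the whole dataset, {len(rows)} rows)"
                )
        built.append(step.dataset)
    return built
```

With three steps where the second one fails its check, the third step would never be built, and the exception message would name the second dataset and the failing check.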

Here is an illustration of this issue:

  1. The check that fails on a partitioned table (note that the check is computed at each build):

tanguy_0-1666191344764.jpeg

  2. The check behaviour when re-building that partitioned table in a recursive build:

tanguy_1-1666191344774.jpeg

This improvement could be added to this one.

cc: @Turribeach

5 votes

In the Backlog

Comments

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 321 Neuron

    I recently raised the issue of the uninformative check-failure message in direct feedback to someone at Dataiku. This happens on all datasets, not just partitioned ones.

    In general, metrics and checks are a really nice feature but there are some improvements that could be made both in how they operate and in the UI (too many clicks).

    Marlan

  • Tanguy
    Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023 Posts: 124 Neuron

    Thanks for the info, @Marlan. For non-partitioned (NP) datasets, I haven't encountered any problems. Do you have a simple example to reproduce it?

  • Elie
    Elie Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Product Ideas Manager Posts: 32 Dataiker

    Thanks for your idea, @tanguy.
    Your idea meets the criteria for submission; we'll reach out should we require more information.

    If you’re reading this and think this would be a great capability to add to DSS, be sure to kudos the original post!

    Take care
