Metrics and Checks

unkxowx
unkxowx Registered Posts: 19

Hey, hope y'all are doing great

I have a quick question regarding using metrics and checks.

I have a table and I want to do basic checks such as duplication of primary keys, duplication of records, missing values in a particular column etc.

I used metrics to compute the count all of these along with checks and have them displayed in a dashboard where clicking on the tile takes you back to the sub-dataset i.e., if there are null values in the foreign key column of my dataset, the tile displaying that error will take me to that specific subset where all the foreign key values are null.

These 'specific datasets were created using SQL recipe, so basically, I had to split my dataset into multiple flows with SQL query for each check and then do my checks on those individual subsets.

This has resulted in creating many many workflows (literally one for each check) and having to display that subset is crucial for the use-case. Is there a way to optimize this that does not require in creating multiple flows?

Thank you for the assistance

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,072 Neuron
    Answer ✓

    So the right way of doing this is to have your metrics added as columns so you can filter the rows as needed in the original dataset, no needs to create separate datasets. So in your dataset create new columns called "Duplicate_Flag" and populate it accordingly. The Null in column ABC doesn't need a separate column as you can filter on it already. Then you can create a new insight which could basically be your dataset with the correct filters applied for each metric drill down. Finally you can edit the metric tile in your dashboard "Behavior on click" to "Open other Insight" which will point to the corresponding filtered Insight.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,072 Neuron

    I don't see why you need to create all these sub-datasets. Can you explain your requirement more clearly?

  • unkxowx
    unkxowx Registered Posts: 19

    so lets say i have 2 dq checks that i need to do:
    1) check duplicate records (rows having same values but different primary keys)
    2) check records with null values in column ABC

    Now, with metrics and checks i was able to obtain:
    1) # of records that are duplicates
    2) # of records that have null values in ABC

    In the dashboard, there will be a tile that displays a check of either OKAY or ERROR based on the metric numbers. Let's say both my checks have errors. Now, what I want to achieve is that by clicking on each tile I want to see a dataset that has:
    1) only those records that are duplicates
    2) only those rows that have null values in ABC.

  • unkxowx
    unkxowx Registered Posts: 19

    ok agreed with applying filters on nulls and creating a new col for duplicates but my flow zone will have 3 datasets now, right? One original, other with filter recipe on duplicate_flag and last one with the filter of null values in ABC ?

    So, if i got this right, you're saying use only the original dataset to calculate ALL the metrics but then use filter recipes to filter them out and have those sub-datasets in the flow that will be used in the dashboard? Or is there a way to actually apply filter on the original dataset in the dashboard itself?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,072 Neuron

    Not really. I never said using Recipes or Filters. I said you should use an Insight:

    https://doc.dataiku.com/dss/latest/dashboards/concepts.html#insights

    An insight can have filters applied to it. Hence all you need is separate insights for each metric. All of them could come from a single dataset.

  • unkxowx
    unkxowx Registered Posts: 19

    ah okay, I'll have a look at that then. Thank you so much for your help

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,072 Neuron

    Under the same menu as Dashboards you have Insights. Select Insights. Select the dataset and give the Insight a name. Then click on it, Edit it and filter as desired. Finally link from the metric dashboard tile as I previously advised.

  • unkxowx
    unkxowx Registered Posts: 19

    Yeah, I think I just created one and that look a lot more feasible than having a weird flow in the zone. Thank you, I really appreciate it, it's my first time using Dataiku so kind of got lost in between.

  • unkxowx
    unkxowx Registered Posts: 19

    Actually, one more thing, I have set up scenarios to auto-refresh data, compute metrics and checks. But that build data from scratch with my snowflake connection. Is partitioning something that should be utilized here intead?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,072 Neuron

    New question, new thread. Please mark this thread as accepted answer.

  • unkxowx
    unkxowx Registered Posts: 19

    sure, will do another thread. Before that, in continuation to what you had said already in regards to creating a flag column for duplicates - what approach do you suggest for doing that? concatenation of all columns? etc?

Setup Info
    Tags
      Help me…