How to build a 'Zero Records Check'

Options
Registered Posts: 5
edited November 2024 in Using Dataiku

In Alteryx I was able to build a zero records check that made it possible for me to have a "no data" entry for a particular path in my workflow. This is used in reporting to show that there were no items in that particular section. 

 

image-71bcc173ebde3-31f3.jpeg
  1. Count Records
  2. Append the record count (could include 0)
  3. Replace Nulls with 0 or Blank
  4. Formula checks for an empty field and replaces it with "No Data"

Is there a way to do this in Dataiku?

Operating system used: Microsoft Windows 11

Welcome!

It looks like you're new here. Sign in or register to get started.

Answers

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,551 Neuron

    In dataiku you can set Metrics to count records and Data Quality rules to take an action if the rwo count is not within expected values:

    https://knowledge.dataiku.com/latest/mlops-o16n/data-quality/tutorial-data-quality.html

    These can be shown in dashboards or scenarios or you can have conditional logic with them. I am unsure as what is your real requirement here. What's the point of having a field saying "no data". Makes no sense to me. Use a

  • Registered Posts: 5

    I investigated the Metrics option and it will not work for me. The reason I need to create a row with a "no data" field is for reporting purposes. My workflow joins data and, based on particular fields, groups it by a numerical condition that requires a particular action. The flow outputs both a spreadsheet as well as an email with summary sections containing a table for each record that is in that section. It is better (in my opinion) to have a row in the table that reads "no data" if there aren't any records instead of it being completely blank.

  • Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,551 Neuron

    There are many ways to do what you want but pretty much all of them will start with a metric as that what you can use to determine your flow has no data and you execute a dummy flow zone that sends a dummy output. Have a look at this conditional execution of scenario steps post I made.

  • Registered Posts: 5

    I think that I figured it all out, but it seems like so many extra steps and . Here's how I did it. 

     I have numerical conditions (0-9) that get grouped into 4 categories. 

    • I joined an editable dataset that contains 10 rows (integers 0-9) to my data using a Right Join. 
      • This created a row for each of the conditions even if there wasn't a corresponding condition value in the main dataset. 
    • I used a prepare recipe formula to check for undefined values in a field (Full Name) and if true, populated them with "No Data".
      • This, when the categories are combined, could result in one or more "no data" fields that need to be removed before the output step. 
    • I used a window recipe to remove duplicates before splitting the data into the categories. 
    • After the split into groups, I did a sort recipe to compute the row count.
    • I then used another prepare recipe formula to filter out rows where Full_Name == "No Data" && Row_Count != 1

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.