How to build a 'Zero Records Check'

jfurda_HMK77
jfurda_HMK77 Registered Posts: 3
edited November 14 in Using Dataiku

In Alteryx I was able to build a zero records check that made it possible for me to have a "no data" entry for a particular path in my workflow. This is used in reporting to show that there were no items in that particular section. 

 

  1. Count Records
  2. Append the record count (could include 0)
  3. Replace Nulls with 0 or Blank
  4. Formula checks for an empty field and replaces it with "No Data"

Is there a way to do this in Dataiku?

Operating system used: Microsoft Windows 11

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron

    In dataiku you can set Metrics to count records and Data Quality rules to take an action if the rwo count is not within expected values:

    https://knowledge.dataiku.com/latest/mlops-o16n/data-quality/tutorial-data-quality.html

    These can be shown in dashboards or scenarios or you can have conditional logic with them. I am unsure as what is your real requirement here. What's the point of having a field saying "no data". Makes no sense to me. Use a

  • jfurda_HMK77
    jfurda_HMK77 Registered Posts: 3

    I investigated the Metrics option and it will not work for me. The reason I need to create a row with a "no data" field is for reporting purposes. My workflow joins data and, based on particular fields, groups it by a numerical condition that requires a particular action. The flow outputs both a spreadsheet as well as an email with summary sections containing a table for each record that is in that section. It is better (in my opinion) to have a row in the table that reads "no data" if there aren't any records instead of it being completely blank.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron

    There are many ways to do what you want but pretty much all of them will start with a metric as that what you can use to determine your flow has no data and you execute a dummy flow zone that sends a dummy output. Have a look at this conditional execution of scenario steps post I made.

  • jfurda_HMK77
    jfurda_HMK77 Registered Posts: 3

    I think that I figured it all out, but it seems like so many extra steps and . Here's how I did it. 

     I have numerical conditions (0-9) that get grouped into 4 categories. 

    • I joined an editable dataset that contains 10 rows (integers 0-9) to my data using a Right Join. 
      • This created a row for each of the conditions even if there wasn't a corresponding condition value in the main dataset. 
    • I used a prepare recipe formula to check for undefined values in a field (Full Name) and if true, populated them with "No Data".
      • This, when the categories are combined, could result in one or more "no data" fields that need to be removed before the output step. 
    • I used a window recipe to remove duplicates before splitting the data into the categories. 
    • After the split into groups, I did a sort recipe to compute the row count.
    • I then used another prepare recipe formula to filter out rows where Full_Name == "No Data" && Row_Count != 1

Setup Info
    Tags
      Help me…