How to build a 'Zero Records Check'
In Alteryx I was able to build a zero records check that made it possible for me to have a "no data" entry for a particular path in my workflow. This is used in reporting to show that there were no items in that particular section.
- Count Records
- Append the record count (could include 0)
- Replace Nulls with 0 or Blank
- Formula checks for an empty field and replaces it with "No Data"
Is there a way to do this in Dataiku?
Operating system used: Microsoft Windows 11
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
In dataiku you can set Metrics to count records and Data Quality rules to take an action if the rwo count is not within expected values:
https://knowledge.dataiku.com/latest/mlops-o16n/data-quality/tutorial-data-quality.html
These can be shown in dashboards or scenarios or you can have conditional logic with them. I am unsure as what is your real requirement here. What's the point of having a field saying "no data". Makes no sense to me. Use a
-
I investigated the Metrics option and it will not work for me. The reason I need to create a row with a "no data" field is for reporting purposes. My workflow joins data and, based on particular fields, groups it by a numerical condition that requires a particular action. The flow outputs both a spreadsheet as well as an email with summary sections containing a table for each record that is in that section. It is better (in my opinion) to have a row in the table that reads "no data" if there aren't any records instead of it being completely blank.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
There are many ways to do what you want but pretty much all of them will start with a metric as that what you can use to determine your flow has no data and you execute a dummy flow zone that sends a dummy output. Have a look at this conditional execution of scenario steps post I made.
-
I think that I figured it all out, but it seems like so many extra steps and . Here's how I did it.
I have numerical conditions (0-9) that get grouped into 4 categories.
- I joined an editable dataset that contains 10 rows (integers 0-9) to my data using a Right Join.
- This created a row for each of the conditions even if there wasn't a corresponding condition value in the main dataset.
- I used a prepare recipe formula to check for undefined values in a field (Full Name) and if true, populated them with "No Data".
- This, when the categories are combined, could result in one or more "no data" fields that need to be removed before the output step.
- I used a window recipe to remove duplicates before splitting the data into the categories.
- After the split into groups, I did a sort recipe to compute the row count.
- I then used another prepare recipe formula to filter out rows where Full_Name == "No Data" && Row_Count != 1
- I joined an editable dataset that contains 10 rows (integers 0-9) to my data using a Right Join.