Way to count the number of columns that have values in a Row
For each row in my dataset I'd like to know which row is most complete.
Does anyone know a way to count the number of columns for each row that have values. Maybe without enumerating each column. Right now I have a formula that looks for blank values. If the cell is no-blank it counts as one.
if(isBlank(Constituency),0,1)
+if(isBlank(Date),0,1)
+if(isBlank(Amount),0,1)
+if(isBlank(Category),0,1)
+if(isBlank(val("\(No designation\) 2022")),0,1)
+if(isBlank(val("\(No designation\) 2023")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)
Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11
Best Answer
-
Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:
def process(row): col=0 for val in row.values(): if val is not None and val!="": col=col+1 return col
Answers
-
Hi Tom,
Im not sure if this can be done in a formula.
But one other option you probably can try is using a Python Function in the Prepare recipe
https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html
You can keep the Mode as Cell
and in the source code something like below should workdef process(row): return sum(1 for i in row.values() if i is not None)
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
I’ll check these out when I have a moment. Thanks.
This will be a great as part of a technique I’m exploring to try figure out which record to keep when I’m working on de-duping records. I want to keep the record with the richest set of additional data. -
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
Your suggested solution seems to work fairly well for me.
Thanks.
I do wonder if @NN
solution might be more performant. However, my current dataset is fairly small at a few thousand records.P.S. I hope others find this useful as well.