Way to count the number of columns that have values in a Row

tgb417 · September 2022

For each row in my dataset I'd like to know which row is most complete.

Does anyone know a way to count the number of columns for each row that have values. Maybe without enumerating each column. Right now I have a formula that looks for blank values. If the cell is no-blank it counts as one.

if(isBlank(Constituency),0,1)
+if(isBlank(Date),0,1)
+if(isBlank(Amount),0,1)
+if(isBlank(Category),0,1)
+if(isBlank(val("\(No designation\) 2022")),0,1)
+if(isBlank(val("\(No designation\) 2023")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)

Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11

Catalina · September 2022

Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

def process(row):
    col=0
    for val in row.values():
        if val is not None and val!="":
            col=col+1
    return col

NN · September 2022

Hi Tom,

Im not sure if this can be done in a formula.
But one other option you probably can try is using a Python Function in the Prepare recipe
https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html

You can keep the Mode as Cell
and in the source code something like below should work

def process(row):
    return sum(1 for i in row.values() if i is not None)

tgb417 · September 2022

@NN
, @CatalinaS

I’ll check these out when I have a moment. Thanks.

This will be a great as part of a technique I’m exploring to try figure out which record to keep when I’m working on de-duping records. I want to keep the record with the richest set of additional data.

tgb417 · September 2022

@CatalinaS

Your suggested solution seems to work fairly well for me.

Thanks.

I do wonder if @NN
solution might be more performant. However, my current dataset is fairly small at a few thousand records.

P.S. I hope others find this useful as well.

Way to count the number of columns that have values in a Row

Best Answer

Answers

Categories

Setup Info

Tags