Discover the winners & finalists of the 2022 Dataiku Frontrunner Awards!READ THEIR USE CASES

Way to count the number of columns that have values in a Row

Solved!
tgb417
Way to count the number of columns that have values in a Row

For each row in my dataset I'd like to know which row is most complete.

Does anyone know a way to count the number of columns for each row that have values.  Maybe without enumerating each column.  Right now I have a formula that  looks for blank values.  If the cell is no-blank it counts as one.

if(isBlank(Constituency),0,1)
+if(isBlank(Date),0,1)
+if(isBlank(Amount),0,1)
+if(isBlank(Category),0,1)
+if(isBlank(val("\(No designation\) 2022")),0,1)
+if(isBlank(val("\(No designation\) 2023")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)

Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11

 

--Tom
0 Kudos
1 Solution
CatalinaS
Dataiker

 

Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

def process(row):
    col=0
    for val in row.values():
        if val is not None and val!="":
            col=col+1
    return col

 

View solution in original post

4 Replies
NN

Hi Tom,

Im not sure if this can be done in a formula.
But one other option you probably can try is using a Python Function in the Prepare recipe
https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html

You can keep the Mode as Cell
and in the source code something like below should work

def process(row):
    return sum(1 for i in row.values() if i is not None)

 

CatalinaS
Dataiker

 

Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

def process(row):
    col=0
    for val in row.values():
        if val is not None and val!="":
            col=col+1
    return col

 

tgb417
Author

@NN , @CatalinaS 

I’ll check these out when I have a moment. Thanks. 

This will be a great as part of a technique I’m exploring to try figure out which record to keep when I’m working on de-duping records.  I want to keep the record with the richest set of additional data.  

--Tom
0 Kudos
tgb417
Author

@CatalinaS 

Your suggested solution seems to work fairly well for me.

Thanks.

I do wonder if @NN solution might be more performant.  However, my current dataset is fairly small at a few thousand records.

P.S. I hope others find this useful as well.

--Tom
0 Kudos