Submit your innovative use case or inspiring success story to the 2023 Dataiku Frontrunner Awards!

# Way to count the number of columns that have values in a Row

Solved!
###### Way to count the number of columns that have values in a Row

For each row in my dataset I'd like to know which row is most complete.

Does anyone know a way to count the number of columns for each row that have values.  Maybe without enumerating each column.  Right now I have a formula that  looks for blank values.  If the cell is no-blank it counts as one.

`if(isBlank(Constituency),0,1)+if(isBlank(Date),0,1)+if(isBlank(Amount),0,1)+if(isBlank(Category),0,1)+if(isBlank(val("\(No designation\) 2022")),0,1)+if(isBlank(val("\(No designation\) 2023")),0,1)+if(isBlank(val("\(No designation\) 2019")),0,1)+if(isBlank(val("\(No designation\) 2019")),0,1)`

Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11

--Tom
1 Solution
Dataiker

Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

``````def process(row):
col=0
for val in row.values():
if val is not None and val!="":
col=col+1
return col``````

4 Replies

Hi Tom,

Im not sure if this can be done in a formula.
But one other option you probably can try is using a Python Function in the Prepare recipe
https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html

You can keep the Mode as Cell
and in the source code something like below should work

``````def process(row):
return sum(1 for i in row.values() if i is not None)``````

Dataiker

Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

``````def process(row):
col=0
for val in row.values():
if val is not None and val!="":
col=col+1
return col``````

Author

Iโll check these out when I have a moment. Thanks.

This will be a great as part of a technique Iโm exploring to try figure out which record to keep when Iโm working on de-duping records.  I want to keep the record with the richest set of additional data.

--Tom
Author

Your suggested solution seems to work fairly well for me.

Thanks.

I do wonder if @NN solution might be more performant.  However, my current dataset is fairly small at a few thousand records.

P.S. I hope others find this useful as well.

--Tom