
# Way to count the number of columns that have values in a row

Neuron Posts: 1,598
edited July 16

For each row in my dataset, I'd like to know how complete it is.

Does anyone know a way to count, for each row, the number of columns that have values, ideally without enumerating each column? Right now I have a formula that looks for blank values; if the cell is non-blank it counts as one.

`if(isBlank(Constituency),0,1)+if(isBlank(Date),0,1)+if(isBlank(Amount),0,1)+if(isBlank(Category),0,1)+if(isBlank(val("\(No designation\) 2022")),0,1)+if(isBlank(val("\(No designation\) 2023")),0,1)+if(isBlank(val("\(No designation\) 2019")),0,1)+if(isBlank(val("\(No designation\) 2019")),0,1)`

Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11

• Dataiker Posts: 135

Another way to achieve this is to use the custom Python function below, which also checks for empty strings:

```python
def process(row):
    col = 0
    for val in row.values():
        if val is not None and val != "":
            col = col + 1
    return col
```
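Assuming the Prepare recipe's Python step (in cell mode) passes each row as a dict of column name to value, the function can also be exercised standalone. The column names below are made up for illustration:

```python
def process(row):
    # Count cells that are neither None nor an empty string.
    col = 0
    for val in row.values():
        if val is not None and val != "":
            col = col + 1
    return col

# Hypothetical row: three of five columns are filled.
sample = {"Constituency": "North", "Date": "2023-01-01",
          "Amount": "", "Category": None, "Notes": "x"}
print(process(sample))  # prints 3
```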

• Neuron Posts: 145
edited July 17

Hi Tom,

I'm not sure if this can be done in a formula.
But one other option you could try is using a Python function in the Prepare recipe:
https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html

You can keep the Mode as Cell,
and in the source code something like the below should work:

```python
def process(row):
    return sum(1 for i in row.values() if i is not None)
```
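If you are working on the whole dataset in a Python recipe rather than row-by-row in a Prepare step, a pandas sketch can count filled cells per row without enumerating columns. Note that empty strings are not NaN in pandas, so they need to be replaced first if they should count as missing (column names here are hypothetical):

```python
import pandas as pd

# Hypothetical two-column dataset.
df = pd.DataFrame({
    "Constituency": ["North", None, "South"],
    "Amount": ["10", "", "20"],
})

# Treat "" as missing, then count non-null cells in each row.
filled = df.replace("", pd.NA).notna().sum(axis=1)
print(filled.tolist())  # [2, 0, 2]
```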

• Neuron Posts: 1,598

I’ll check these out when I have a moment. Thanks.

This will be great as part of a technique I'm exploring to figure out which record to keep when I'm de-duping records: I want to keep the record with the richest set of additional data.

• Neuron Posts: 1,598

Your suggested solution seems to work fairly well for me.

Thanks.

I do wonder if @NN's
solution might be more performant. However, my current dataset is fairly small, at a few thousand records.

P.S. I hope others find this useful as well.
