Way to count the number of columns that have values in a Row

Options
tgb417
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
edited July 16 in Using Dataiku

For each row in my dataset I'd like to know which row is most complete.

Does anyone know a way to count the number of columns for each row that have values. Maybe without enumerating each column. Right now I have a formula that looks for blank values. If the cell is no-blank it counts as one.

if(isBlank(Constituency),0,1)
+if(isBlank(Date),0,1)
+if(isBlank(Amount),0,1)
+if(isBlank(Category),0,1)
+if(isBlank(val("\(No designation\) 2022")),0,1)
+if(isBlank(val("\(No designation\) 2023")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)
+if(isBlank(val("\(No designation\) 2019")),0,1)

Operating system used: Ubuntu 18.04 -> WSL2 -> Windows 11

Tagged:

Best Answer

  • Catalina
    Catalina Dataiker, Dataiku DSS Core Designer, Registered Posts: 135 Dataiker
    edited July 17 Answer ✓
    Options

    Another way to achieve this is using below custom Python function that includes the check if there are any empty strings:

    def process(row):
        col=0
        for val in row.values():
            if val is not None and val!="":
                col=col+1
        return col

Answers

  • NN
    NN Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 145 Neuron
    edited July 17
    Options

    Hi Tom,

    Im not sure if this can be done in a formula.
    But one other option you probably can try is using a Python Function in the Prepare recipe
    https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/python.html

    You can keep the Mode as Cell
    and in the source code something like below should work

    def process(row):
        return sum(1 for i in row.values() if i is not None)

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @NN
    , @CatalinaS

    I’ll check these out when I have a moment. Thanks.

    This will be a great as part of a technique I’m exploring to try figure out which record to keep when I’m working on de-duping records. I want to keep the record with the richest set of additional data.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @CatalinaS

    Your suggested solution seems to work fairly well for me.

    Thanks.

    I do wonder if @NN
    solution might be more performant. However, my current dataset is fairly small at a few thousand records.

    P.S. I hope others find this useful as well.

Setup Info
    Tags
      Help me…