How to remove rows with all emptly values across all columns

Partner, Registered Posts: 9 Partner

Pretty sure this is a basic question, however I was not able to find a solution so far.

After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.

I tried the following processor ("filter rows/cells on value"):

However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.

The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.

Am I missing something? Thanks!

Welcome!

It looks like you're new here. Sign in or register to get started.

Best Answer

  • Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    edited July 2024 Answer ✓

    Hello,

    In the meantime, you can use the following trick:

    1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code


    def process(row):
    all_empty = all(
    [str(v).strip() == '' or v is None
    for k,v in row.items() if k != 'all_empty']
    )
    return(all_empty)

    2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True

    Hope it helps,

    Alex

Answers

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.