How to remove rows with all emptly values across all columns

bkmyt
bkmyt Partner, Registered Posts: 9 Partner

Pretty sure this is a basic question, however I was not able to find a solution so far.

After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.

I tried the following processor ("filter rows/cells on value"):

However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.

The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.

Am I missing something? Thanks!

Best Answer

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    edited July 18 Answer ✓

    Hello,

    In the meantime, you can use the following trick:

    1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code


    def process(row):
    all_empty = all(
    [str(v).strip() == '' or v is None
    for k,v in row.items() if k != 'all_empty']
    )
    return(all_empty)

    2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True

    Hope it helps,

    Alex

Answers

Setup Info
    Tags
      Help me…