Join us on August 6th for Leveraging Marketing Data in the Sports and Entertainment World Learn more

How to remove rows with all emptly values across all columns

Level 2
How to remove rows with all emptly values across all columns

Pretty sure this is a basic question, however I was not able to find a solution so far.



After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.



I tried the following processor ("filter rows/cells on value"):



However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.



The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.



Am I missing something? Thanks!

0 Kudos
4 Replies
Dataiker
Dataiker
Hi,

There are currently some known issues with this processor in "ALL" mode. You'll need to add the columns explicitly for the moment.
Level 2
Author
Thanks for the quick reply.
Unfortunately, nothing happens when adding all the columns explicitly (i.e. no rows are removed). Even when selecting only one column, no rows are removed (agian, counter-check with "Action: only keep matching rows" removes all rows).
0 Kudos
Level 2

I'm still seeing the same behaviour unfortunately, however for future readers a simpler option is described here using filter by formula:

https://community.dataiku.com/t5/Using-Dataiku-DSS/Remove-row-if-two-column-are-both-empty/m-p/1285

0 Kudos
Dataiker
Dataiker

Hello,



In the meantime, you can use the following trick:



1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code




def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)


2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True





Hope it helps,



Alex

0 Kudos
Labels (2)