How to remove rows with all emptly values across all columns

Highlighted
bkmyt
Level 2
How to remove rows with all emptly values across all columns
Jump to solution

Pretty sure this is a basic question, however I was not able to find a solution so far.



After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.



I tried the following processor ("filter rows/cells on value"):



However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.



The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.



Am I missing something? Thanks!

0 Kudos
1 Solution

Accepted Solutions
Alex_Combessie Dataiker
Dataiker
Re: How to remove rows with all emptly values across all columns
Jump to solution

Hello,



In the meantime, you can use the following trick:



1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code




def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)


2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True





Hope it helps,



Alex

View solution in original post

0 Kudos
3 Replies
Clément_Stenac Dataiker
Dataiker
Re: How to remove rows with all emptly values across all columns
Jump to solution
Hi,

There are currently some known issues with this processor in "ALL" mode. You'll need to add the columns explicitly for the moment.
0 Kudos
bkmyt
Level 2
Re: How to remove rows with all emptly values across all columns
Jump to solution
Thanks for the quick reply.
Unfortunately, nothing happens when adding all the columns explicitly (i.e. no rows are removed). Even when selecting only one column, no rows are removed (agian, counter-check with "Action: only keep matching rows" removes all rows).
0 Kudos
Alex_Combessie Dataiker
Dataiker
Re: How to remove rows with all emptly values across all columns
Jump to solution

Hello,



In the meantime, you can use the following trick:



1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code




def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)


2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True





Hope it helps,



Alex

View solution in original post

0 Kudos
Labels (2)