How to remove rows with all emptly values across all columns
Pretty sure this is a basic question, however I was not able to find a solution so far.
After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.
I tried the following processor ("filter rows/cells on value"):
However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.
The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.
Am I missing something? Thanks!
Best Answer
-
Hello,
In the meantime, you can use the following trick:
1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code
def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True
Hope it helps,
Alex
Answers
-
Hi,
There are currently some known issues with this processor in "ALL" mode. You'll need to add the columns explicitly for the moment. -
Thanks for the quick reply.
Unfortunately, nothing happens when adding all the columns explicitly (i.e. no rows are removed). Even when selecting only one column, no rows are removed (agian, counter-check with "Action: only keep matching rows" removes all rows). -
I'm still seeing the same behaviour unfortunately, however for future readers a simpler option is described here using filter by formula:
https://community.dataiku.com/t5/Using-Dataiku-DSS/Remove-row-if-two-column-are-both-empty/m-p/1285
-
Hi,
I did the following steps :
- Concat all columns in a new one (named concat_all)
- Delete rows where concat_all is empty
hope this help