Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on February 19, 2019 11:27PM
Likes: 0
Replies: 5
Pretty sure this is a basic question, however I was not able to find a solution so far.
After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.
I tried the following processor ("filter rows/cells on value"):
However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.
The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.
Am I missing something? Thanks!
Hello,
In the meantime, you can use the following trick:
1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code
def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)
2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True
Hope it helps,
Alex
I'm still seeing the same behaviour unfortunately, however for future readers a simpler option is described here using filter by formula:
https://community.dataiku.com/t5/Using-Dataiku-DSS/Remove-row-if-two-column-are-both-empty/m-p/1285
Hi,
I did the following steps :
- Concat all columns in a new one (named concat_all)
- Delete rows where concat_all is empty
hope this help