Filter columns by their share of NaN values
Hi,
Is there any function in the processor library that can filter and remove columns with more than 85% NaN values?
Imagine I have a dataset with several thousand columns, most of which contain a lot of NaN values. How can I remove those columns automatically in a recipe?
Thank you very much!
Best Answer
-
Hi,
You can filter out rows in DSS (either removing or clearing them) where these columns contain NaN or null values. However, if you want to remove the column itself based on this condition, your best bet is to write your own code recipe to handle this logic. For example, you could read the input dataset into a dataframe (in Python or R), iterate through the columns to calculate the percentage of NaN values in each one, remove every column whose percentage exceeds your threshold, and then write the resulting dataframe to your output dataset.
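A minimal pandas sketch of that approach, assuming the data already fits in a pandas DataFrame. The function name drop_mostly_nan_columns and the 0.85 default are illustrative choices, not part of DSS. Instead of an explicit loop over the columns (which would also work), it uses df.isna().mean(), which computes the share of missing values for every column at once.

```python
import numpy as np
import pandas as pd


def drop_mostly_nan_columns(df: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    """Return df without the columns whose share of missing values exceeds threshold.

    Hypothetical helper for illustration; threshold is a fraction between 0 and 1.
    """
    # For every column, the fraction of rows that are NaN/None
    nan_share = df.isna().mean()
    # Keep the columns at or below the threshold, in their original order
    keep = nan_share[nan_share <= threshold].index
    return df.loc[:, keep]


# Small made-up example: 10 rows, three columns with different amounts of NaN
sample = pd.DataFrame({
    "mostly_empty": [np.nan] * 9 + [1.0],   # 90% NaN, above 85%, so dropped
    "half_empty": [np.nan] * 5 + [1.0] * 5,  # 50% NaN, kept
    "full": list(range(10)),                 # 0% NaN, kept
})

cleaned = drop_mostly_nan_columns(sample, threshold=0.85)
print(cleaned.columns.tolist())  # ['half_empty', 'full']
```

In a DSS Python recipe, you would typically read the input with dataiku.Dataset("input_name").get_dataframe() and write the result with dataiku.Dataset("output_name").write_with_schema(cleaned), where "input_name" and "output_name" are placeholders for your own dataset names. pandas also has a built-in alternative, DataFrame.dropna(axis=1, thresh=k), where thresh is the minimum number of non-missing values a column must have in order to be kept.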
I hope that this helps!
Best,
Andrew
Answers
-
Hi @phuongphan
In your Prepare recipe, you can switch to the "Columns View" (top right).
From this view, you should be able to choose to display the % of empty or non-empty records for each column (screenshot attached). Then, on the left side, you should be able to select several columns at a time and delete them in one go.
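If you prefer to review the numbers before deleting anything, a rough code-side counterpart of this view is a per-column report of missing values, sorted with the emptiest columns first. This is a hypothetical helper (missing_share_report is not a DSS function), and it counts only NaN/None as missing, which may not match exactly how DSS defines "empty" in the Columns View.

```python
import numpy as np
import pandas as pd


def missing_share_report(df: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    """Per-column percentage of missing values, worst first, flagging columns above threshold."""
    # Fraction of NaN/None values in each column (0.0 to 1.0)
    share = df.isna().mean()
    report = pd.DataFrame({
        "missing_pct": (share * 100).round(1),
        "above_threshold": share > threshold,
    })
    # Emptiest columns first, similar to sorting by % empty
    return report.sort_values("missing_pct", ascending=False)


# Made-up example data
sample = pd.DataFrame({
    "a": [np.nan] * 9 + [1.0],
    "b": [np.nan] * 5 + [1.0] * 5,
    "c": list(range(10)),
})

report = missing_share_report(sample)
print(report)
# Names of the columns you would select for deletion
to_delete = report[report["above_threshold"]].index.tolist()
print(to_delete)  # ['a']
```

With thousands of columns, the to_delete list is what you would otherwise have to select by hand in the Columns View.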
I hope this helps!
-
Thank you @ATsao for your reply. Yes, in the end I had to create code recipes to remove all columns with a high percentage of NaN values, and to create new additional features.
-
Hi @Liev, very nice approach. Thank you for your screenshot. By the way, if I have thousands of columns, it is still very hard to deal with all of them at once, so I had to create code recipes.