Is there a function in the processor library that can filter out and remove columns containing more than 85% NaN values?
Suppose I have a dataset with several thousand columns, most of which are largely NaN. How can I remove those columns automatically in a recipe?
Thank you very much!
In your prepare recipe, you can switch to "Columns View" (top right).
From this view you can display the % of empty or non-empty records for each column (screenshot attached). On the left side you can then select several columns at once and delete them in one go.
I hope this helps!
You can filter out rows in DSS (either removing or clearing them) where these columns contain NaN or null values. However, if you want to remove the column itself based on this condition, your best bet is to write your own code recipe to handle the logic. For example, you could read the input dataset into a dataframe in Python or R, calculate the % of NaNs in each column, drop every column whose share exceeds your threshold, and write the resulting dataframe to your output dataset.
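As a rough illustration of that approach, here is a minimal pandas sketch. The function name `drop_sparse_columns`, the parameter `max_nan_ratio`, and the sample data are made up for this example. In an actual Python recipe you would presumably load the dataframe with `dataiku.Dataset(...).get_dataframe()` and write it back with `write_with_schema(...)` instead of building a sample frame; the dataset names in the comments are placeholders.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_nan_ratio: float = 0.85) -> pd.DataFrame:
    """Return df without the columns whose share of NaN values exceeds max_nan_ratio."""
    # isna() gives a boolean frame; the column-wise mean is the fraction of NaNs.
    nan_ratio = df.isna().mean()
    # Keep only the columns at or below the threshold.
    keep = nan_ratio[nan_ratio <= max_nan_ratio].index
    return df.loc[:, keep]


# In a DSS Python recipe the input would come from something like
# (dataset name is a placeholder):
#   df = dataiku.Dataset("my_input").get_dataframe()
# Here a small sample frame stands in for it.
sample = pd.DataFrame({
    "mostly_empty": [np.nan] * 9 + [1.0],        # 90% NaN, should be dropped
    "half_empty": [np.nan] * 5 + [1.0] * 5,      # 50% NaN, should be kept
    "full": list(range(10)),                     # 0% NaN, should be kept
})

cleaned = drop_sparse_columns(sample, max_nan_ratio=0.85)
print(list(cleaned.columns))

# And the result would be written back with something like:
#   dataiku.Dataset("my_output").write_with_schema(cleaned)
```

Because the NaN share is computed for all columns at once, this should stay reasonably fast even with several thousand columns, as long as the dataset fits in memory.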
I hope that this helps!