Filter columns using NaN

phuongphan · June 2020

Hi,

Is there any function in the processor library that can filter out and remove columns containing more than 85% NaN?

Suppose I have a dataset with several thousand columns, most of which are largely NaN. How can I remove those columns automatically in recipes?

Thank you very much!

ATsao · June 2020

Hi,

You can filter out rows in DSS (either removing or clearing them) where these columns contain NaN or null values. However, if you are looking to remove the column itself based on this condition, your best bet would be to create your own code recipe to handle this logic. For example, you could read the input dataset into a dataframe, in either Python or R, iterate through the columns to calculate the percentage of NaN values in each one, remove every column whose percentage exceeds your threshold, and then write the resulting dataframe into your output dataset.
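
As a rough illustration of that approach, here is a minimal pandas sketch. The function name `drop_sparse_columns`, the 0.85 default, and the toy data are assumptions made for the example, not something from DSS itself. In a DSS Python recipe you would typically get the dataframe from `dataiku.Dataset("your_input").get_dataframe()` and write the result with `write_with_schema` on the output dataset; the dataset names there are placeholders.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_nan_ratio=0.85):
    """Return df without the columns whose share of NaN exceeds max_nan_ratio."""
    # isna() gives a boolean frame; the column-wise mean is the NaN fraction.
    nan_ratio = df.isna().mean()
    keep = nan_ratio[nan_ratio <= max_nan_ratio].index
    return df.loc[:, keep]


# Toy data standing in for the real input dataset (hypothetical values).
df = pd.DataFrame({
    "a": [float(i) for i in range(10)],   # 0% NaN, kept
    "b": [np.nan] * 9 + [1.0],            # 90% NaN, dropped
    "c": [np.nan] * 8 + [1.0, 2.0],       # 80% NaN, kept
})

cleaned = drop_sparse_columns(df)
print(list(cleaned.columns))  # ['a', 'c']
```

Because the NaN fraction is computed for all columns in one vectorized call, this should remain practical with several thousand columns, as long as the dataset fits in memory.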

I hope that this helps!

Best,
Andrew

Liev · June 2020

Hi @phuongphan

In your Prepare recipe, you can switch to "Columns View" (top right).

From this view you should be able to select to view the % of empty or non-empty records (screenshot attached). Then on the left side you should be able to select several columns at a time and delete in one go.

I hope this helps!

phuongphan · June 2020

Thank you @ATsao for your reply. Yes, I finally had to create code recipes to remove all columns with a high percentage of NaN, and to create additional features.

phuongphan · June 2020

Hi @Liev, that is a very nice approach. Thank you for the screenshot. However, with thousands of columns it is still very hard to handle all of them at once, so I had to create code recipes.
