Filter columns using NaN

phuongphan Registered Posts: 9 ✭✭✭✭


Is there a function in the processor library that can filter out and remove columns that are more than 85% NaN?

Imagine I have a dataset with several thousand columns, most of which contain a lot of NaN values. How can I remove those columns automatically in recipes?

Thank you very much!

Best Answer

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭
    Answer ✓


    You can filter out rows in DSS (either removing or clearing them) where these columns contain NaN or null values. However, if you want to remove the columns themselves based on this condition, your best bet is to write your own code recipe to handle the logic. For example, you could read the input dataset into a dataframe in Python or R, iterate through the columns to calculate the percentage of NaNs in each one, drop every column whose percentage exceeds your threshold, and write the resulting dataframe to your output dataset.
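
    The steps above can be sketched roughly as follows in pandas. This is a minimal sketch, not a built-in DSS feature: the function name `drop_sparse_columns`, the `max_nan_fraction` parameter, and the sample data are made up for illustration. In a Python recipe the dataframe would typically come from the input dataset rather than being built by hand, as noted in the comments.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_nan_fraction=0.85):
    """Return df without the columns whose NaN fraction exceeds the limit.

    Hypothetical helper for illustration; the 0.85 default mirrors the
    85% threshold from the question.
    """
    # isna() gives a boolean frame; the column-wise mean is the NaN fraction.
    nan_fraction = df.isna().mean()
    # Keep only the columns at or below the threshold.
    return df.loc[:, nan_fraction <= max_nan_fraction]


# Small stand-in for the real input. In a DSS Python recipe you would
# usually load it with dataiku.Dataset("your_input").get_dataframe()
# and write the result back with
# dataiku.Dataset("your_output").write_with_schema(cleaned)
# (the dataset names here are placeholders).
sample = pd.DataFrame({
    "mostly_missing": [np.nan] * 9 + [1.0],      # 90% NaN, dropped
    "half_missing": [np.nan] * 5 + [2.0] * 5,    # 50% NaN, kept
    "complete": list(range(10)),                 # 0% NaN, kept
})

cleaned = drop_sparse_columns(sample)
print(list(cleaned.columns))  # ['half_missing', 'complete']
```

    Computing the fraction for all columns at once with `isna().mean()` avoids an explicit loop, which should matter with several thousand columns. Note that empty strings are not counted as NaN by pandas, so they may need converting first if your data uses them for missing values.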

    I hope that this helps!


