Clustering "ValueError: x and y arrays must have at least 2 entries"

VeraM Partner, Registered Posts: 1 Partner

Hi everybody.

I am trying to cluster a very sparse data set. I have 750 columns whose values are mostly 0s. Is there an option to automatically drop columns that contain only one 1?

Clustering is not possible with the data as it is, and I would rather not look through all the columns to see which of them have fewer than two 1s.

Thanks for your help!
Best, Vera

Best Answer

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139
    Answer ✓

    Hi Vera,

    Unfortunately, there is no way to do this automatically in DSS. If you don't want to check and filter each column manually, you could write your own code recipe to handle it. At a high level, you would load the dataset into a dataframe, loop through the columns to check the condition, and then drop the columns that meet it before writing the output dataset.

    Thanks,

    Andrew
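
The approach Andrew outlines could look roughly like the sketch below. It is only an illustration under a few assumptions: the recipe is written in Python with pandas, "fewer than two 1s" is read as "fewer than two non-zero entries", and missing values count as zero. The function name `drop_sparse_columns`, the `min_nonzero` parameter, and the toy data are made up for this example and are not part of DSS.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, min_nonzero: int = 2) -> pd.DataFrame:
    """Return a copy of df without the numeric columns that have fewer than
    min_nonzero non-zero entries. Non-numeric columns are left untouched.

    Hypothetical helper for illustration, not a DSS function.
    """
    numeric = df.select_dtypes(include="number")
    # Count non-zero entries per numeric column (assumption: NaN counts as zero).
    nonzero_counts = (numeric.fillna(0) != 0).sum(axis=0)
    to_drop = nonzero_counts[nonzero_counts < min_nonzero].index
    return df.drop(columns=to_drop)


# Toy data standing in for the real 750-column dataset.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "f1": [0, 0, 0, 0],   # no 1s: dropped
    "f2": [0, 1, 0, 0],   # a single 1: dropped
    "f3": [1, 0, 1, 0],   # two 1s: kept
})

filtered = drop_sparse_columns(df)
print(list(filtered.columns))  # ['id', 'f3']
```

In a DSS Python recipe, the toy dataframe would presumably be replaced by something like `dataiku.Dataset("my_input").get_dataframe()`, and the result written with `dataiku.Dataset("my_output").write_with_schema(filtered)`, where `my_input` and `my_output` are placeholder dataset names.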
