
Clustering "ValueError: x and y arrays must have at least 2 entries"

Solved!
VeraM
Level 1

Hi everybody.

I am trying to cluster a very sparse dataset. It has 750 columns whose values are mostly 0. Is there an option to automatically drop columns that contain only a single 1?

Clustering fails on the data as it is, and I would rather not go through every column by hand to find the ones with fewer than two 1s.

Thanks for your help!
Best, Vera

ATsao
Dataiker

Hi Vera,

Unfortunately, there is no way to do this automatically in DSS. If you'd rather not check and filter each column manually, you could write your own code recipe to handle it for you. At a high level, you'd load the dataset into a dataframe, loop through the columns to check this condition, and drop the columns that meet it before writing the output dataset.
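As a rough illustration of that approach, here is a minimal pandas sketch. The helper name `drop_sparse_columns`, the `min_nonzero` threshold, and the example data are assumptions made up for this post, not anything built into DSS. It also assumes that missing values should count as zeros and that non-numeric columns (such as an ID column) should be left alone.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, min_nonzero: int = 2) -> pd.DataFrame:
    """Return a copy of df without the numeric columns that have fewer than
    min_nonzero non-zero values. Non-numeric columns are kept untouched."""
    numeric = df.select_dtypes(include="number")
    # Count non-zero entries per numeric column (NaN is treated as zero here).
    nonzero_counts = (numeric.fillna(0) != 0).sum(axis=0)
    to_drop = nonzero_counts[nonzero_counts < min_nonzero].index
    return df.drop(columns=to_drop)


# Small illustrative frame (made-up data, not the original dataset).
example = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "f1": [0, 1, 0, 0],  # a single 1: dropped
    "f2": [1, 0, 1, 0],  # two 1s: kept
    "f3": [0, 0, 0, 0],  # all zeros: dropped
})
cleaned = drop_sparse_columns(example)
print(list(cleaned.columns))  # ['id', 'f2']

# Inside a DSS Python recipe the same helper could be wired up roughly like
# this ("input_dataset" and "output_dataset" are placeholder names):
#
#   import dataiku
#   df = dataiku.Dataset("input_dataset").get_dataframe()
#   dataiku.Dataset("output_dataset").write_with_schema(drop_sparse_columns(df))
```

Counting non-zero values per column in one vectorized step avoids an explicit Python loop over all 750 columns, but a plain loop that checks each column would work just as well.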

Thanks,

Andrew
