
Interpreting cluster results


Hi all,

I am testing out clustering on a data table of customer records. As a starting point I tried Interactive Clustering on a dataset of 6.5 million records, with approx. 10 columns containing stats on each record's behaviour.

I am slightly puzzled by the results, shown below:

[Image: ben_p_0-1587111807372.png]

I have almost all my customers in a single cluster, with only a handful falling into other clusters. Why would clustering yield such results, and how might I go about making the clusters more useful?

Ben

4 Replies
Dataiker

Hi Ben, 

In interactive clustering, we first run a K-means algorithm.

K-means is sensitive to outliers and noise, so in your case you end up with nearly all the observations in one cluster and 4 small clusters of outliers.
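
To make the effect concrete, here is a minimal sketch using only the Python standard library. It is not what DSS runs internally: `kmeans_1d` is a hypothetical toy implementation on made-up one-dimensional data, and the farthest-first initialisation is an assumption chosen only to keep the example reproducible.

```python
def kmeans_1d(values, k, n_iter=20):
    """Tiny 1-D Lloyd's k-means; returns (labels, centers)."""
    # Deterministic farthest-first initialisation, used here only so the
    # example is reproducible (an assumption, not what DSS does).
    centers = [values[0]]
    while len(centers) < k:
        centers.append(max(values, key=lambda v: min(abs(v - c) for c in centers)))

    def assign(vals):
        # Index of the nearest center for every value.
        return [min(range(k), key=lambda j: abs(v - centers[j])) for v in vals]

    for _ in range(n_iter):
        labels = assign(values)
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return assign(values), centers


def cluster_sizes(labels, k):
    """Cluster sizes, largest first."""
    return sorted((labels.count(j) for j in range(k)), reverse=True)


# Made-up data: two genuine groups (around 1 and around 5) plus two extreme values.
bulk = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
outliers = [1000.0, 2000.0]

labels_with, _ = kmeans_1d(bulk + outliers, k=3)
labels_without, _ = kmeans_1d(bulk, k=2)

sizes_with = cluster_sizes(labels_with, 3)
sizes_without = cluster_sizes(labels_without, 2)
print(sizes_with)     # [10, 1, 1]
print(sizes_without)  # [5, 5]
```

With the two extreme values present, each one takes a cluster for itself and the two genuine groups are lumped together. Once they are removed, even `k=2` separates the groups.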

To get better results, you can try the Outliers Detection option in the Design part and select "Create a cluster with outliers".
All the outliers will then be grouped into a single cluster.
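
As an illustration of the idea only (the detection method DSS uses is not described in this thread), the sketch below flags outliers with a simple median/MAD rule and sets them aside in a group of their own. The function `split_outliers`, the threshold of 5, and the data are all hypothetical.

```python
from statistics import median


def split_outliers(values, threshold=5.0):
    """Split values into (inliers, outliers) using a median/MAD rule.

    A value is flagged when its distance from the median exceeds
    `threshold` times the median absolute deviation (MAD).
    This rule is an assumption for illustration, not the DSS method.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: treat everything as an inlier.
        return list(values), []
    inliers = [v for v in values if abs(v - med) / mad <= threshold]
    outliers = [v for v in values if abs(v - med) / mad > threshold]
    return inliers, outliers


# Made-up data: ten ordinary records and two extreme ones.
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9, 1000.0, 2000.0]
inliers, outlier_cluster = split_outliers(values)
print(len(inliers))      # 10
print(outlier_cluster)   # [1000.0, 2000.0]
```

Only the inliers would then be passed to K-means, while the flagged values form one dedicated outlier cluster.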

Mattsco
Level 3
Author

Thanks Matt,

I ran a simpler k-means on the data and got much more balanced segments. I still don't understand how this two-step clustering provides extra insight into the clusters, or how it allows them to be explored after clustering. Can you explain this?

Ben

Dataiker

Interactive clustering is a two-step process.
First you train a K-means model, then you can modify the clustering yourself, for example by merging two clusters together.
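
Conceptually, merging clusters is just a relabelling of the assignments produced in the first step. The sketch below shows this on made-up labels. `merge_clusters` and the cluster names are hypothetical and do not correspond to any DSS API.

```python
from collections import Counter


def merge_clusters(labels, to_merge, merged_name):
    """Return a new label list in which every label in `to_merge`
    is replaced by `merged_name`; other labels are left unchanged."""
    to_merge = set(to_merge)
    return [merged_name if lab in to_merge else lab for lab in labels]


# Made-up cluster assignments, as they might come out of the first step.
labels = ["cluster_0", "cluster_1", "cluster_2", "cluster_1", "cluster_0", "cluster_3"]

# Manually merge the two smallest clusters into a single segment.
merged = merge_clusters(labels, {"cluster_2", "cluster_3"}, "small_segments")
sizes = Counter(merged)
print(merged)
print(sizes["small_segments"])  # 2
```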

 

If you are interested in building your own grouping of the data, you can also check out the interactive decision tree builder:
https://www.dataiku.com/product/plugins/interactive-decision-tree-builder/

Mattsco
Level 3
Author

Thanks again Matt, when you say "modify the clustering", this has to be done manually, right?

Apologies if this is a dumb question!
