I was using Python to build the cluster model in Dataiku:
# Best k to build model
print('The best K suggested: ', K_best)
model = KMeans(n_clusters=K_best, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=101)
# fit the model (this has to run, since model.labels_ is used below)
model = model.fit(X_scaled)
labels = model.labels_
# plt
#plt.scatter(X_scaled[:,0], X_scaled[:,1], c=model.labels_.astype(float))
fig = plt.figure(figsize=(20,5))
ax = fig.add_subplot(121)
plt.scatter(x = X_scaled[:,1], y = X_scaled[:,0], c=model.labels_.astype(float))
ax.set_xlabel(feature_vector)
ax.set_ylabel(feature_vector)
ax = fig.add_subplot(122)
plt.scatter(x = X_scaled[:,2], y = X_scaled[:,0], c=model.labels_.astype(float))
ax.set_xlabel(feature_vector)
ax.set_ylabel(feature_vector)
plt.show()
But it takes more than 5 hours to run this part. Is there a more efficient way to make it run faster?
I'm sorry, I can't help with your exact problem, as that would require knowing a bit more about your input data, its size in particular.
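If it helps to narrow things down, here is a rough sketch of how the size of the data could be checked and the fit timed on a random sample of rows. It is only an illustration: the X_scaled below is synthetic stand-in data, and the helper name time_fit_on_sample, the sample size, and the cluster count of 4 are all made up for the example. The KMeans settings are the ones from your snippet.

```python
import time

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the real X_scaled (assumption: a 2-D NumPy array of scaled features).
rng = np.random.default_rng(101)
X_scaled = rng.normal(size=(20000, 3))

# The number of rows and columns is the first thing worth knowing.
print("rows, columns:", X_scaled.shape)


def time_fit_on_sample(X, n_clusters, sample_size, seed=101):
    """Fit KMeans on a random subset of rows and return (model, seconds)."""
    sampler = np.random.default_rng(seed)
    n = min(sample_size, X.shape[0])
    idx = sampler.choice(X.shape[0], size=n, replace=False)
    start = time.perf_counter()
    model = KMeans(
        n_clusters=n_clusters,
        init="k-means++",
        n_init=10,
        max_iter=300,
        tol=1e-4,
        random_state=seed,
    ).fit(X[idx])
    return model, time.perf_counter() - start


# Hypothetical values: 4 clusters, 5000 sampled rows.
model, seconds = time_fit_on_sample(X_scaled, n_clusters=4, sample_size=5000)
print(f"fit on {model.labels_.shape[0]} rows took {seconds:.2f}s")
```

Timing the fit and the plotting separately on a sample like this should show which of the two is actually slow, and posting the shape of X_scaled would make it easier for others to help.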
Could you try using Dataiku's visual machine learning for this task?
From the flow, select your input dataset > Lab > Quick Model > Clustering > Quick Models > K-means.
In DESIGN > Algorithms you can select other algorithms, and by clicking K-means you can try multiple cluster numbers. When you're done, click Train and let us know how it goes.
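If you would rather stay in Python, trying several cluster numbers could look roughly like the sketch below. To be clear, this is not what the visual ML does internally; it is a plain scikit-learn illustration. It swaps in MiniBatchKMeans (a faster, approximate variant of K-means) and scores each candidate with a silhouette score computed on a subsample. The data is synthetic, and the helper name score_cluster_counts, the candidate range, the batch size, and the subsample size are all assumptions chosen for the example.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_scaled (assumption: the real data is a 2-D array).
X_scaled, _ = make_blobs(
    n_samples=5000, centers=3, n_features=3, cluster_std=0.5, random_state=101
)


def score_cluster_counts(X, candidates, seed=101):
    """Return {k: silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in candidates:
        labels = MiniBatchKMeans(
            n_clusters=k, batch_size=1024, n_init=3, random_state=seed
        ).fit_predict(X)
        # The silhouette score is quadratic in the number of rows,
        # so it is computed on a random subsample to keep it cheap.
        scores[k] = silhouette_score(
            X, labels, sample_size=min(2000, X.shape[0]), random_state=seed
        )
    return scores


scores = score_cluster_counts(X_scaled, candidates=range(2, 7))
K_best = max(scores, key=scores.get)
print("silhouette by k:", scores)
print("best k by this criterion:", K_best)
```

Whether the approximation is acceptable depends on your data, so it is worth comparing the resulting clusters against a full KMeans run on a sample before relying on it.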