I am using Python to build a clustering model in Dataiku:
# Best k to build model
print('The best K suggested:', K_best)
model = KMeans(n_clusters=K_best, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=101)
model = model.fit(X_scaled)
labels = model.labels_
#plt.scatter(X_scaled[:,0], X_scaled[:,1], c=model.labels_.astype(float))
fig = plt.figure(figsize=(20, 5))
# Left panel: feature 1 vs. feature 0, coloured by cluster label
ax = fig.add_subplot(121)
ax.scatter(x=X_scaled[:, 1], y=X_scaled[:, 0], c=labels)
# Right panel: feature 2 vs. feature 0, coloured by cluster label
ax = fig.add_subplot(122)
ax.scatter(x=X_scaled[:, 2], y=X_scaled[:, 0], c=labels)
However, this part takes more than 5 hours to run. Is there a more efficient way to do this so that it runs faster?
I'm sorry, I can't help with your exact problem, as that would require knowing a bit more about your input data, its size in particular.
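If you want to stay in Python while you check, here is a rough sketch of how you could measure the effect of data size yourself. It prints the shape of the input, times your original KMeans settings on a random subsample, and then tries scikit-learn's MiniBatchKMeans on all rows as a possibly faster alternative. Note that MiniBatchKMeans is a different algorithm from the one in your code, so its clusters may differ somewhat. The random data, the sample size of 5,000, batch_size=1024, and the time_fit helper are placeholders of mine, not anything from your project; swap in your own X_scaled and K_best.

```python
import time

import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans


def time_fit(model, X):
    """Hypothetical helper: fit `model` on X, return (fitted model, seconds)."""
    start = time.perf_counter()
    model.fit(X)
    return model, time.perf_counter() - start


# Stand-in data for illustration only; replace with your own X_scaled and K_best.
rng = np.random.default_rng(101)
X_scaled = rng.normal(size=(20_000, 3))
K_best = 4

print("rows, columns:", X_scaled.shape)

# Time the original settings on a random subsample to see how runtime scales.
sample_idx = rng.choice(X_scaled.shape[0], size=5_000, replace=False)
kmeans, kmeans_secs = time_fit(
    KMeans(n_clusters=K_best, init="k-means++", n_init=10,
           max_iter=300, tol=1e-4, random_state=101),
    X_scaled[sample_idx],
)

# MiniBatchKMeans updates the centroids from small batches of rows,
# trading some cluster quality for speed on large inputs.
mbk, mbk_secs = time_fit(
    MiniBatchKMeans(n_clusters=K_best, batch_size=1024, n_init=3,
                    random_state=101),
    X_scaled,
)
labels = mbk.labels_

print(f"KMeans on 5,000 sampled rows: {kmeans_secs:.2f}s")
print(f"MiniBatchKMeans on all rows: {mbk_secs:.2f}s")
```

If the subsample fits quickly, the 5 hours are most likely down to the number of rows. In that case the scatter plots may be slow as well, and plotting only X_scaled[sample_idx] with the matching labels[sample_idx] would be worth trying.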
Could you try using Dataiku's visual machine learning for this task?
From the flow, select your input dataset > Lab > Quick Model > Clustering > Quick Models > K-means.
In DESIGN > Algorithms you can select other algorithms, and by clicking K-means you can try several numbers of clusters. When you're done, click Train, and let us know how it goes.