I am using Python to build a clustering model in Dataiku:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Build the model with the best k found earlier (K_best is computed upstream)
print('The best K suggested:', K_best)
model = KMeans(n_clusters=K_best, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=101)
# Fit the model on the scaled feature matrix
model = model.fit(X_scaled)
# Cluster label assigned to each row
labels = model.labels_
# Plot feature 0 against features 1 and 2, colored by cluster label
fig = plt.figure(figsize=(20, 5))
ax = fig.add_subplot(121)
ax.scatter(x=X_scaled[:, 1], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[1])
ax.set_ylabel(feature_vector[0])
ax = fig.add_subplot(122)
ax.scatter(x=X_scaled[:, 2], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[2])
ax.set_ylabel(feature_vector[0])
plt.show()
However, this part takes more than 5 hours to run. Is there a more efficient way to make it run faster?
Many thanks
I'm sorry, I can't help with your exact problem without knowing a bit more about your input data, its size in particular.
Could you try using Dataiku's visual machine learning for this task?
From the flow, select your input dataset > Lab > Quick Model > Clustering > Quick Models > K-means.
In DESIGN > Algorithms you can select other algorithms, and by clicking K-means you can try multiple numbers of clusters. When you're done, click Train and let us know how it goes.
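If you would rather stay in a Python recipe, here is a rough sketch of two things that often help when the dataset is large. I am assuming the fit and the scatter plots are the slow parts, which I can't confirm without knowing your data size. The sketch swaps `KMeans` for scikit-learn's `MiniBatchKMeans` (a different, approximate algorithm, so the clusters may differ slightly) and plots only a random sample of points. The helper names `fit_minibatch_kmeans` and `sample_rows`, the batch size, and the sample size are all my own illustrative choices, not anything from your code.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def fit_minibatch_kmeans(X, n_clusters, batch_size=1024, random_state=101):
    """Fit MiniBatchKMeans on X and return (model, labels).

    Hypothetical helper: the parameter values are illustrative defaults.
    """
    model = MiniBatchKMeans(
        n_clusters=n_clusters,
        init="k-means++",
        n_init=3,
        batch_size=batch_size,
        max_iter=100,
        random_state=random_state,
    )
    labels = model.fit_predict(X)
    return model, labels


def sample_rows(X, labels, max_points=5000, random_state=101):
    """Return at most max_points randomly chosen rows and their labels."""
    n_rows = X.shape[0]
    if n_rows <= max_points:
        return X, labels
    rng = np.random.default_rng(random_state)
    idx = rng.choice(n_rows, size=max_points, replace=False)
    return X[idx], labels[idx]


if __name__ == "__main__":
    # Synthetic stand-in for X_scaled: three blobs in three dimensions.
    rng = np.random.default_rng(0)
    X_demo = np.vstack(
        [rng.normal(loc=c, scale=0.5, size=(20000, 3)) for c in (-3, 0, 3)]
    )
    model, labels = fit_minibatch_kmeans(X_demo, n_clusters=3)
    # Plot X_plot / labels_plot instead of the full arrays.
    X_plot, labels_plot = sample_rows(X_demo, labels)
    print(X_plot.shape, np.unique(labels).size)
```

In your recipe you would pass `X_scaled` and `K_best` to `fit_minibatch_kmeans`, then hand the sampled arrays to your existing plotting code. If the run is still slow after that, the bottleneck is probably somewhere else (for example in the search for the best k), so timing each step separately would be a good next check.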