Running kmeans model in Python recipe is very slow

Alex_zw · ‎01-15-2020

I am using Python to build a clustering model in Dataiku:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Build the model with the best k found earlier
print('The best K suggested:', K_best)
model = KMeans(n_clusters=K_best, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=101)

# Fit the model on the scaled features
model = model.fit(X_scaled)

# Cluster label assigned to each row
labels = model.labels_

# Plot feature 0 against features 1 and 2, colored by cluster
#plt.scatter(X_scaled[:,0], X_scaled[:,1], c=model.labels_.astype(float))
fig = plt.figure(figsize=(20, 5))
ax = fig.add_subplot(121)
ax.scatter(x=X_scaled[:, 1], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[1])
ax.set_ylabel(feature_vector[0])
ax = fig.add_subplot(122)
ax.scatter(x=X_scaled[:, 2], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[2])
ax.set_ylabel(feature_vector[0])

plt.show()

However, this part takes more than 5 hours to run. Is there a more efficient way to do this so that it runs faster?

Many thanks

cperdigou · ‎01-15-2020

I'm sorry, I can't help with your exact problem without knowing a bit more about your input data, its size in particular.
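
A quick check along these lines could provide that information. It is only a sketch: `X_scaled` below is a synthetic stand-in for your real array, and `time_kmeans_on_sample` is a hypothetical helper written for this post, not part of scikit-learn or Dataiku. It reports the size of the data and times the same KMeans settings on a random subsample, which gives a rough idea of how the full fit might scale.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def time_kmeans_on_sample(X, n_clusters, sample_size=5_000, seed=101):
    """Fit KMeans on a random subsample; return (rows_used, seconds)."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, X.shape[0])
    idx = rng.choice(X.shape[0], size=n, replace=False)
    start = time.perf_counter()
    KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
           max_iter=300, tol=1e-04, random_state=seed).fit(X[idx])
    return n, time.perf_counter() - start


# Synthetic stand-in for the real X_scaled (assumption: a 2-D float array).
X_scaled = np.random.default_rng(0).normal(size=(20_000, 3))

print("shape:", X_scaled.shape)
print("memory (MB):", X_scaled.nbytes / 1e6)

n_used, seconds = time_kmeans_on_sample(X_scaled, n_clusters=4)
print(f"KMeans on {n_used} rows took {seconds:.2f} s")
```

If the subsample fits in seconds but the full run takes hours, the number of rows or columns (or the scatter plot of every point) is the likely bottleneck.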

Could you try using Dataiku's visual machine learning for this task?

From the flow, select your input dataset > Lab > Quick Model > Clustering > Quick Models > K-means.

In DESIGN > Algorithms you can select other algorithms, and by clicking K-means you can try multiple cluster numbers. When you're done, click Train and tell us how it goes.
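
If you would rather stay in a Python recipe, trying several cluster numbers can be sketched in code as well. This is not what the visual ML does internally; it swaps in scikit-learn's `MiniBatchKMeans`, which usually trades a little clustering quality for much shorter training time on large inputs. The data comes from `make_blobs` purely for illustration, and `candidate_ks` and `inertia_by_k` are made-up names.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the real scaled features (assumption).
X, _ = make_blobs(n_samples=20_000, centers=4, n_features=3, random_state=101)

candidate_ks = [2, 3, 4, 5, 6]
inertia_by_k = {}

for k in candidate_ks:
    # Each fit processes the data in small batches instead of all rows at once.
    model = MiniBatchKMeans(n_clusters=k, batch_size=1024, n_init=3,
                            random_state=101)
    model.fit(X)
    inertia_by_k[k] = model.inertia_

# Lower inertia means tighter clusters; look for where the drop levels off.
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```

Once a value of k looks reasonable, the final model can be refit with that k, using either `MiniBatchKMeans` or the regular `KMeans` if the runtime is acceptable.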
