Running kmeans model in Python recipe is very slow

Alex_zw · January 2020

I am using Python to build a clustering model in Dataiku:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Build the model with the best K found earlier
print('The best K suggested:', K_best)
model = KMeans(n_clusters=K_best, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=101)

# Fit the model on the scaled features
model = model.fit(X_scaled)

# Cluster label assigned to each row
labels = model.labels_

# Plot feature 0 against features 1 and 2, colored by cluster
#plt.scatter(X_scaled[:,0], X_scaled[:,1], c=model.labels_.astype(float))
fig = plt.figure(figsize=(20, 5))
ax = fig.add_subplot(121)
ax.scatter(x=X_scaled[:, 1], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[1])
ax.set_ylabel(feature_vector[0])
ax = fig.add_subplot(122)
ax.scatter(x=X_scaled[:, 2], y=X_scaled[:, 0], c=labels.astype(float))
ax.set_xlabel(feature_vector[2])
ax.set_ylabel(feature_vector[0])

plt.show()

but it takes more than 5 hours to run this part. Is there a more efficient way to make it run faster?

Many thanks

cperdigou · January 2020

I'm sorry, I can't help with your exact problem, as that would require knowing a bit more about your input data, its size in particular.
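One way to get a feel for how much the data size matters is to print the shape of the array and time the same KMeans settings on a random subsample. The snippet below is only a rough sketch: `time_kmeans_on_sample`, `sample_size` and the synthetic `X_demo` array are made up for illustration, and `X_demo` stands in for your real `X_scaled`.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def time_kmeans_on_sample(X, n_clusters, sample_size=10_000, seed=101):
    """Fit KMeans on a random subsample of X and return (model, seconds)."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    # Sample rows without replacement; use every row if X is already small.
    size = min(sample_size, n_rows)
    idx = rng.choice(n_rows, size=size, replace=False)
    # Same KMeans settings as in the question.
    model = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                   max_iter=300, tol=1e-4, random_state=seed)
    start = time.perf_counter()
    model.fit(X[idx])
    return model, time.perf_counter() - start


# Synthetic stand-in for X_scaled (an assumption); replace with the real array.
X_demo = np.random.default_rng(0).normal(size=(50_000, 3))
print("rows, columns:", X_demo.shape)

model, seconds = time_kmeans_on_sample(X_demo, n_clusters=4, sample_size=5_000)
print(f"fit on 5,000 sampled rows took {seconds:.2f} s")
```

If the fit on a few thousand rows already takes a long time, the number of rows is probably not the only factor, and the number of columns or the value of K would be worth sharing as well.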

Could you try using Dataiku's visual machine learning for this task?

From the flow, select your input dataset > Lab > Quick Model > Clustering > Quick Models > K-means.

In DESIGN > Algorithms you can select other algorithms, and by clicking k-means you can try multiple cluster numbers. When you're done, click train, and tell us how this goes.
