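I have managed to find the optimal number of clusters for the kmeans algorithm in Python, but how do I find the exact size of each cluster after applying kmeans?
Here is a code snippet: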
    import math
    import numpy as np
    from scipy import cluster
    from scipy.cluster.vq import kmeans, vq

    # simpleassetid_arr, simpleuidarr and uidarr are defined elsewhere
    data = np.vstack(list(zip(simpleassetid_arr, simpleuidarr)))
    centroids, _ = kmeans(data, int(round(math.sqrt(len(uidarr) / 2))))
    idx, _ = vq(data, centroids)
    initial = [cluster.vq.kmeans(data, i) for i in range(1, 10)]
    var = [var for (cent, var) in initial]  # to determine the optimal number of k using elbow test
    num_k = int(input("Enter the number of clusters: "))
    cent, var = initial[num_k - 1]
    assignment, cdist = cluster.vq.vq(data, cent)
Answer:
You can get the cluster sizes with:
    print(np.bincount(idx))
For the example below, np.bincount(idx) prints an array with two elements, one count per cluster, for example [156 144]:
    from numpy import vstack, array
    import numpy as np
    from numpy.random import rand
    from scipy.cluster.vq import kmeans, vq

    # data generation
    data = vstack((rand(150, 2) + array([.5, .5]), rand(150, 2)))

    # computing K-Means with K = 2 (2 clusters)
    centroids, _ = kmeans(data, 2)

    # assign each sample to a cluster
    idx, _ = vq(data, centroids)

    # print number of elements per cluster
    print(np.bincount(idx))
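One caveat worth sketching: np.bincount only returns counts up to the largest label that actually occurs, so an empty cluster at the end of the range would be silently missing from the result. The snippet below is a minimal illustration, not part of the original answer; the helper name `cluster_sizes` and the hand-written `labels` array are made up for the example and stand in for the `idx` array returned by vq.

```python
import numpy as np

def cluster_sizes(labels, n_clusters):
    """Count how many samples carry each label 0..n_clusters-1."""
    labels = np.asarray(labels, dtype=np.intp)
    # minlength pads the result so empty clusters show up as 0
    return np.bincount(labels, minlength=n_clusters)

# Made-up assignments standing in for idx from vq (hypothetical data)
labels = np.array([0, 2, 2, 0, 2])
sizes = cluster_sizes(labels, 4)
print(sizes)  # [2 0 3 0]

# Same information as label -> count pairs, listing only non-empty clusters
ids, counts = np.unique(labels, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))  # {0: 2, 2: 3}
```

With real data you would presumably pass `idx` and `len(centroids)` instead of the made-up values, so that the result always has one entry per centroid.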