I'm learning K-means clustering. I'm very confused about how
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
works. What is the purpose of X[y_kmeans == 0, 0], X[y_kmeans == 0, 1] in this code?
The full code is below:
# k-means
# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# importing the dataset
dataset = pd.read_csv("mall_customers.csv")
X = dataset.iloc[:, [3, 4]].values

# using the elbow method to find the optimal number of clusters
from sklearn.cluster import KMeans
wcss = []  # Within-Cluster Sum of Squares
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title("The elbow method")
plt.xlabel("Number of clusters")
plt.ylabel("WCSS")
plt.show()

# applying k-means to the whole dataset
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# visualising the clusters
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s=100, c='cyan', label='Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s=100, c='magenta', label='Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
Answer:
This is a filter. y_kmeans == 0 builds a boolean mask that is True for every point i whose label y_kmeans[i] equals 0. X[y_kmeans == 0, 0] then takes those rows of X and keeps column 0 (the first feature), while X[y_kmeans == 0, 1] keeps column 1 (the second feature) of the same rows.
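A minimal sketch of that mask indexing on toy data (the arrays below are made up for illustration, not taken from mall_customers.csv):

import numpy as np

X = np.array([[15, 39],
              [16, 81],
              [17, 6],
              [18, 77]])            # 4 points: [annual income, spending score]
y_kmeans = np.array([0, 1, 0, 1])   # hypothetical cluster labels

mask = (y_kmeans == 0)              # array([ True, False,  True, False])
print(X[mask, 0])                   # column 0 of the cluster-0 rows -> [15 17]
print(X[mask, 1])                   # column 1 of the cluster-0 rows -> [39  6]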
Originally answered by @***
X[y_hc == 1, 0]
Here the trailing 0 selects the first column of X, i.e. the values plotted on the x axis, while X[y_hc == 1, 1] selects the second column, the values plotted on the y axis. The 1 in y_hc == 1 is the cluster label: it keeps only the points i whose label y_hc[i] equals 1.
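Put together, the same indexing feeds plt.scatter directly: column 0 supplies the x coordinates and column 1 the y coordinates, one cluster at a time. A hedged sketch reusing the toy arrays from above (y_hc plays the same role here as y_kmeans does in the question's code):

import numpy as np
import matplotlib.pyplot as plt

X = np.array([[15, 39], [16, 81], [17, 6], [18, 77]])
y_hc = np.array([1, 0, 1, 0])       # hypothetical cluster labels

# x axis = column 0, y axis = column 1, restricted to the rows labelled cluster 1
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s=100, c='red', label='Cluster 1')
plt.legend()
plt.show()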