K-means algorithm in R – 学技术

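Hi everyone! I was asked to write a K-means algorithm in R, but I am not familiar with the language, so I found some example code online and decided to use it. I studied the code, learned the functions it uses, and modified it a little because it did not run well at first. Here is the code: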

    # Creating a sample of data
    y = rnorm(500, 1.65)
    x = rnorm(500, 1.15)
    x = cbind(x, y)
    centers <- x[sample(nrow(x), 5), ]

    # A function for calculating the distance between centers and the rest of the dots
    euclid <- function(points1, points2) {
      distanceMatrix <- matrix(NA, nrow=dim(points1)[1], ncol=dim(points2)[1])
      for(i in 1:nrow(points2)) {
        distanceMatrix[,i] <- sqrt(rowSums(t(t(points1)-points2[i,])^2))
      }
      distanceMatrix
    }

    # A method function
    K_means <- function(x, centers, euclid, nItter) {
      clusterHistory <- vector(nItter, mode="list")
      centerHistory <- vector(nItter, mode="list")
      for(i in 1:nItter) {
        distsToCenters <- euclid(x, centers)
        clusters <- apply(distsToCenters, 1, which.min)
        centers <- apply(x, 2, tapply, clusters, mean)
        # Saving history
        clusterHistory[[i]] <- clusters
        centerHistory[[i]] <- centers
      }
      structure(list(clusters = clusterHistory, centers = centerHistory))
    }

    res <- K_means(x, centers, euclid, 5)

    # To use the same plot operations I had to use unlist, since the resulting object in my function is a list of lists,
    # and default object is just a list. And also I store the history of each iteration in that object.
    res <- unlist(res, recursive = FALSE)
    plot(x, col = res$clusters5)
    points(res$centers5, col = 1:5, pch = 8, cex = 2)

This code works fine on a simple matrix. But I was asked to use it on the iris dataset:

    head(iris)
    a <- data.frame(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
    centers <- a[sample(nrow(a), 3), ]
    iris_clusters <- K_means(a, centers, euclid, 3)
    iris_clusters <- unlist(iris_clusters, recursive = FALSE)
    head(iris_clusters)

The problem is that it does not work. The error message is:

    Error in distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i,  :
      number of items to replace is not a multiple of replacement length

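I understand this is because the dimensions of the objects do not match, but I do not know why that happens, so I am asking for help. Apologies in advance for anything silly in the code; I am still new to the language, so please do not be too harsh. Thanks!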

Answer:

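Your implementation works after a simple type conversion: pass the data and the initial centers as matrices (with as.matrix) rather than as data frames.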
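As a rough sketch of why the conversion matters: selecting one row of a data frame gives back a data frame, while selecting one row of a matrix gives a plain numeric vector, and only the vector is recycled the way the distance expression in euclid() expects. The toy data and the names df, m, row_df and row_m below are made up for illustration and are not from the original post.

```r
# Toy data (hypothetical, not from the post): 5 points in 3 dimensions.
df <- data.frame(u = c(1, 2, 3, 4, 5),
                 v = c(5, 4, 3, 2, 1),
                 w = c(2, 2, 2, 2, 2))
m <- as.matrix(df)

# Selecting one row gives different kinds of objects.
row_df <- df[1, ]  # still a data frame (one row, three columns)
row_m  <- m[1, ]   # a plain numeric vector of length 3

print(is.data.frame(row_df))  # TRUE
print(is.numeric(row_m))      # TRUE

# Matrix input: the same expression as in euclid(). The numeric vector is
# recycled down each column of t(m), so the center is subtracted from every
# point and the result holds one distance per point.
d_m <- sqrt(rowSums(t(t(m) - row_m)^2))
print(length(d_m) == nrow(m))  # TRUE

# Data frame input: the same expression goes through data frame arithmetic
# instead. tryCatch() is used because this may either raise an error or
# return something that is not one distance per point.
d_df <- tryCatch(sqrt(rowSums(t(t(df) - row_df)^2)),
                 error = function(e) NULL)
print(is.null(d_df) || length(d_df) != nrow(df))
```

In the iris case the data frame path would leave a result that does not fit a column of 150 rows, which is consistent with the "not a multiple of replacement length" message.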

    iris_clusters <- K_means(as.matrix(a), as.matrix(centers), euclid, 3) # 3 iterations
    iris_clusters <- unlist(iris_clusters, recursive = FALSE)

    # plotting the clusters obtained on the first two dimensions at the end of 3rd iteration
    plot(a[,1:2], col = iris_clusters$clusters3, pch=19)
    points(iris_clusters$centers3, col = 1:3, pch = 8, cex = 2)

    head(iris_clusters)
    # cluster assignments and centroids computed at different iterations
    $clusters1
      [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 3 2 3 2 3 2 3 3 3 3 2 3 3 3 3 3 3 2 3 2 2 3 3
     [77] 2 2 3 3 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 1 2 1 2 1 1 3 1 1 1 2 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 2 1 2 2 1 1 2 2 2 2 2

    $clusters2
      [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 3 3 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2
     [77] 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 2

    $clusters3
      [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     [77] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 2 2 1 1 2 2 1 1 1 2 1 1 1 2 2 2 2

    $centers1
      iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
    1          7.150000         3.120000          6.090000        2.1350000
    2          6.315909         2.915909          5.059091        1.8000000
    3          5.297674         3.115116          2.550000        0.6744186

    $centers2
      iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
    1          7.122727         3.113636          6.031818        2.1318182
    2          6.123529         2.852941          4.741176        1.6132353
    3          5.056667         3.268333          1.810000        0.3883333

    $centers3
      iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
    1          7.014815         3.096296          5.918519         2.155556
    2          6.025714         2.805714          4.588571         1.518571
    3          5.005660         3.369811          1.560377         0.290566
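If you would rather not convert at every call, one option is a distance function that coerces its inputs itself. This is only a sketch under assumptions: euclid_safe is a hypothetical name, not something from the original post, and the fixed starting rows are chosen just to keep the example reproducible.

```r
# Hypothetical variant of euclid() (name and checks are assumptions):
# coerce both inputs to numeric matrices once, then compute distances
# exactly as the original function does.
euclid_safe <- function(points1, points2) {
  points1 <- as.matrix(points1)
  points2 <- as.matrix(points2)
  stopifnot(is.numeric(points1), is.numeric(points2),
            ncol(points1) == ncol(points2))

  distanceMatrix <- matrix(NA_real_, nrow = nrow(points1), ncol = nrow(points2))
  for (i in seq_len(nrow(points2))) {
    # points2[i, ] is now a numeric vector, so it is recycled over t(points1)
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

# Data frames go in directly; the four numeric iris columns are used.
a <- iris[, 1:4]
centers <- a[c(1, 51, 101), ]  # fixed rows instead of sample(), for reproducibility
d <- euclid_safe(a, centers)
print(dim(d))  # one row per flower, one column per center
```

Since K_means takes the distance function as an argument, passing euclid_safe in place of euclid should let it accept the data frame without as.matrix at the call site, although that has only been reasoned through for the distance step here.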
