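Hi everyone! I've been asked to implement the K-means algorithm in R, but I'm not familiar with the language, so I found some sample code online and decided to use it. I studied the code, learned the functions it uses, and modified it a bit, because at first it didn't work very well. Here is the code: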
# Create a sample of data
y = rnorm(500, 1.65)
x = rnorm(500, 1.15)
x = cbind(x, y)
centers <- x[sample(nrow(x), 5), ]

# A function for calculating the distance between each center and every point
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA, nrow=dim(points1)[1], ncol=dim(points2)[1])
  for(i in 1:nrow(points2)) {
    distanceMatrix[,i] <- sqrt(rowSums(t(t(points1)-points2[i,])^2))
  }
  distanceMatrix
}

# The K-means iterations: assign each point to its nearest center, then move each center to the mean of its points
K_means <- function(x, centers, euclid, nItter) {
  clusterHistory <- vector(nItter, mode="list")
  centerHistory <- vector(nItter, mode="list")
  for(i in 1:nItter) {
    distsToCenters <- euclid(x, centers)
    clusters <- apply(distsToCenters, 1, which.min)
    centers <- apply(x, 2, tapply, clusters, mean)
    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
  }
  structure(list(clusters = clusterHistory, centers = centerHistory))
}

res <- K_means(x, centers, euclid, 5)

# To reuse the usual kmeans() plotting calls I had to unlist the result: my function returns a list of lists
# (it stores the history of every iteration), whereas the default kmeans() object is a flat list.
# After unlist(), the entries are named clusters1..clusters5 and centers1..centers5.
res <- unlist(res, recursive = FALSE)
plot(x, col = res$clusters5)
points(res$centers5, col = 1:5, pch = 8, cex = 2)
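One way to sanity-check euclid() is to compare it with base R's dist(). The sketch below is not part of the original code; the seed, the 10-point sample and the use of the first three points as "centers" are arbitrary assumptions made only for the check.

```r
# Distance function from the question: entry [j, i] is the Euclidean
# distance between row j of points1 and row i of points2.
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA, nrow = nrow(points1), ncol = nrow(points2))
  for (i in 1:nrow(points2)) {
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

set.seed(1)                        # arbitrary seed, for reproducibility only
x <- cbind(x = rnorm(10, 1.15), y = rnorm(10, 1.65))
centers <- x[1:3, ]                # first three points as stand-in centers

# Reference: dist() on the stacked points; rows 1..10 are x, rows 11..13 are
# the centers, so this block holds every point-to-center distance.
full <- as.matrix(dist(rbind(x, centers)))
reference <- full[1:10, 11:13]

all.equal(euclid(x, centers), unname(reference))   # TRUE
```

unname() is needed because as.matrix(dist(...)) attaches row and column names, which euclid() does not produce.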
This code works fine on a simple matrix. But I've been asked to apply it to the iris dataset:
head(iris)
a <- data.frame(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
centers <- a[sample(nrow(a), 3), ]
iris_clusters <- K_means(a, centers, euclid, 3)
iris_clusters <- unlist(iris_clusters, recursive = FALSE)
head(iris_clusters)
The problem is that it doesn't work. The error message is:
Error in distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, : number of items to replace is not a multiple of replacement length
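I understand this happens because the dimensions of the objects don't match, but I don't know why they don't match, so I'm asking for help. Apologies in advance for anything silly in the code; I'm still new to the language, so please don't be too harsh. Thanks!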
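The type difference behind this error can be isolated in a few lines. This is a minimal diagnostic sketch, not part of the original post; it uses iris[, 1:4] as a stand-in for the data frame a built above.

```r
# The same four iris measurements stored two ways.
a_df  <- iris[, 1:4]          # data frame, like `a` in the question
a_mat <- as.matrix(a_df)      # numeric matrix

class(a_mat[1, ])   # "numeric": a plain named vector
class(a_df[1, ])    # "data.frame": a one-row data frame

# euclid() computes t(points1) - points2[i, ]. With a matrix, the row is a
# vector that is recycled down every column, giving one difference column
# per point:
dim(t(a_mat) - a_mat[1, ])   # 4 150
```

With the data-frame version, points2[i, ] is itself a data frame, so the subtraction goes through data-frame arithmetic instead of plain vector recycling, which is where the dimensions stop lining up.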
Answer:
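Your implementation works once you add a simple type conversion. euclid() assumes both arguments are numeric matrices, but the iris data a and the centers sampled from it are data frames. For a data frame, points2[i,] is a one-row data frame rather than a numeric vector, so t(points1) - points2[i,] no longer subtracts the center from every point, rowSums() does not return one distance per point, and the assignment to distanceMatrix[,i] fails. Converting both arguments with as.matrix() avoids this: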
iris_clusters <- K_means(as.matrix(a), as.matrix(centers), euclid, 3) # 3 iterations
iris_clusters <- unlist(iris_clusters, recursive = FALSE)
# Plot the clusters obtained at the end of the 3rd iteration on the first two dimensions
plot(a[,1:2], col = iris_clusters$clusters3, pch=19)
# points() uses the first two columns of centers3; there are 3 clusters, so 3 colors
points(iris_clusters$centers3, col = 1:3, pch = 8, cex = 2)
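If remembering the as.matrix() calls is a nuisance, the conversion can also move inside the function. The sketch below is a hypothetical variant, not the answerer's code: K_means_matrix is an illustrative name, the euclid argument is replaced by a fixed euclid(), the starting centers (rows 1, 51 and 101 of iris) are an arbitrary choice made only so the example is deterministic, and the empty-cluster check is an added assumption. With the original code, tapply() silently drops a cluster that receives no points, leaving fewer rows in centers.

```r
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA_real_, nrow = nrow(points1), ncol = nrow(points2))
  for (i in seq_len(nrow(points2))) {
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

# Hypothetical variant: accepts matrices or data frames.
K_means_matrix <- function(x, centers, nIter) {
  x <- as.matrix(x)                  # data frames -> numeric matrices
  centers <- as.matrix(centers)
  k <- nrow(centers)
  clusterHistory <- vector("list", nIter)
  centerHistory <- vector("list", nIter)
  for (i in seq_len(nIter)) {
    # Assignment step: index of the nearest center for every point
    clusters <- apply(euclid(x, centers), 1, which.min)
    # Assumed safeguard: stop instead of silently losing a cluster
    if (length(unique(clusters)) < k) {
      stop("a cluster became empty in iteration ", i)
    }
    # Update step: each center becomes the mean of its assigned points
    centers <- apply(x, 2, tapply, clusters, mean)
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
  }
  list(clusters = clusterHistory, centers = centerHistory)
}

a <- iris[, 1:4]
start <- a[c(1, 51, 101), ]          # arbitrary fixed starting centers
res <- K_means_matrix(a, start, 3)   # data frames accepted directly
res$centers[[3]]                     # centroids after the 3rd iteration
```

Passing as.matrix(a) and as.matrix(start) instead gives an identical result, since the function performs the same conversion itself.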
The first six elements of the result hold the cluster assignments and centroids computed at each of the 3 iterations:

head(iris_clusters)
$clusters1
 [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 3 2 3 2 3 2 3 3 3 3 2 3 3 3 3 3 3 2 3 2 2 3 3
[77] 2 2 3 3 3 3 3 2 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 1 2 1 2 1 1 3 1 1 1 2 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 2 1 2 2 1 1 2 2 2 2 2

$clusters2
 [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 3 3 2 2 2 3 2 2 2 2 3 2 2 2 2 2 2
[77] 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 2 2 1 1 2 2 1 1 2 2 2 2 2

$clusters3
 [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[77] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 2 2 1 1 2 2 1 1 1 2 1 1 1 2 2 2 2

$centers1
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.150000         3.120000          6.090000        2.1350000
2          6.315909         2.915909          5.059091        1.8000000
3          5.297674         3.115116          2.550000        0.6744186

$centers2
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.122727         3.113636          6.031818        2.1318182
2          6.123529         2.852941          4.741176        1.6132353
3          5.056667         3.268333          1.810000        0.3883333

$centers3
  iris.Sepal.Length iris.Sepal.Width iris.Petal.Length iris.Petal.Width
1          7.014815         3.096296          5.918519         2.155556
2          6.025714         2.805714          4.588571         1.518571
3          5.005660         3.369811          1.560377         0.290566
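Base R also ships a K-means implementation, stats::kmeans(). Its default algorithm is Hartigan-Wong; algorithm = "Lloyd" selects the same assign-then-average scheme that K_means uses. The sketch below is not part of the original answer: it runs kmeans() on the four iris measurements from fixed starting centers (rows 1, 51 and 101, an arbitrary choice) to give a reference to compare against.

```r
a <- iris[, 1:4]
start <- as.matrix(a[c(1, 51, 101), ])   # arbitrary fixed starting centers

# Base R's implementation, using Lloyd's algorithm from the same start
fit <- kmeans(a, centers = start, iter.max = 20, algorithm = "Lloyd")

fit$cluster                        # cluster label for each of the 150 rows
fit$centers                        # 3 x 4 matrix of centroids
fit$iter                           # number of iterations actually run
table(fit$cluster, iris$Species)   # clusters vs. the known species
```

kmeans() stops as soon as the assignments no longer change (fit$iter reports how many iterations it ran), whereas K_means always runs exactly nItter iterations, so the two results may differ if K_means is stopped before it has converged.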