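I am learning to use the R "kohonen" package to create self-organizing maps (SOM, also known as Kohonen networks, a machine learning algorithm). I am following this R tutorial: https://www.rpubs.com/loveb/som
I tried to create my own data (this time with both "factor" and "numeric" variables) and run the SOM algorithm (this time using the "supersom()" function):
#load libraries and adjust colors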
library(kohonen) #fitting SOMs
library(ggplot2) #plots
library(RColorBrewer) #colors, using predefined palettes
contrast <- c("#FA4925", "#22693E", "#D4D40F", "#2C4382", "#F0F0F0", "#3D3D3D") #my own, contrasting colors
cols <- brewer.pal(10, "Paired")
#create and format data
a =rnorm(1000,10,10)
b = rnorm(1000,10,5)
c = rnorm(1000,5,5)
d = rnorm(1000,5,10)
e <- sample( LETTERS[1:4], 100 , replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
f <- sample( LETTERS[1:5], 100 , replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2) )
g <- sample( LETTERS[1:2], 100 , replace=TRUE, prob=c(0.5, 0.5) )
data = data.frame(a,b,c,d,e,f,g)
data$e = as.factor(data$e)
data$f = as.factor(data$f)
data$g = as.factor(data$g)
cols <- 1:4
data[cols] <- scale(data[cols])
#som model
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
From here, I was able to make some basic plots successfully:
#plots
#pretty gradient colors
colour1 <- tricolor(som$grid)
colour4 <- tricolor(som$grid, phi = c(pi/8, 6, -pi/6), offset = 0.1)
plot(som, type="changes")
plot(som, type="count")
plot(som, type="quality", shape = "straight")
plot(som, type="dist.neighbours", palette.name=grey.colors, shape = "straight")
However, the problem arises when I try to make an individual plot for each variable:
#error
var <- 1 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
var <- 6 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
This produces an error: "Error: Incorrect Number of Dimensions"
A similar error (NAs by coercion) is produced when I try to cluster the SOM network:
#cluster (error)
set.seed(33) #for reproducibility
fit_kmeans <- kmeans(data, 3) #3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[data$unit.classif]
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey",col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4])
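Can someone please tell me what I am doing wrong? Thanks
Source: https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom
Answer:
getCodes() produces a list, so you need to treat it like one.
Calling getCodes(som) produces a list of 7 items named a-g, so you should select items from the list using either $ or [[]], e.g.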
plot(som, type = "property", property = getCodes(som)[[1]], main=names(getCodes(som))[1], palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)$a, main="a", palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)[["a"]], main="a", palette.name=terrain.colors)
If you must set the variable before calling the plot, you can do it like this:
var <- 1
plot(som, type = "property", property = getCodes(som)[[var]], main=names(getCodes(som))[var], palette.name=terrain.colors)
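To see why the original call fails, here is a minimal base R sketch that does not need kohonen at all. The `codes` object is a hypothetical stand-in for what getCodes(som) returns on a multi-layer map (assumed here to be a named list with one codebook matrix per layer); the real object may differ in detail.

```r
# Mock of a multi-layer getCodes() result (assumption: a named list holding
# one codebook matrix per layer, with one row per map unit).
codes <- list(
  a = matrix(rnorm(100), ncol = 1, dimnames = list(NULL, "a")),
  b = matrix(rnorm(100), ncol = 1, dimnames = list(NULL, "b"))
)

# Matrix-style indexing on a list raises "incorrect number of dimensions".
res <- try(codes[, 1], silent = TRUE)
inherits(res, "try-error")   # TRUE

# List-style indexing returns the layer itself.
layer_a <- codes[["a"]]
dim(layer_a)                 # 100 1

# A list has names, not column names, so use names() for the plot title.
names(codes)[1]              # "a"
colnames(codes)              # NULL
```

The last line is also why main = colnames(getCodes(som))[var] would not produce a title even if the indexing worked.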
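About kmeans()
kmeans() needs a matrix or an object that can be coerced to a matrix. You have factors (categorical data), which cannot be coerced to numeric, so either drop the factors or find another method.
Dropping the factors:
#cluster (factors removed)
set.seed(33) #for reproducibility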
fit_kmeans <- kmeans(as.matrix(data[1:4]), 3)#3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[data$unit.classif]
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey",col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4])
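One more thing to check, as far as I understand the package: add.cluster.boundaries() expects one cluster label per map unit, while the code above clusters the observations, and unit.classif is a component of the fitted SOM object rather than of the data frame. The tutorial appears to run kmeans() on the codebook vectors instead. Below is a small sketch of that variant, not a drop-in replacement. The names som_model, codes_num, fit_units and cl_obs are made up for illustration, and the data are arranged as one numeric matrix layer plus one factor layer, which is an assumption about how you might want to lay out your own data.

```r
library(kohonen)

set.seed(33)
# Small synthetic data set (assumed shapes): four numeric columns, one factor.
n <- 200
num <- scale(cbind(a = rnorm(n, 10, 10), b = rnorm(n, 10, 5),
                   c = rnorm(n, 5, 5),   d = rnorm(n, 5, 10)))
fac <- factor(sample(LETTERS[1:4], n, replace = TRUE))

# One numeric matrix layer plus one factor layer on a 5 x 5 grid.
som_model <- supersom(list(num = num, fac = fac),
                      grid = somgrid(5, 5, "hexagonal"))

# Codebook vectors of the numeric layer: one row per map unit.
codes_num <- getCodes(som_model, idx = 1)

# Cluster the map units, not the observations.
fit_units <- kmeans(codes_num, centers = 3)

# Each observation inherits the cluster of the unit it was mapped to.
cl_obs <- fit_units$cluster[som_model$unit.classif]

plot(som_model, type = "mapping", shape = "straight", border = "grey")
add.cluster.boundaries(som_model, fit_units$cluster, lwd = 3, lty = 2)
```

Only the numeric layer is clustered here, so the factor layer influences where observations land on the map but not the k-means step itself.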
Edit: alternatively, you can select the codes directly in getCodes() using idx, like this:
plot(som, type = "property", property = getCodes(som, idx = 1), main="a", palette.name=terrain.colors)