I ran the example from the rpart manual:
    library(rpart)

    tree <- rpart(Species ~ ., data = iris)
    plot(tree, margin = 0.1)
    text(tree)
Now I want to adapt it to a different dataset:
    digitstrainURL <- "http://archive.ics.uci.edu/ml/machine-learning-databases/pendigits/pendigits.tra"
    digitsTestURL <- "http://archive.ics.uci.edu/ml/machine-learning-databases/pendigits/pendigits.tes"

    # 16 input columns i1..i16 followed by the digit label
    cols <- c(paste0("i", 1:16), "Class")
    digitstrain <- read.table(digitstrainURL, sep = ",", col.names = cols)
    digitstest <- read.table(digitsTestURL, sep = ",", col.names = cols)

    tree <- rpart(Class ~ ., data = digitstrain)
    plot(tree, margin = 0.1)
    text(tree)
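The dataset contains handwritten-digit data, and the "Class" column holds the digit, 0-9. But when I plot the decision tree, the leaves show strange floating-point numbers. Do you know what these numbers mean? I would like the leaf nodes to be labelled with the digits 0-9.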
Answer:
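You are trying to fit a classification tree, but your response is stored as integers, not as a factor.
When no method is given, rpart tries to guess which one to use from the type of the response, and in your case it guesses wrong. Your code therefore fits the tree with method="anova" (a regression tree), and the floating-point numbers in the leaves are the mean Class value of the observations in each node. What you want is method="class".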
Try this:
    tree <- rpart(Class ~ ., data = digitstrain, method = "class")
    plot(tree, margin = 0.1)
    text(tree, cex = 0.7)
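To see the guessing behaviour in isolation, here is a minimal sketch that uses the built-in iris data instead of the digits (the integer recoding of Species is only an assumption made for illustration, so that the example runs without a download). It checks which method rpart picked by reading the `method` component of the fitted object, and shows that converting the response to a factor is an alternative to passing method="class".

```r
library(rpart)

# Illustration only: recode the species as integers 1, 2, 3,
# mimicking an integer-coded class column such as Class
d <- iris
d$Species <- as.integer(d$Species)

# Numeric response and no method given: rpart falls back to a regression tree
fit_num <- rpart(Species ~ ., data = d)
print(fit_num$method)

# Option 1: state the method explicitly
fit_cls <- rpart(Species ~ ., data = d, method = "class")
print(fit_cls$method)

# Option 2: convert the response to a factor and let rpart guess
d$Species <- factor(d$Species)
fit_fac <- rpart(Species ~ ., data = d)
print(fit_fac$method)
```

For the digits data, the second option would amount to converting digitstrain$Class with factor() before fitting.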
To test the accuracy of the model, you can use predict to get the predicted values and then build a confusion matrix:
    confusion <- data.frame(
      class = factor(digitstest$Class),
      predict = predict(tree, digitstest, type = "class")
    )
    with(confusion, table(class, predict))

         predict
    class   0   1   2   3   4   5   6   7   8   9
        0 311   1   0   0   0   0   0   7  42   2
        1   0 139 186   4   0   0   0   1  10  24
        2   0   0 320  14   2   3   0   7  15   3
        3   0   6   0 309   1   3   0  17   0   0
        4   0   1   0   5 300   0   0   0   0  58
        5   0   0   0  74   0 177   0   1  14  69
        6   5   0   3   9  12   0 264  11   5  27
        7   2   9  11  13   0  10   0 290   0  29
        8  60   0   0   0   0  32   0  21 220   3
        9   1  44   0   9  20   0   0   8   0 254
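A confusion matrix can be condensed into a single accuracy figure by dividing its diagonal (the correct predictions) by the total count. The sketch below shows this on the built-in iris data with an arbitrary 100/50 train/test split; the split, the seed and the variable names are assumptions made for illustration, not part of the digits example.

```r
library(rpart)

set.seed(1)
idx <- sample(nrow(iris), 100)   # arbitrary split, for illustration only
train <- iris[idx, ]
test <- iris[-idx, ]

fit <- rpart(Species ~ ., data = train, method = "class")

# Rows are the true classes, columns the predicted classes
tab <- table(class = test$Species,
             predict = predict(fit, test, type = "class"))

# Overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- sum(diag(tab)) / sum(tab)

# Per-class recall: correct predictions over the number of true cases
per_class <- diag(tab) / rowSums(tab)

print(accuracy)
print(per_class)
```

The same two lines should work on the digits table, since its rows and columns list the classes in the same order.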
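Note that a single decision tree does not predict very well here. A very simple way to improve the predictions is to use a random forest, which consists of many trees, each fitted to a random subset of the training data: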
    library(randomForest)

    # A factor response makes randomForest fit a classification forest
    fst <- randomForest(factor(Class) ~ ., data = digitstrain)
Observe that the random forest predicts much better:
    confusion <- data.frame(
      class = factor(digitstest$Class),
      predict = predict(fst, digitstest, type = "class")
    )
    with(confusion, table(class, predict))

         predict
    class   0   1   2   3   4   5   6   7   8   9
        0 347   0   0   0   0   0   0   0  16   0
        1   0 333  28   1   1   0   0   1   0   0
        2   0   5 359   0   0   0   0   0   0   0
        3   0   4   0 331   0   0   0   0   0   1
        4   0   0   0   0 362   1   0   0   0   1
        5   0   0   0   8   0 316   0   0   0  11
        6   1   0   0   0   0   0 335   0   0   0
        7   0  26   2   0   0   0   0 328   0   8
        8   0   0   0   0   0   0   0   0 336   0
        9   0   2   0   0   0   0   0   2   1 331
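To make the idea of "many trees fitted to random subsets" concrete, here is a rough sketch of plain bagging with rpart trees and a majority vote. It is not what randomForest does internally (a random forest also samples a random subset of predictors at each split); it is a simplified illustration on the built-in iris data, and the split, seed, number of trees and variable names are all assumptions.

```r
library(rpart)

set.seed(42)
idx <- sample(nrow(iris), 100)   # arbitrary split, for illustration only
train <- iris[idx, ]
test <- iris[-idx, ]

n_trees <- 25

# Fit each tree on a bootstrap resample (rows drawn with replacement)
trees <- lapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# Matrix of predicted labels: one row per test case, one column per tree
votes <- sapply(trees, function(tr) {
  as.character(predict(tr, test, type = "class"))
})

# Majority vote across the trees for each test case
bagged <- apply(votes, 1, function(v) names(which.max(table(v))))
bagged <- factor(bagged, levels = levels(iris$Species))

print(table(class = test$Species, predict = bagged))
```

Averaging over many resampled trees tends to reduce the variance of a single tree, which is the usual explanation for the improvement seen in the forest's confusion matrix.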