I tuned the parameters manually to find the best ntree value:
    bestMtry <- 3
    control <- trainControl(method = 'repeatedcv', number = 10, repeats = 3, search = 'grid')
    storeMaxtrees <- list()
    tuneGrid <- expand.grid(.mtry = bestMtry)
    for (ntree in c(1000, 1500, 2000)) {
      set.seed(291)
      rf.maxtrees <- train(survived ~ .,
                           data = trainingSet,
                           method = "rf",
                           metric = "Accuracy",
                           tuneGrid = tuneGrid,
                           trControl = control,
                           importance = TRUE,
                           nodesize = 14,
                           maxnodes = 24,
                           ntree = ntree)
      key <- toString(ntree)
      storeMaxtrees[[key]] <- rf.maxtrees
    }
    resultsTree <- resamples(storeMaxtrees)
    summary(resultsTree)
Output:
    Call:
    summary.resamples(object = resultsTree)

    Models: 1000, 1500, 2000
    Number of resamples: 30

    Accuracy
              Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
    1000 0.7865169 0.8181818 0.8305031 0.8335064 0.8498787 0.8764045    0
    1500 0.7865169 0.8181818 0.8305031 0.8319913 0.8522727 0.8764045    0
    2000 0.7865169 0.8181818 0.8305031 0.8327446 0.8522727 0.8764045    0

    Kappa
              Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
    1000 0.2700461 0.4243663 0.4786274 0.4753027 0.5252316 0.6281808    0
    1500 0.2700461 0.4218811 0.4710053 0.4705338 0.5270828 0.6281808    0
    2000 0.2700461 0.4218811 0.4786274 0.4721715 0.5270828 0.6281808    0
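From this output, my understanding is that, based on Accuracy and Kappa, 2000 is the best value for ntree. I would like to store the best ntree value (2000) dynamically. Is there something along the lines of best_ntree <- resultsTree.bestTune?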
Answer:
You can store the result of the summary() call. Using an example dataset:
    library(caret)

    bestMtry <- 3
    control <- trainControl(method = 'repeatedcv', number = 5)
    data <- MASS::Pima.tr   # example dataset; the outcome column is `type`

    storeMaxtrees <- list()
    tuneGrid <- expand.grid(.mtry = bestMtry)
    for (ntree in c(1000, 1500, 2000)) {
      set.seed(291)   # same seed on every pass so the resampling folds match
      rf.maxtrees <- train(type ~ .,
                           data = data,
                           method = "rf",
                           metric = "Accuracy",
                           tuneGrid = tuneGrid,
                           trControl = control,
                           importance = TRUE,
                           nodesize = 14,
                           maxnodes = 24,
                           ntree = ntree)
      key <- toString(ntree)   # list names must be character
      storeMaxtrees[[key]] <- rf.maxtrees
    }
    resultsTree <- resamples(storeMaxtrees)
We can then pick the model with the highest mean accuracy:
    res = summary(resultsTree)
    res$models[which.max(res$statistics$Accuracy[,"Mean"])]
    [1] "1500"
The value is returned as a character string, so you can convert the 1500 in this example to a numeric type…
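As a rough sketch of that conversion, the snippet below applies the same lookup and then `as.numeric()`. To stay self-contained it does not call caret. Instead, `res` is a hand-built stand-in (an assumption) that mimics only the two fields used above, `models` and `statistics$Accuracy`, filled with the mean accuracies from the question's output. The name `best_key` is made up for illustration. For those particular numbers, the highest mean accuracy belongs to 1000.

```r
# Stand-in for summary(resultsTree): only the fields used by the lookup.
# The values are the "Mean" accuracies printed in the question's output.
acc <- matrix(c(0.8335064, 0.8319913, 0.8327446),
              ncol = 1,
              dimnames = list(c("1000", "1500", "2000"), "Mean"))
res <- list(models = c("1000", "1500", "2000"),
            statistics = list(Accuracy = acc))

# Same lookup as in the answer: the model name with the highest mean accuracy.
best_key <- res$models[which.max(res$statistics$Accuracy[, "Mean"])]

# Model names are character strings, so convert before reusing as ntree.
best_ntree <- as.numeric(best_key)
print(best_ntree)
```

With a real `resamples` object, the character key can presumably also be used to retrieve the fitted model from the list built in the loop, for example `storeMaxtrees[[best_key]]`.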