How to plot an ROC curve for each cross-validation fold using caret

I have the following code:

library(mlbench)
library(caret)
library(ggplot2)

set.seed(998)

# Prepare data ------------------------------------------------------------
data(Sonar)
my_data <- Sonar

# Define cross-validation -------------------------------------------------
fitControl <- trainControl(
  method = "cv",
  number = 10,
  classProbs = TRUE,
  savePredictions = TRUE,
  summaryFunction = twoClassSummary
)

# Train with random forest ------------------------------------------------
model <- train(
  Class ~ .,
  data = my_data,
  method = "rf",
  trControl = fitControl,
  metric = "ROC"
)

for_lift <- data.frame(Class = model$pred$obs, rf = model$pred$R)
lift_obj <- lift(Class ~ rf, data = for_lift, class = "R")

# Plot ROC curve ----------------------------------------------------------
ggplot(lift_obj$data) +
  geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
  scale_color_discrete(guide = guide_legend(title = "method"))

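It produces a plot with a single ROC curve.

Note that I am running 10-fold cross-validation, but the resulting ROC curve only reflects the final, pooled result.

What I want is 10 ROC curves, one for each cross-validation fold. How can I do that?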


Answer:

library(mlbench)
library(caret)
library(ggplot2)

set.seed(998)

# Prepare data ------------------------------------------------------------
data(Sonar)
my_data <- Sonar

# Define cross-validation -------------------------------------------------
fitControl <- trainControl(
  method = "cv",
  number = 10,
  classProbs = TRUE,
  savePredictions = TRUE,
  summaryFunction = twoClassSummary
)

# Train with random forest ------------------------------------------------
model <- train(
  Class ~ .,
  data = my_data,
  method = "rf",
  trControl = fitControl,
  metric = "ROC"
)

# Keep the fold label (Resample) next to each hold-out prediction
for_lift <- data.frame(
  Class = model$pred$obs,
  rf = model$pred$R,
  resample = model$pred$Resample
)

# Build the lift/ROC data separately for every fold and stack the results
lift_df <- data.frame()
for (fold in unique(for_lift$resample)) {
  fold_df <- dplyr::filter(for_lift, resample == fold)
  lift_obj_data <- lift(Class ~ rf, data = fold_df, class = "R")$data
  lift_obj_data$fold <- fold
  lift_df <- rbind(lift_df, lift_obj_data)
}

# Pooled curve over all folds (not used in the plot below)
lift_obj <- lift(Class ~ rf, data = for_lift, class = "R")

# Plot ROC curves, one per fold -------------------------------------------
ggplot(lift_df) +
  geom_line(aes(1 - Sp, Sn, color = fold)) +
  scale_color_discrete(guide = guide_legend(title = "Fold"))

[Plot: one ROC curve per fold, colored by fold]
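
One caveat with the code above, offered as a sketch rather than a definitive fix: with `savePredictions = TRUE`, caret keeps the hold-out predictions for every candidate `mtry` value, so `model$pred` holds several rows per observation and each per-fold curve mixes all candidates. Merging `model$pred` with `model$bestTune` keeps only the rows for the selected `mtry`. The data frame below is a made-up stand-in for `model$pred`, and `keep_best_tune` is a hypothetical helper name, not part of caret.

```r
# Made-up stand-in for model$pred: two folds, two candidate mtry values
pred <- data.frame(
  obs      = factor(c("R", "M", "R", "M", "R", "M", "R", "M"),
                    levels = c("M", "R")),
  R        = c(0.9, 0.2, 0.6, 0.4, 0.8, 0.3, 0.7, 0.1),
  mtry     = c(2, 2, 31, 31, 2, 2, 31, 31),
  Resample = rep(c("Fold01", "Fold02"), each = 4)
)

# Made-up stand-in for model$bestTune
best_tune <- data.frame(mtry = 2)

# Hypothetical helper: keep only predictions made with the selected tuning values
keep_best_tune <- function(pred, best_tune) {
  merge(pred, best_tune, by = names(best_tune))
}

final_pred <- keep_best_tune(pred, best_tune)

# Rows left per fold after filtering
table(final_pred$Resample)
```

With a real model this would be `keep_best_tune(model$pred, model$bestTune)`, applied before building `for_lift`. Setting `savePredictions = "final"` in `trainControl` should have the same effect.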

Computing a per-fold metric (note that the code below reports accuracy, not AUC):

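# Reuses my_data and fitControl from the code above
model <- train(
  Class ~ .,
  data = my_data,
  method = "rf",
  trControl = fitControl,
  metric = "ROC"
)

library(plyr)
library(MLmetrics)

# Accuracy of the hold-out predictions within each fold
ddply(model$pred, "Resample", summarise,
      accuracy = Accuracy(pred, obs))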

Output:

   Resample  accuracy
1    Fold01 0.8253968
2    Fold02 0.8095238
3    Fold03 0.8000000
4    Fold04 0.8253968
5    Fold05 0.8095238
6    Fold06 0.8253968
7    Fold07 0.8333333
8    Fold08 0.8253968
9    Fold09 0.9841270
10   Fold10 0.7936508
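
The `ddply` call above reports accuracy rather than AUC. For the AUC itself, a minimal base-R sketch is to apply the rank-based (Mann-Whitney) formula to the class probabilities within each fold. The data frame is a made-up stand-in for `model$pred`, and `auc_rank` is a hypothetical helper name.

```r
# Made-up stand-in for model$pred (one tuning value, two folds)
pred <- data.frame(
  obs      = factor(c("R", "M", "R", "M", "R", "M", "R", "M"),
                    levels = c("M", "R")),
  R        = c(0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.3, 0.1),
  Resample = rep(c("Fold01", "Fold02"), each = 4)
)

# Hypothetical helper: AUC as the probability that a positive case
# receives a higher score than a negative one (ties count one half)
auc_rank <- function(score, is_positive) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  r <- rank(score)  # average ranks handle tied scores
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One AUC value per fold, treating "R" as the positive class
fold_auc <- sapply(
  split(pred, pred$Resample),
  function(d) auc_rank(d$R, d$obs == "R")
)
fold_auc
```

With `summaryFunction = twoClassSummary`, caret should also already store the per-fold AUC in the `ROC` column of `model$resample`, which avoids recomputing it.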
