How to plot a ROC curve for each cross-validation fold using caret

I have the following code:

    library(mlbench)
    library(caret)
    library(ggplot2)
    set.seed(998)

    # Prepare data ------------------------------------------------
    data(Sonar)
    my_data <- Sonar

    # Define cross-validation -------------------------------------
    fitControl <- trainControl(
      method = "cv",
      number = 10,
      classProbs = TRUE,
      savePredictions = TRUE,
      summaryFunction = twoClassSummary
    )

    # Train with random forest ------------------------------------
    model <- train(
      Class ~ .,
      data = my_data,
      method = "rf",
      trControl = fitControl,
      metric = "ROC"
    )

    for_lift <- data.frame(Class = model$pred$obs, rf = model$pred$R)
    lift_obj <- lift(Class ~ rf, data = for_lift, class = "R")

    # Plot ROC curve ----------------------------------------------
    ggplot(lift_obj$data) +
      geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
      scale_color_discrete(guide = guide_legend(title = "method"))

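It produces a plot like this.

Note that I am doing 10-fold cross-validation. The ROC curve produced only reflects the final, averaged result.

What I want is 10 ROC curves, one for each cross-validation fold. How can I achieve that?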


Answer:

    library(mlbench)
    library(caret)
    library(ggplot2)
    set.seed(998)

    # Prepare data ------------------------------------------------
    data(Sonar)
    my_data <- Sonar

    # Define cross-validation -------------------------------------
    fitControl <- trainControl(
      method = "cv",
      number = 10,
      classProbs = TRUE,
      savePredictions = TRUE,
      summaryFunction = twoClassSummary
    )

    # Train with random forest ------------------------------------
    model <- train(
      Class ~ .,
      data = my_data,
      method = "rf",
      trControl = fitControl,
      metric = "ROC"
    )

    # Keep the fold label next to each held-out prediction
    for_lift <- data.frame(
      Class = model$pred$obs,
      rf = model$pred$R,
      resample = model$pred$Resample
    )

    # Build the lift/ROC data separately for each fold and stack the results
    lift_df <- data.frame()
    for (fold in unique(for_lift$resample)) {
      fold_df <- dplyr::filter(for_lift, resample == fold)
      lift_obj_data <- lift(Class ~ rf, data = fold_df, class = "R")$data
      lift_obj_data$fold <- fold
      lift_df <- rbind(lift_df, lift_obj_data)
    }

    # Pooled curve over all folds (not used in the plot below)
    lift_obj <- lift(Class ~ rf, data = for_lift, class = "R")

    # Plot ROC curves, one per fold -------------------------------
    ggplot(lift_df) +
      geom_line(aes(1 - Sp, Sn, color = fold)) +
      scale_color_discrete(guide = guide_legend(title = "Fold"))

Plot
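One thing worth checking before reading too much into the per-fold curves: as far as I know, `savePredictions = TRUE` makes caret keep the held-out predictions for every candidate tuning value, not only the selected one, so `model$pred` can contain several rows per sample per fold (one per `mtry`). Below is a minimal sketch of restricting the predictions to the selected tuning value. The data frames `pred` and `best_tune` are made-up stand-ins for `model$pred` and `model$bestTune`, so the snippet runs without training anything.

```r
# Toy stand-in for model$pred: held-out predictions for two candidate
# mtry values across two folds (all values are made up for illustration).
pred <- data.frame(
  obs      = factor(c("R", "M", "R", "M", "R", "M", "R", "M"),
                    levels = c("M", "R")),
  R        = c(0.9, 0.2, 0.6, 0.4, 0.8, 0.3, 0.7, 0.1),
  mtry     = c(2, 2, 2, 2, 31, 31, 31, 31),
  Resample = c("Fold01", "Fold01", "Fold02", "Fold02",
               "Fold01", "Fold01", "Fold02", "Fold02")
)

# Toy stand-in for model$bestTune: the selected tuning parameters.
best_tune <- data.frame(mtry = 2)

# Keep only the rows whose tuning parameters match the selected ones.
# merge() joins on the shared column(s), here `mtry`.
best_pred <- merge(pred, best_tune)

# Each fold now contributes one prediction per held-out sample.
table(best_pred$Resample)
```

With the real model this would be `best_pred <- merge(model$pred, model$bestTune)`, and `best_pred` would then replace `model$pred` when building `for_lift`.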

To compute a per-fold metric (note that the code below reports accuracy for each fold, not AUC):

    model <- train(
      Class ~ .,
      data = my_data,
      method = "rf",
      trControl = fitControl,
      metric = "ROC"
    )

    library(plyr)
    library(MLmetrics)

    # Accuracy of the held-out predictions, grouped by fold
    ddply(model$pred, "Resample", summarise,
          accuracy = Accuracy(pred, obs))

Output:

       Resample  accuracy
    1    Fold01 0.8253968
    2    Fold02 0.8095238
    3    Fold03 0.8000000
    4    Fold04 0.8253968
    5    Fold05 0.8095238
    6    Fold06 0.8253968
    7    Fold07 0.8333333
    8    Fold08 0.8253968
    9    Fold09 0.9841270
    10   Fold10 0.7936508
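If the goal is an actual AUC per fold rather than accuracy, one option is to compute it from the class probabilities instead of the hard predictions. The sketch below uses the rank-sum (Mann-Whitney) identity in base R. The helper `auc_rank` and the data frame `pred` are hypothetical, with made-up values standing in for the held-out rows of `model$pred`. I believe caret also stores the per-fold ROC value of the selected model in `model$resample`, which may be the simpler route.

```r
# AUC via the rank-sum identity: the probability that a randomly chosen
# positive sample scores higher than a randomly chosen negative one.
auc_rank <- function(score, is_positive) {
  r <- rank(score)                 # average ranks, so ties count as 1/2
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy stand-in for held-out predictions (made-up values):
# `R` is the predicted probability of class "R".
pred <- data.frame(
  obs      = c("R", "R", "M", "M", "R", "R", "M", "M"),
  R        = c(0.9, 0.8, 0.3, 0.1, 0.7, 0.4, 0.6, 0.2),
  Resample = rep(c("Fold01", "Fold02"), each = 4)
)

# One AUC per fold, treating "R" as the positive class.
fold_auc <- sapply(split(pred, pred$Resample), function(d) {
  auc_rank(d$R, d$obs == "R")
})
fold_auc
```

In this toy data Fold01 separates the classes perfectly (AUC 1) and Fold02 ranks three of the four positive/negative pairs correctly (AUC 0.75).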
