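Optimizing for sensitivity in caret still seems to optimize ROC

I am trying to maximize sensitivity when doing model selection with caret using rpart. To do this, I tried to replicate the approach given here (scroll down to the example with the user-defined function FourStat) on caret's GitHub page.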

# Create our own function so we can use "Sensitivity" as the metric to maximize:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
    out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
    c(out, Sensitivity = out["Sens"])
}
rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
    na.action = na.pass,
    method = "rpart",
    control = rpart.control(maxdepth = 6),
    tuneLength = 20,
    # maximize sensitivity
    metric = "Sensitivity",
    maximize = TRUE,
    trControl = trainControl(classProbs = TRUE,
                             summaryFunction = Sensitivity.fc))

However, when I print the summary with:

rpart_caret_fit

the output shows that it still used the ROC criterion to select the final model:

CART

678282 samples
     4 predictor
     2 classes: 'yes', 'no'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 678282, 678282, 678282, 678282, 678282, 678282, ...
Resampling results across tuning parameters:

  cp              ROC        Sens       Spec       Sensitivity.Sens
  0.000001909738  0.7259486  0.4123547  0.8227382  0.4123547
  0.000002864607  0.7259486  0.4123547  0.8227382  0.4123547
  0.000005729214  0.7259489  0.4123622  0.8227353  0.4123622
  0.000006684083  0.7258036  0.4123614  0.8227379  0.4123614
  0.000007638953  0.7258031  0.4123576  0.8227398  0.4123576
  0.000009548691  0.7258028  0.4123539  0.8227416  0.4123539
  0.000010694534  0.7257553  0.4123589  0.8227332  0.4123589
  0.000015277905  0.7257313  0.4123614  0.8227290  0.4123614
  0.000032465548  0.7253456  0.4112838  0.8234272  0.4112838
  0.000038194763  0.7252966  0.4112912  0.8234196  0.4112912
  0.000076389525  0.7248774  0.4102792  0.8240339  0.4102792
  0.000164237480  0.7244847  0.4093688  0.8246372  0.4093688
  0.000194793290  0.7241532  0.4086596  0.8250930  0.4086596
  0.000310650737  0.7237546  0.4087379  0.8250393  0.4087379
  0.001625187154  0.7233805  0.4006570  0.8295729  0.4006570
  0.001726403276  0.7233225  0.3983850  0.8308874  0.3983850
  0.002173282000  0.7230906  0.3915758  0.8348320  0.3915758
  0.002237258227  0.7230906  0.3915758  0.8348320  0.3915758
  0.006140444689  0.7173854  0.4897494  0.7695558  0.4897494
  0.055330843035  0.5730987  0.2710906  0.8545549  0.2710906

ROC was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.000005729214.

How can I override the ROC selection method?


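Answer:

You are overcomplicating things.

twoClassSummary already includes sensitivity in its output; the column name is "Sens". It is enough to specify:

metric = "Sens" in train, and summaryFunction = twoClassSummary in trainControl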
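As a side note on the question's code: the resampling table there has a column named Sensitivity.Sens, not Sensitivity, so the requested metric name likely never matched any column, and caret appears to fall back to the first available metric (ROC) in that case. A minimal base-R sketch of how that name arises, using made-up numbers in place of real twoClassSummary output, and one way to avoid it:

```r
# Made-up values standing in for the named vector twoClassSummary returns
out <- c(ROC = 0.73, Sens = 0.41, Spec = 0.82)

# out["Sens"] keeps its own name, so c() glues the two names together
bad <- c(out, Sensitivity = out["Sens"])
names(bad)   # "ROC" "Sens" "Spec" "Sensitivity.Sens"

# out[["Sens"]] drops the inner name, so the new element is just "Sensitivity"
good <- c(out, Sensitivity = out[["Sens"]])
names(good)  # "ROC" "Sens" "Spec" "Sensitivity"
```

With the second form, metric = "Sensitivity" would have a matching column, although simply using metric = "Sens" with twoClassSummary is less work.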

Full example:

library(caret)
library(mlbench)
data(Sonar)

rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = twoClassSummary))

rpart_caret_fit

CART

208 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:

  cp         ROC        Sens       Spec
  0.0000000  0.7088298  0.7023715  0.7210526
  0.0255019  0.7075400  0.7292490  0.6684211
  0.0510038  0.7105388  0.7758893  0.6405263
  0.0765057  0.6904202  0.7841897  0.6294737
  0.1020076  0.7104681  0.8114625  0.6094737
  0.1275095  0.7104681  0.8114625  0.6094737
  0.1530114  0.7104681  0.8114625  0.6094737
  0.1785133  0.7104681  0.8114625  0.6094737
  0.2040152  0.7104681  0.8114625  0.6094737
  0.2295171  0.7104681  0.8114625  0.6094737
  0.2550190  0.7104681  0.8114625  0.6094737
  0.2805209  0.7104681  0.8114625  0.6094737
  0.3060228  0.7104681  0.8114625  0.6094737
  0.3315247  0.7104681  0.8114625  0.6094737
  0.3570266  0.7104681  0.8114625  0.6094737
  0.3825285  0.7104681  0.8114625  0.6094737
  0.4080304  0.7104681  0.8114625  0.6094737
  0.4335323  0.7104681  0.8114625  0.6094737
  0.4590342  0.6500135  0.8205534  0.4794737
  0.4845361  0.6500135  0.8205534  0.4794737

Sens was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.4845361.

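A correction: I originally wrote that I did not think you could specify control = rpart.control(maxdepth = 6) in caret's train. That is not correct: caret forwards any extra arguments through ... to the underlying model function, so you can pass pretty much any argument.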
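A rough sketch of one way to check that such an argument actually reaches rpart, assuming the caret, rpart and mlbench packages are installed. The object name fit_depth, maxdepth = 2 and tuneLength = 5 are illustrative choices only, and the check relies on the fitted rpart object keeping its settings in a control component:

```r
library(caret)
library(rpart)    # provides rpart.control
library(mlbench)
data(Sonar)

set.seed(1)
fit_depth <- train(Class ~ .,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 5,
                   metric = "Sens",
                   maximize = TRUE,
                   # not a train() argument, so it is forwarded via ...
                   control = rpart.control(maxdepth = 2),
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

# Inspect the control settings stored on the final rpart model
fit_depth$finalModel$control$maxdepth
```

If the stored value equals the one passed in, the argument was forwarded. Note that cp is the tuning parameter for method = "rpart", so caret is expected to manage cp itself regardless of what the control object contains.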

If you want to write your own summary function, here is an example for "Sens":

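Sensitivity.fc <- function (data, lev = NULL, model = NULL) { # every summary function takes these three arguments
  obs <- data[, "obs"] # the true values; in data they are always in a column named "obs"
  cls <- levels(obs) # the class levels; you could also take them from the lev argument
  probs <- data[, cls[2]] # probabilities for the second class; only available when classProbs = TRUE
  class <- as.factor(ifelse(probs > 0.5, cls[2], cls[1])) # assign classes based on a probability threshold
  Sensitivity <- caret::sensitivity(class, obs) # do the calculation; I'm lazy, so I use the built-in function
  names(Sensitivity) <- "Sens" # name of the output
  Sensitivity
}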

Now:

rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens", # because of this line: names(Sensitivity) <- "Sens"
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = Sensitivity.fc))

Let's check that both produce the same results:

set.seed(1)
fit_sens <- train(Class~.,
                  data = Sonar,
                  method = "rpart",
                  tuneLength = 20,
                  metric = "Sens",
                  maximize = TRUE,
                  trControl = trainControl(classProbs = TRUE,
                                           method = "cv",
                                           number = 5,
                                           summaryFunction = Sensitivity.fc))

set.seed(1)
fit_sens2 <- train(Class~.,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 20,
                   metric = "Sens",
                   maximize = TRUE,
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

all.equal(fit_sens$results[c("cp", "Sens")],
          fit_sens2$results[c("cp", "Sens")])
# TRUE

all.equal(fit_sens$bestTune,
          fit_sens2$bestTune)
# TRUE
