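Optimizing for sensitivity in caret still seems to optimize ROC

I am trying to maximize sensitivity when doing model selection in caret with rpart. To do this, I tried to replicate the method given here (scroll down to the example with the user-defined function FourStat) on caret's GitHub page.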

# Create our own function so that "Sensitivity" can be used as the metric to maximize:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
    out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
    c(out, Sensitivity = out["Sens"])
}
rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
    na.action = na.pass,
    method = "rpart",
    control = rpart.control(maxdepth = 6),
    tuneLength = 20,
    # maximize sensitivity
    metric = "Sensitivity",
    maximize = TRUE,
    trControl = trainControl(classProbs = TRUE,
                             summaryFunction = Sensitivity.fc))

However, when I print the summary with:

rpart_caret_fit

the output shows that it still used the ROC criterion to select the final model:

CART

678282 samples
     4 predictor
     2 classes: 'yes', 'no'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 678282, 678282, 678282, 678282, 678282, 678282, ...
Resampling results across tuning parameters:

  cp              ROC        Sens       Spec       Sensitivity.Sens
  0.000001909738  0.7259486  0.4123547  0.8227382  0.4123547
  0.000002864607  0.7259486  0.4123547  0.8227382  0.4123547
  0.000005729214  0.7259489  0.4123622  0.8227353  0.4123622
  0.000006684083  0.7258036  0.4123614  0.8227379  0.4123614
  0.000007638953  0.7258031  0.4123576  0.8227398  0.4123576
  0.000009548691  0.7258028  0.4123539  0.8227416  0.4123539
  0.000010694534  0.7257553  0.4123589  0.8227332  0.4123589
  0.000015277905  0.7257313  0.4123614  0.8227290  0.4123614
  0.000032465548  0.7253456  0.4112838  0.8234272  0.4112838
  0.000038194763  0.7252966  0.4112912  0.8234196  0.4112912
  0.000076389525  0.7248774  0.4102792  0.8240339  0.4102792
  0.000164237480  0.7244847  0.4093688  0.8246372  0.4093688
  0.000194793290  0.7241532  0.4086596  0.8250930  0.4086596
  0.000310650737  0.7237546  0.4087379  0.8250393  0.4087379
  0.001625187154  0.7233805  0.4006570  0.8295729  0.4006570
  0.001726403276  0.7233225  0.3983850  0.8308874  0.3983850
  0.002173282000  0.7230906  0.3915758  0.8348320  0.3915758
  0.002237258227  0.7230906  0.3915758  0.8348320  0.3915758
  0.006140444689  0.7173854  0.4897494  0.7695558  0.4897494
  0.055330843035  0.5730987  0.2710906  0.8545549  0.2710906

ROC was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.000005729214.

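How can I override the ROC selection method?


Answer:

You are overcomplicating things.

twoClassSummary already includes sensitivity in its output. The column name is "Sens". It is enough to specify metric = "Sens" in train and summaryFunction = twoClassSummary in trainControl.

Full example: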

library(caret)
library(mlbench)
data(Sonar)

rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = twoClassSummary))

rpart_caret_fit

CART

208 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:

  cp         ROC        Sens       Spec
  0.0000000  0.7088298  0.7023715  0.7210526
  0.0255019  0.7075400  0.7292490  0.6684211
  0.0510038  0.7105388  0.7758893  0.6405263
  0.0765057  0.6904202  0.7841897  0.6294737
  0.1020076  0.7104681  0.8114625  0.6094737
  0.1275095  0.7104681  0.8114625  0.6094737
  0.1530114  0.7104681  0.8114625  0.6094737
  0.1785133  0.7104681  0.8114625  0.6094737
  0.2040152  0.7104681  0.8114625  0.6094737
  0.2295171  0.7104681  0.8114625  0.6094737
  0.2550190  0.7104681  0.8114625  0.6094737
  0.2805209  0.7104681  0.8114625  0.6094737
  0.3060228  0.7104681  0.8114625  0.6094737
  0.3315247  0.7104681  0.8114625  0.6094737
  0.3570266  0.7104681  0.8114625  0.6094737
  0.3825285  0.7104681  0.8114625  0.6094737
  0.4080304  0.7104681  0.8114625  0.6094737
  0.4335323  0.7104681  0.8114625  0.6094737
  0.4590342  0.6500135  0.8205534  0.4794737
  0.4845361  0.6500135  0.8205534  0.4794737

Sens was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.4845361.
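As a side note, the Sensitivity.Sens column in the question's printout suggests why that attempt fell back to ROC. out["Sens"] is itself a named vector, so c(out, Sensitivity = out["Sens"]) creates an element called Sensitivity.Sens rather than Sensitivity. When the requested metric is not among the result names, train warns and uses the first metric instead. The sketch below reproduces only the naming behaviour in base R; the numbers are made up and stand in for real twoClassSummary output.

```r
# Stand-in for the named vector returned by twoClassSummary (values are made up)
out <- c(ROC = 0.73, Sens = 0.41, Spec = 0.82)

# Single-bracket indexing keeps the element name, so c() glues the two names together
bad <- c(out, Sensitivity = out["Sens"])
names(bad)

# Double-bracket indexing drops the element name, so the new entry is named as intended
good <- c(out, Sensitivity = out[["Sens"]])
names(good)
```

With the second form, metric = "Sensitivity" would match a column in the results, although using metric = "Sens" directly is still simpler.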

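An earlier version of this answer said that you cannot specify control = rpart.control(maxdepth = 6) in caret's train. That was incorrect: caret forwards any extra arguments to the underlying model function through ..., so you can pass pretty much any argument.

If you want to write your own summary function, here is an example for "Sens":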
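A minimal sketch of that forwarding, assuming the caret, mlbench and rpart packages are installed. The object name fit_depth and the smaller tuneLength are choices made here for illustration, not part of the original question.

```r
library(caret)
library(mlbench)
library(rpart)
data(Sonar)

set.seed(1)
fit_depth <- train(Class ~ .,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 5,
                   metric = "Sens",
                   # control is not a train() argument, so it is forwarded to rpart via ...
                   control = rpart.control(maxdepth = 6),
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

# The fitted rpart object keeps the control settings it was grown with
fit_depth$finalModel$control$maxdepth
```

Inspecting the control component of the final model is one way to confirm that the setting reached rpart.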

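Sensitivity.fc <- function (data, lev = NULL, model = NULL) { # every summary function takes these three arguments
  obs <- data[, "obs"] # the true values, always in the column named "obs"
  cls <- levels(obs) # the class levels; you could also take them from the lev argument
  probs <- data[, cls[2]] # probabilities of the second class, only available when classProbs = TRUE
  class <- factor(ifelse(probs > 0.5, cls[2], cls[1]), levels = cls) # assign classes at a probability threshold, keeping both levels
  Sensitivity <- caret::sensitivity(class, obs) # do the calculation; I'm lazy, so I use the built-in function
  names(Sensitivity) <- "Sens" # name of the output
  Sensitivity
}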

Now:

rpart_caret_fit <- train(Class~.,
                          data = Sonar,
                         method = "rpart",
                          tuneLength = 20,
                          metric = "Sens", # because of this line: names(Sensitivity) <- "Sens"
                          maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = Sensitivity.fc))

Let's check that both produce the same results:

set.seed(1)
fit_sens <- train(Class~.,
                  data = Sonar,
                  method = "rpart",
                  tuneLength = 20,
                  metric = "Sens",
                  maximize = TRUE,
                  trControl = trainControl(classProbs = TRUE,
                                           method = "cv",
                                           number = 5,
                                           summaryFunction = Sensitivity.fc))

set.seed(1)
fit_sens2 <- train(Class~.,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 20,
                   metric = "Sens",
                   maximize = TRUE,
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

all.equal(fit_sens$results[c("cp", "Sens")],
          fit_sens2$results[c("cp", "Sens")])
# TRUE

all.equal(fit_sens$bestTune,
          fit_sens2$bestTune)
# TRUE
