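I'm using rpart with caret for model selection and trying to maximize sensitivity. To do that, I tried to replicate the approach given on caret's GitHub page (scroll down to the example with the user-defined function FourStat).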
# Create our own summary function so that "Sensitivity" can be used as the metric to maximize:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
  out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
  c(out, Sensitivity = out["Sens"])
}
rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
                         na.action = na.pass,
                         method = "rpart",
                         control = rpart.control(maxdepth = 6),
                         tuneLength = 20,
                         # maximize sensitivity
                         metric = "Sensitivity",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  summaryFunction = Sensitivity.fc))
However, when I print the fitted object with:
rpart_caret_fit
the output shows that it still uses the ROC criterion to select the final model:
CART

678282 samples
     4 predictor
     2 classes: 'yes', 'no'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 678282, 678282, 678282, 678282, 678282, 678282, ...
Resampling results across tuning parameters:

  cp              ROC        Sens       Spec       Sensitivity.Sens
  0.000001909738  0.7259486  0.4123547  0.8227382  0.4123547
  0.000002864607  0.7259486  0.4123547  0.8227382  0.4123547
  0.000005729214  0.7259489  0.4123622  0.8227353  0.4123622
  0.000006684083  0.7258036  0.4123614  0.8227379  0.4123614
  0.000007638953  0.7258031  0.4123576  0.8227398  0.4123576
  0.000009548691  0.7258028  0.4123539  0.8227416  0.4123539
  0.000010694534  0.7257553  0.4123589  0.8227332  0.4123589
  0.000015277905  0.7257313  0.4123614  0.8227290  0.4123614
  0.000032465548  0.7253456  0.4112838  0.8234272  0.4112838
  0.000038194763  0.7252966  0.4112912  0.8234196  0.4112912
  0.000076389525  0.7248774  0.4102792  0.8240339  0.4102792
  0.000164237480  0.7244847  0.4093688  0.8246372  0.4093688
  0.000194793290  0.7241532  0.4086596  0.8250930  0.4086596
  0.000310650737  0.7237546  0.4087379  0.8250393  0.4087379
  0.001625187154  0.7233805  0.4006570  0.8295729  0.4006570
  0.001726403276  0.7233225  0.3983850  0.8308874  0.3983850
  0.002173282000  0.7230906  0.3915758  0.8348320  0.3915758
  0.002237258227  0.7230906  0.3915758  0.8348320  0.3915758
  0.006140444689  0.7173854  0.4897494  0.7695558  0.4897494
  0.055330843035  0.5730987  0.2710906  0.8545549  0.2710906

ROC was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.000005729214.
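How can I override the ROC selection method?
Answer:
You're overcomplicating things.
twoClassSummary already includes sensitivity in its output; the column is named "Sens". It is enough to pass:
metric = "Sens"
to train, and
summaryFunction = twoClassSummary
to trainControl.
Full example: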
library(caret)
library(mlbench)
data(Sonar)

rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = twoClassSummary))

rpart_caret_fit

CART

208 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:

  cp         ROC        Sens       Spec
  0.0000000  0.7088298  0.7023715  0.7210526
  0.0255019  0.7075400  0.7292490  0.6684211
  0.0510038  0.7105388  0.7758893  0.6405263
  0.0765057  0.6904202  0.7841897  0.6294737
  0.1020076  0.7104681  0.8114625  0.6094737
  0.1275095  0.7104681  0.8114625  0.6094737
  0.1530114  0.7104681  0.8114625  0.6094737
  0.1785133  0.7104681  0.8114625  0.6094737
  0.2040152  0.7104681  0.8114625  0.6094737
  0.2295171  0.7104681  0.8114625  0.6094737
  0.2550190  0.7104681  0.8114625  0.6094737
  0.2805209  0.7104681  0.8114625  0.6094737
  0.3060228  0.7104681  0.8114625  0.6094737
  0.3315247  0.7104681  0.8114625  0.6094737
  0.3570266  0.7104681  0.8114625  0.6094737
  0.3825285  0.7104681  0.8114625  0.6094737
  0.4080304  0.7104681  0.8114625  0.6094737
  0.4335323  0.7104681  0.8114625  0.6094737
  0.4590342  0.6500135  0.8205534  0.4794737
  0.4845361  0.6500135  0.8205534  0.4794737

Sens was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.4845361.
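For context on why the question's attempt reported ROC: its output has a column called Sensitivity.Sens rather than Sensitivity, so metric = "Sensitivity" matches no column and caret appears to fall back to the first metric available, which is ROC. The odd name comes from base R's c(), which prefixes the name of a named element with the argument name. A minimal sketch in base R, where the numbers in out are made-up stand-ins for what twoClassSummary returns:

```r
# Stand-in for the named vector returned by twoClassSummary (values are made up)
out <- c(ROC = 0.72, Sens = 0.41, Spec = 0.82)

# out["Sens"] keeps its name, so c() glues the two names together
bad <- c(out, Sensitivity = out["Sens"])
names(bad)

# out[["Sens"]] drops the name, so the new element is called just "Sensitivity"
good <- c(out, Sensitivity = out[["Sens"]])
names(good)
```

With the second form, metric = "Sensitivity" would find its column, although asking for metric = "Sens" directly is simpler.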
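Additionally, I originally thought you could not specify control = rpart.control(maxdepth = 6) in caret's train. That is not correct: caret passes any extra arguments forward through ... to the underlying model function, so you can pass almost any argument.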
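One way to see the forwarding in action is to pass a control object and then inspect the fitted rpart model. This is a sketch on the Sonar data; the object name fit_depth and the value maxdepth = 2 are illustrative choices, not taken from the question.

```r
library(caret)
library(rpart)
library(mlbench)
data(Sonar)

set.seed(1)
fit_depth <- train(Class ~ .,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 5,
                   metric = "Sens",
                   # not an argument of train() itself: forwarded through ... to rpart
                   control = rpart.control(maxdepth = 2),
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

# The fitted rpart object keeps the control settings it was built with
fit_depth$finalModel$control$maxdepth
```

If the argument was forwarded as expected, the last line should report the depth that was requested.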
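If you do want to write your own summary function, here is an example for "Sens":
Sensitivity.fc <- function (data, lev = NULL, model = NULL) { # every summary function takes these three arguments
  obs <- data[, "obs"] # the true values; in data they are always in the column named "obs"
  cls <- levels(obs) # the class levels; you could also pass them through the lev argument
  probs <- data[, cls[2]] # predicted probabilities for the second class; only available when classProbs = TRUE
  class <- as.factor(ifelse(probs > 0.5, cls[2], cls[1])) # assign classes using a probability threshold
  Sensitivity <- caret::sensitivity(class, obs) # do the calculation; I'm lazy, so I use the built-in function
  names(Sensitivity) <- "Sens" # name of the output
  Sensitivity
}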
Now:
rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens", # matches this line: names(Sensitivity) <- "Sens"
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = Sensitivity.fc))
Let's check that both approaches produce the same results:
set.seed(1)
fit_sens <- train(Class~.,
                  data = Sonar,
                  method = "rpart",
                  tuneLength = 20,
                  metric = "Sens",
                  maximize = TRUE,
                  trControl = trainControl(classProbs = TRUE,
                                           method = "cv",
                                           number = 5,
                                           summaryFunction = Sensitivity.fc))

set.seed(1)
fit_sens2 <- train(Class~.,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 20,
                   metric = "Sens",
                   maximize = TRUE,
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))

all.equal(fit_sens$results[c("cp", "Sens")],
          fit_sens2$results[c("cp", "Sens")])
TRUE

all.equal(fit_sens$bestTune,
          fit_sens2$bestTune)
TRUE
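A summary function can also be tried on its own, outside train, by handing it a small data frame in the layout caret uses (an "obs" column with the true classes and a "pred" column with the predicted classes). The sketch below uses only base R; sens_summary and the toy data are hypothetical, and sensitivity is computed by hand with the first factor level treated as the positive class, which is an assumption chosen to mirror twoClassSummary.

```r
# Hypothetical summary function: sensitivity computed by hand from the
# "obs" and "pred" columns, treating the first factor level as the positive class
sens_summary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  positive <- lev[1]
  true_pos  <- sum(data$pred == positive & data$obs == positive)
  false_neg <- sum(data$pred != positive & data$obs == positive)
  c(Sens = true_pos / (true_pos + false_neg))
}

# Toy hold-out predictions in the layout handed to a summary function
toy <- data.frame(
  obs  = factor(c("M", "M", "M", "R", "R"), levels = c("M", "R")),
  pred = factor(c("M", "R", "M", "R", "M"), levels = c("M", "R"))
)

res <- sens_summary(toy, lev = levels(toy$obs))
res
```

Here two of the three true "M" cases are predicted as "M", so the function returns a vector named "Sens" with value 2/3, and that name is what metric = "Sens" would look up.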