I am running a logistic regression with glm and 5-fold cross-validation. Here is a reproducible example using the built-in mtcars dataset:
```r
library(caret)

data("mtcars")
str(mtcars)
mtcars$vs <- as.factor(mtcars$vs)
df0 <- na.omit(mtcars)

set.seed(123)
train.control <- trainControl(method = "cv", number = 5)
# Train the model
model <- train(vs ~ ., data = mtcars, method = "glm", trControl = train.control)

print(model)
summary(model)
model$resample
confusionMatrix(model)

pred.mod <- predict(model)
confusionMatrix(data = pred.mod, reference = mtcars$vs)
```
Output:
```
> print(model)
Generalized Linear Model 

32 samples
10 predictors
 2 classes: '0', '1' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 25, 26, 25, 27, 25 
Resampling results:

  Accuracy   Kappa    
  0.9095238  0.8164638

> summary(model)

Call:
NULL

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.181e-05  -2.110e-08  -2.110e-08   2.110e-08   1.181e-05  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  8.117e+01  1.589e+07       0        1
mpg          2.451e+00  5.979e+04       0        1
cyl         -3.908e+01  2.947e+05       0        1
disp        -1.927e-02  8.518e+03       0        1
hp           3.129e-01  2.283e+04       0        1
drat        -2.735e+01  9.696e+05       0        1
wt          -1.248e+01  6.437e+05       0        1
qsec         1.565e+01  3.845e+05       0        1
am          -4.562e+01  3.632e+05       0        1
gear        -2.835e+01  5.448e+05       0        1
carb         1.788e+01  2.971e+05       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.3860e+01  on 31  degrees of freedom
Residual deviance: 7.2154e-10  on 21  degrees of freedom
AIC: 22

Number of Fisher Scoring iterations: 25

> model$resample
   Accuracy     Kappa Resample
1 0.8571429 0.6956522    Fold1
2 0.8333333 0.6666667    Fold2
3 0.8571429 0.7200000    Fold3
4 1.0000000 1.0000000    Fold4
5 1.0000000 1.0000000    Fold5

> confusionMatrix(model)
Cross-Validated (5 fold) Confusion Matrix 

(entries are percentual average cell counts across resamples)
 
          Reference
Prediction    0    1
         0 50.0  3.1
         1  6.2 40.6
                            
 Accuracy (average) : 0.9062

> pred.mod <- predict(model)
> confusionMatrix(data=pred.mod, reference=mtcars$vs)
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 18  0
         1  0 14
                                     
               Accuracy : 1          
                 95% CI : (0.8911, 1)
    No Information Rate : 0.5625     
    P-Value [Acc > NIR] : 1.009e-08  
                                     
                  Kappa : 1          
                                     
 Mcnemar's Test P-Value : NA         
                                     
            Sensitivity : 1.0000     
            Specificity : 1.0000     
         Pos Pred Value : 1.0000     
         Neg Pred Value : 1.0000     
             Prevalence : 0.5625     
         Detection Rate : 0.5625     
   Detection Prevalence : 0.5625     
      Balanced Accuracy : 1.0000     
                                     
       'Positive' Class : 0          
```
All of this runs fine, but I would like to obtain the summary(model) information for each fold (i.e. the coefficients, p-values, z-scores, etc. that summary() reports), and, if possible, the per-fold sensitivity and specificity. Can anyone help?
Answer:
This is an interesting question. The values you want cannot be retrieved directly from the model object, but they can be recomputed once you know which observations of the training data belong to which fold. That information can be extracted from model if you specify savePredictions = "all" in the trainControl function. With the per-fold predictions in hand, you can do something like this:
```r
# First, save the predictions from all folds
set.seed(123)
train.control <- trainControl(method = "cv", number = 5,
                              savePredictions = "all")
# Train the model
model <- train(vs ~ ., data = mtcars, method = "glm", trControl = train.control)

# Now we can extract the statistics you want
fold <- unique(model$pred$Resample)

mystat <- function(model, x) {
  pred <- model$pred
  df <- pred[pred$Resample == x, ]            # held-out predictions for fold x
  cm <- confusionMatrix(df$pred, df$obs)      # per-fold sensitivity/specificity
  control <- trainControl(method = "none")
  newdat <- mtcars[-df$rowIndex, ]            # the rows the fold was trained on
  fit <- train(vs ~ ., data = newdat, method = "glm", trControl = control)
  summ <- summary(fit)                        # per-fold coefficient table
  z_p <- summ$coefficients[, 3:4]             # z values and p-values
  return(list(cm, z_p))
}

stat <- lapply(fold, mystat, model = model)
names(stat) <- fold
```
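As a sanity check, with savePredictions = "all" the model$pred data frame records which rows were held out in each resample. This short sketch (assuming the model object trained above) shows the columns caret stores there:

```r
# Assumes `model` was trained with savePredictions = "all" as above
head(model$pred)            # columns include pred, obs, rowIndex, Resample
table(model$pred$Resample)  # number of held-out rows per fold
```

Each row of mtcars appears exactly once as a held-out observation across the five folds, which is what makes the per-fold refit in mystat possible.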
Note that specifying method = "none" in trainControl forces train to fit the model to the whole data set it is given, without any resampling or parameter tuning. This may not be the prettiest function, but it does what you want, and you can always adapt it to make it more general.
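If all you need afterwards is a per-fold table of sensitivity and specificity, you can pull them straight out of the confusionMatrix objects in the resulting list. A sketch, assuming the stat list built above ("Sensitivity" and "Specificity" are the names confusionMatrix uses in its byClass element):

```r
# Assumes `stat` is the list returned by the lapply() call above:
# each element is list(<confusionMatrix>, <z/p coefficient table>)
sens_spec <- t(sapply(stat, function(s) s[[1]]$byClass[c("Sensitivity", "Specificity")]))
sens_spec  # one row per fold, named Fold1..Fold5
</imports>
```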