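I am trying to solve a common problem in medicine: combining a prediction model with information from another source, e.g. an expert's opinion (which is sometimes heavily emphasized in medicine). In this post the expert's opinion is called the superdoc predictor.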
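This could be solved by stacking a model with a logistic regression (which takes the expert's opinion as an additional input), as described on page 26 of this paper: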
Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H. From Handcrafted to Deep-Learning-Based Cancer Radiomics: Challenges and Opportunities. IEEE Signal Process Mag 2019; 36: 132–60.
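Written as a formula, the stacking idea amounts to roughly the following. This is only a sketch: the symbols $s$, $\gamma$, and $\hat\eta$ are introduced here for illustration and are not taken from the paper. $s$ stands for the superdoc predictor and $\hat\eta(x)$ for the linear predictor of the already fitted lower-level model.

```latex
\operatorname{logit} P(\text{diabetes} = \text{pos} \mid x, s)
  = \gamma_0 + \gamma_1 \, s + \gamma_2 \, \hat\eta(x),
\qquad
\hat\eta(x) = \hat\beta_0 + \sum_j \hat\beta_j \, x_j
```

Only the $\gamma$ coefficients are estimated by the final logistic regression; the $\hat\beta$ coefficients are held fixed from the lower-level fit.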
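I tried this in the reprex below, without accounting for overfitting (I did not use out-of-fold predictions from the lower-level learner):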
Example data
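    # library
    library(tidyverse)
    library(caret)
    library(glmnet)
    library(mlbench)
    library(broom)  # provides tidy(), used below
    library(pROC)   # provides roc(), used below

    # get the example data
    data(PimaIndiansDiabetes, package = "mlbench")
    data <- PimaIndiansDiabetes

    # add the super doctor's opinion to the data set
    set.seed(2323)
    data %>%
      rowwise() %>%
      mutate(superdoc = case_when(diabetes == "pos" ~ as.numeric(sample(0:2, 1)),
                                  TRUE ~ 0)) -> data

    # split the data into a training set and a test set
    train.data <- data[1:550, ]
    test.data <- data[551:768, ]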
Stacked model without out-of-fold predictions:
    # elastic net regression (without the superdoc opinion)
    set.seed(2323)
    model <- train(
      diabetes ~ ., data = train.data %>% select(-superdoc), method = "glmnet",
      trControl = trainControl("repeatedcv",
                               number = 10,
                               repeats = 10,
                               classProbs = TRUE,
                               savePredictions = TRUE,
                               summaryFunction = twoClassSummary),
      tuneLength = 10,
      metric = "ROC"  # the ROC metric is provided by twoClassSummary
    )

    # extract the coefficients at the best alpha and lambda
    coef(model$finalModel, model$finalModel$lambdaOpt) -> coeffs
    tidy(coeffs) %>% tibble() -> coeffs

    coef.interc   = coeffs %>% filter(row == "(Intercept)") %>% pull(value)
    coef.pregnant = coeffs %>% filter(row == "pregnant") %>% pull(value)
    coef.glucose  = coeffs %>% filter(row == "glucose") %>% pull(value)
    coef.pressure = coeffs %>% filter(row == "pressure") %>% pull(value)
    coef.mass     = coeffs %>% filter(row == "mass") %>% pull(value)
    coef.pedigree = coeffs %>% filter(row == "pedigree") %>% pull(value)
    coef.age      = coeffs %>% filter(row == "age") %>% pull(value)

    # combine the model with the superdoc opinion in a logistic regression model
    finalmodel = glm(diabetes ~ superdoc +
                       I(coef.interc + coef.pregnant*pregnant + coef.glucose*glucose +
                         coef.pressure*pressure + coef.mass*mass +
                         coef.pedigree*pedigree + coef.age*age),
                     family = binomial, data = train.data)

    # make predictions on the test data
    predict(finalmodel, test.data, type = "response") -> predictions

    # check the AUC of the model on the test data
    roc(test.data$diabetes, predictions, ci = TRUE)
    #> Setting levels: control = neg, case = pos
    #> Setting direction: controls < cases
    #>
    #> Call:
    #> roc.default(response = test.data$diabetes, predictor = predictions, ci = TRUE)
    #>
    #> Data: predictions in 145 controls (test.data$diabetes neg) < 73 cases (test.data$diabetes pos).
    #> Area under the curve: 0.9345
    #> 95% CI: 0.8969-0.9721 (DeLong)
Now I would like to take out-of-fold predictions into account using the mlr3 package family, following this very helpful post: Tuning a stacked learner
    # library
    library(mlr3)
    library(mlr3learners)
    library(mlr3pipelines)
    library(mlr3filters)
    library(mlr3tuning)
    library(paradox)
    library(glmnet)

    # create the elastic net regression
    glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

    # create the out-of-fold predictions of the learner
    glmnet_cv1 = po("learner_cv", glmnet_lrn, id = "glmnet") # I could not find a setting to filter the predictors (i.e., to avoid sending the superdoc predictor here)

    # summarize the steps
    level0 = gunion(list(
      glmnet_cv1,
      po("nop", id = "only_superdoc_predictor"))) %>>% # I could not find a setting to send only the superdoc predictor to "union1"
      po("featureunion", id = "union1")

    # final logistic regression
    log_reg_lrn = lrn("classif.log_reg", predict_type = "prob")

    # combine the ensemble model
    ensemble = level0 %>>% log_reg_lrn
    ensemble$plot(html = FALSE)
Created on 2021-03-15 by the reprex package (v1.0.0)
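My questions (I am fairly new to the mlr3 package family):

- Is the mlr3 package family well suited for the ensemble model I am trying to build?
- If yes, how can I finalize the ensemble model and make predictions on test.data?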
Answer:
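One possible way to route features in mlr3pipelines is `po("select")` with a selector, used once per branch. The sketch below is a hedged illustration, not a definitive solution. It assumes that `learner_cv` with `id = "glmnet"` names its output columns `glmnet.prob.neg` and `glmnet.prob.pos`, and it drops the redundant `neg` column before the logistic regression. For brevity it draws `superdoc` in a vectorized way rather than with the question's `rowwise()` code, so the values differ from the reprex. The PipeOp ids `drop_superdoc`, `only_superdoc`, and `drop_prob_neg` are names made up for this example.

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Example data with a simulated superdoc opinion
# (vectorized simplification of the question's rowwise() code)
data(PimaIndiansDiabetes, package = "mlbench")
dat = PimaIndiansDiabetes
set.seed(2323)
dat$superdoc = ifelse(dat$diabetes == "pos",
                      sample(0:2, nrow(dat), replace = TRUE), 0)

task = as_task_classif(dat, target = "diabetes", positive = "pos", id = "pima")
train_ids = 1:550
test_ids = 551:768

# Branch 1: remove superdoc, then produce out-of-fold elastic net predictions
glmnet_branch =
  po("select", id = "drop_superdoc",
     selector = selector_invert(selector_name("superdoc"))) %>>%
  po("learner_cv", lrn("classif.cv_glmnet", predict_type = "prob"), id = "glmnet")

# Branch 2: pass through only the superdoc predictor
superdoc_branch = po("select", id = "only_superdoc",
                     selector = selector_name("superdoc"))

# Join both branches; drop the (assumed) redundant probability column
level0 = gunion(list(glmnet_branch, superdoc_branch)) %>>%
  po("featureunion", id = "union1") %>>%
  po("select", id = "drop_prob_neg",
     selector = selector_invert(selector_name("glmnet.prob.neg")))

# Final logistic regression on top, wrapped as a learner
ensemble = GraphLearner$new(level0 %>>% lrn("classif.log_reg", predict_type = "prob"))

# Train on the training rows and predict on the held-out rows
ensemble$train(task, row_ids = train_ids)
pred = ensemble$predict(task, row_ids = test_ids)
pred$score(msr("classif.auc"))
```

Because the lower-level learner sits inside `learner_cv`, the logistic regression is fitted on out-of-fold predictions during training, while prediction on the test rows uses the elastic net refitted on all training rows.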