Here is how I create stratified folds for cv in caret:
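library(caret)
library(data.table)

# 15 rows: 10 in group1, 5 in group2; label must have the same length as the other columns
train_dat <- data.table(
  group = c(rep("group1", 10), rep("group2", 5)),
  x1 = rnorm(15),
  x2 = rnorm(15),
  label = factor(rep(c("treatment", "control"), length.out = 15))
)

# Folds stratified on group; trainControl's index expects training rows, hence returnTrain = TRUE
folds <- createFolds(train_dat[, group], k = 5, returnTrain = TRUE)

fitCtrl <- trainControl(method = "cv", index = folds, classProbs = TRUE, summaryFunction = twoClassSummary)

train(label ~ ., data = train_dat[, !c("group"), with = FALSE], trControl = fitCtrl, method = "xgbTree", metric = "ROC")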
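To keep group1 and group2 balanced, the fold indices are created from the "group" variable.
However, is there a way to do the equivalent of createFolds for repeatedcv in caret, so that I get a balanced split for repeatedcv as well? Should I combine the output of several createFolds calls and pass that to trainControl?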
trControl = trainControl(method = "cv", index = many_repeated_folds)
Thanks!
Answer:
createMultiFolds may be what you are looking for.
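
A minimal sketch of how that could look, assuming the same 10/5 group layout as in the question. The names `group`, `multi_folds`, and `fitCtrl`, and the choice of 3 repeats, are only illustrative. createMultiFolds returns training-row indices, so its output can be passed to the index argument directly:

```r
library(caret)

set.seed(42)  # arbitrary seed, only so the folds are reproducible

# Grouping variable as in the question: 10 rows of group1, 5 rows of group2
group <- c(rep("group1", 10), rep("group2", 5))

# 5 folds repeated 3 times, stratified on group.
# Each list element holds the training-row indices for one resample.
multi_folds <- createMultiFolds(group, k = 5, times = 3)

# Pass the precomputed indices to trainControl; number and repeats are set
# to match the folds so the reported resampling description is consistent.
fitCtrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 3,
  index = multi_folds,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)
```

fitCtrl can then be passed as trControl to the train call from the question. Stratification here is on group only, so nothing in this sketch guarantees that every held-out fold contains both label classes.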