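I am trying to put together a working example of bartMachine using caret, but I cannot get caret to fit a bartMachine model correctly. Can anyone tell me what the main errors actually mean, or point me to a simple, reproducible example of BART modelling?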
Here is the code I used to model bartMachine on dummy data built from the HouseVotes84 and cars datasets:
    library(mlbench)
    library(caret)
    library(dplyr)  # needed for %>% and mutate_if

    data("HouseVotes84")
    # HouseVotes84 is the classification dataset; cars is the regression dataset
    dummy_data_classif <- HouseVotes84[, 2:length(colnames(HouseVotes84))] %>%
      mutate_if(is.factor, as.numeric)
    dummy_data_classif <- data.frame(cbind(Class = HouseVotes84[, 1], dummy_data_classif))
    dummy_data_classif[is.na(dummy_data_classif)] <- 0
    data("cars")
    dummy_data_regr <- cars

    caret_method_tester <- function(dummy_data, formula, resample_plan = 1, test_method,
                                    time_limit = 30, grid_param = c(), parallel_mode = FALSE) {
      library(caret)
      library(R.utils)
      formula <- as.formula(formula)
      resampling <- NULL
      if (resample_plan == 1) {
        resampling <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                                   allowParallel = parallel_mode)
      } else if (resample_plan == 2) {
        resampling <- trainControl(method = "cv", number = 5, allowParallel = parallel_mode)
      } else if (resample_plan == 3) {
        resampling <- trainControl(method = "adaptive_cv", number = 10, repeats = 5,
                                   allowParallel = parallel_mode,
                                   adaptive = list(min = 3, alpha = 0.05, method = "BT",
                                                   complete = FALSE))
      } else if (resample_plan == 4) {
        resampling <- trainControl(method = "boot", number = 5, allowParallel = parallel_mode)
      } else if (resample_plan == 5) {
        resampling <- trainControl(method = "boot_all", number = 5, allowParallel = parallel_mode)
      }
      tryCatch(
        expr = {
          # Note: time_limit is never used; the timeout below is hard-coded to 300 seconds.
          if (length(grid_param) > 0) {
            withTimeout(
              model <- caret::train(formula, data = dummy_data, method = test_method,
                                    trControl = resampling, tuneGrid = grid_param),
              timeout = 300
            )
          } else {
            withTimeout(
              model <- caret::train(formula, data = dummy_data, method = test_method,
                                    trControl = resampling),
              timeout = 300
            )
          }
          return(model)
        },
        error = function(cond) {
          message("Model test failed")
          message("Original error message:")
          message(cond)
          return(NULL)
        },
        warning = function(cond) {
          message("Warning triggered!")
          message("Original warning message:")
          message(cond)
          # Note: model does not exist here if the warning interrupted train().
          return(model)
        }
      )
    }

    bart_reg <- caret_method_tester(dummy_data_regr, "Price ~ .", test_method = "bartMachine",
                                    time_limit = 30, resample_plan = 2)
    Model test failed
    Original error message:
    argument is of length zero

    bart_classif <- caret_method_tester(dummy_data_classif, "Class ~ .", test_method = "bartMachine",
                                        time_limit = 30, resample_plan = 2)
    Model test failed
    Original error message:
    incorrect number of dimensions
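I wrapped the call in tryCatch so that the code reports its own progress: it is obvious whether a run failed, raised a warning, or succeeded.
As far as I know, there are no NA values in the datasets.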
Answer:
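It would help if you boiled your code down to the bare minimum. The underlying problem is that the train function does not work with bartMachine as shipped. The example below shows this and produces the same error message: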
    mdl = train(mpg ~ ., data = mtcars, method = "bartMachine",
                trControl = trainControl(method = "cv"))

    Error in if (grepl("adaptive", trControl$method) & nrow(tuneGrid) == 1) { :
      argument is of length zero
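This error comes from a bug in the caret code. If you do not supply a tuning grid, the default function used to create one does not return a data frame: it builds out but ends on an if block instead of returning out, so nrow(tuneGrid) has nothing to work with: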
    getModelInfo()$bartMachine$grid

    function(x, y, len = NULL, search = "grid") {
      if(search == "grid") {
        out <- expand.grid(num_trees = 50,
                           k = (1:len) + 1,
                           alpha = seq(.9, .99, length = len),
                           beta = seq(1, 3, length = len),
                           nu = (1:len) + 1)
      } else {
        out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                          k = runif(len, min = 0, max = 5),
                          alpha = runif(len, min = .9, max = 1),
                          beta = runif(len, min = 0, max = 4),
                          nu = runif(len, min = 0, max = 5))
      }
      if(is.factor(y)) {
        out$k <- NA
        out$nu <- NA
      }
    }
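To see why a function that ends on an if block hands back nothing useful, here is a small base-R sketch. The names broken_grid and fixed_grid are hypothetical, simplified stand-ins written for illustration; they are not caret's code, and the final check only mimics the shape of the condition quoted in the error message.

```r
# Hypothetical, simplified stand-ins for the grid function (not caret's code).
broken_grid <- function(y, len = 1) {
  out <- expand.grid(num_trees = 50, k = (1:len) + 1)
  if (is.factor(y)) {
    out$k <- NA
  }
  # No final `out`: the function's value is the value of the `if` above.
}

fixed_grid <- function(y, len = 1) {
  out <- expand.grid(num_trees = 50, k = (1:len) + 1)
  if (is.factor(y)) {
    out$k <- NA
  }
  out  # return the grid explicitly
}

y_numeric <- c(21.0, 22.8, 18.7)
res_broken <- broken_grid(y_numeric)
res_fixed <- fixed_grid(y_numeric)

print(is.null(res_broken))       # the skipped `if` yields NULL
print(is.data.frame(res_fixed))  # the fixed version yields a data frame

# A condition of the same shape as the one in the error message:
cond <- grepl("adaptive", "cv") & nrow(res_broken) == 1
print(length(cond))  # a zero-length condition is what `if` rejects
```

With a numeric outcome the if branch is skipped, so the broken version evaluates to NULL, nrow() of that is also NULL, and the comparison collapses to a zero-length logical, which is exactly what if refuses to evaluate.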
You can supply a tuning grid yourself:
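    mdl = train(mpg ~ ., data = mtcars, method = "bartMachine",
                trControl = trainControl(method = "boot"),
                tuneGrid = data.frame(num_trees = 50, k = 3, alpha = 0.1, beta = 0.1, nu = 4))
    mdl

    Bayesian Additive Regression Trees

    32 samples
    10 predictors

    No pre-processing
    Resampling: Bootstrapped (25 reps)
    Summary of sample sizes: 32, 32, 32, 32, 32, 32, ...
    Resampling results:

      RMSE      Rsquared   MAE
      2.826126  0.8344417  2.292464

    Tuning parameter 'num_trees' was held constant at a value of 50
    Tuning parameter 'beta' was held constant at a value of 0.1
    Tuning parameter 'nu' was held constant at a value of 4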
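If you want to tune over more than one combination, the grid can have several rows. The sketch below builds one with expand.grid from base R; the column names are the five bartMachine tuning parameters shown in the grid function above, while the name bart_grid and the specific values are assumptions picked for illustration, not recommendations.

```r
# Column names must match the bartMachine tuning parameters:
# num_trees, k, alpha, beta, nu. The values here are illustrative only.
bart_grid <- expand.grid(
  num_trees = 50,
  k = c(2, 3),
  alpha = c(0.9, 0.95),
  beta = 2,
  nu = 3
)

print(bart_grid)  # one row per combination of the values above

# Untested sketch of how it would be passed on:
# train(mpg ~ ., data = mtcars, method = "bartMachine",
#       trControl = trainControl(method = "cv"), tuneGrid = bart_grid)
```

Because the grid is an ordinary data frame, nrow(tuneGrid) is well defined and the check that failed earlier has something to compare against.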
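Alternatively, you can fix the function above and use it to define a new method; caret's documentation on using your own model in train covers this approach: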
    newBartMachine = getModelInfo()$bartMachine

    newBartMachine$grid = function(x, y, len = NULL, search = "grid") {
      if(search == "grid") {
        out <- expand.grid(num_trees = 50,
                           k = (1:len) + 1,
                           alpha = seq(.9, .99, length = len),
                           beta = seq(1, 3, length = len),
                           nu = (1:len) + 1)
      } else {
        out <- data.frame(num_trees = sample(10:100, replace = TRUE, size = len),
                          k = runif(len, min = 0, max = 5),
                          alpha = runif(len, min = .9, max = 1),
                          beta = runif(len, min = 0, max = 4),
                          nu = runif(len, min = 0, max = 5))
      }
      if(is.factor(y)) {
        out$k <- NA
        out$nu <- NA
      }
      return(out)  # the missing return that caused the error
    }

    mdl = train(mpg ~ ., data = mtcars, method = newBartMachine,
                trControl = trainControl(method = "cv"), tuneLength = 1)
    mdl

    Bayesian Additive Regression Trees

    32 samples
    10 predictors

    No pre-processing
    Resampling: Cross-Validated (10 fold)
    Summary of sample sizes: 28, 28, 28, 29, 30, 30, ...
    Resampling results:

      RMSE      Rsquared   MAE
      2.338429  0.9581958  2.057181

    Tuning parameter 'num_trees' was held constant at a value of 50
    Tuning parameter 'beta' was held constant at a value of 1
    Tuning parameter 'nu' was held constant at a value of 2