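I am using the h2o package to build an ensemble of GLM models with different regularization parameters (alpha, lambda). When I try to build the ensemble as described in the documentation: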
ensemble <- h2o.stackedEnsemble(x = predictors, y = response, training_frame = train, model_id = "ensemble", base_models = list(glm_grid@model_ids))
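where glm_grid@model_ids holds the models obtained from a grid search used to find the optimal alpha and lambda regularization parameters for the GLM, I get the following error: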
When creating a StackedEnsemble you must specify one or more models; 24 were specified but none of those were found: [list("glm_grid_model_6", glm_grid_model_11, glm_grid_model_7, glm_grid_model_9, glm_grid_model_2, glm_grid_model_21, glm_grid_model_15, glm_grid_model_0"]
Do you know what the problem is?
Edit: I tried code similar to the example in the documentation:
gbm_grid <- h2o.grid(algorithm = "gbm", grid_id = "gbm_grid_binomial", x = x, y = y, training_frame = train, ntrees = 10, seed = 1, nfolds = nfolds, fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE, hyper_params = hyper_params, search_criteria = search_criteria)
# Train a stacked ensemble using the GBM grid
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, model_id = "ensemble_gbm_grid_binomial", base_models = gbm_grid@model_ids)
As @Erin LeDell suggested, I removed the extra list(), and now it works. However, I ultimately want to use grids from different models, so something like this:
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, model_id = "my_ensemble_binomial", base_models = list(my_gbm, my_rf))
Edit 2:
I solved this with the following:
model_list <- as.list(c(glm_grid_1@model_ids, glm_grid_2@model_ids))
ensemble <- h2o.stackedEnsemble(x = predictors, y = response, training_frame = train, model_id = "ensemble1231", base_models = model_list)
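The combining step above can be checked in base R without an H2O cluster. This is only a sketch: ids_1 and ids_2 are hypothetical stand-ins for glm_grid_1@model_ids and glm_grid_2@model_ids, assumed to be plain lists of model ID strings.

```r
# Hypothetical stand-ins for glm_grid_1@model_ids and glm_grid_2@model_ids.
ids_1 <- list("glm_grid_1_model_0", "glm_grid_1_model_1")
ids_2 <- list("glm_grid_2_model_0")

# c() concatenates the two lists into one flat list of IDs.
model_list <- as.list(c(ids_1, ids_2))

length(model_list)   # 3: one entry per model ID, with no nesting
is.list(model_list)  # TRUE
# c() on two lists already returns a list, so as.list() changes nothing here.
identical(model_list, c(ids_1, ids_2))  # TRUE
```

The result is one flat list with a single ID per element, which is the same shape as a single grid's model_ids.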
Answer:
You wrapped glm_grid@model_ids in an extra list(), which is not needed and is probably the source of the error. The glm_grid@model_ids object is already a list. Do this instead:
ensemble <- h2o.stackedEnsemble(x = predictors, y = response, training_frame = train, model_id = "ensemble", base_models = glm_grid@model_ids)
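To see why the extra list() matters, here is a small base-R sketch. The model_ids variable is a hypothetical stand-in for glm_grid@model_ids (assumed to be a plain list of ID strings), so it runs without H2O:

```r
# Hypothetical stand-in for glm_grid@model_ids: a list of model ID strings.
model_ids <- list("glm_grid_model_6", "glm_grid_model_11", "glm_grid_model_7")

# What the question passed as base_models: the list wrapped in another list.
wrapped <- list(model_ids)

length(model_ids)             # 3: one element per model ID
length(wrapped)               # 1: a single element that is itself a list
is.character(model_ids[[1]])  # TRUE: each element is an ID string
is.list(wrapped[[1]])         # TRUE: the only element is the nested list
```

With the wrapper, the first element is the whole nested list rather than a single ID. That would be consistent with the list("glm_grid_model_6", ... text in the error message, although the sketch does not show how H2O handles the value internally.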
For more information, see the R examples here.