Here is my code.
```r
library(dplyr)
library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = 'xgboost')

train <- agaricus.train
test <- agaricus.test

xgb_grid_1 <- expand.grid(
  nrounds = c(1:10),
  eta = c(seq(0, 1, 0.1)),
  max_depth = c(2:5),
  gamman = c(seq(0, 1, 0.1))
)

xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

xgb_train1 <- train(
  x = as.matrix(train$data),
  y = train$label,
  trControl = xgb_trcontrol_1,
  tune_grid = xgb_grid_1,
  method = 'xgbTree'
)
```
When fitting xgb_train1, I get the following error message:
```
Error in frankv(predicted) : x is a list, 'cols' can not be 0-length
In addition: Warning messages:
1: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.
2: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  cannnot compute class probabilities for regression
```
What should I do?

Answer:
There are several problems with your code:
- Use the correct argument name: `caret::train` has no `tune_grid` argument; it is called `tuneGrid`. (Your grid also misspells `gamma` as `gamman`, and caret's `xgbTree` method expects the grid to cover `colsample_bytree`, `min_child_weight`, and `subsample` as well.)
- You are trying to do classification, but you supply a numeric target. That is exactly what the warning tells you: *You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.* Convert the label to a two-level factor or character vector; see the sketch after this list.
- When posting a minimal example on SO, try to keep the computation time as low as possible. In your example that is easy to achieve by shrinking the search space.
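A minimal, self-contained sketch of the first two fixes (the names `y`, `small_grid`, and `fit` are illustrative, not from your post; the full corrected example follows below):

```r
library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
train <- agaricus.train

# Fix 1: caret needs a 2-level factor (or character) outcome for
# classification, not a 0/1 numeric vector
y <- factor(ifelse(train$label == 0, "no", "yes"), levels = c("no", "yes"))

# A deliberately tiny grid with all the columns xgbTree expects
small_grid <- expand.grid(
  nrounds = 10, eta = 0.1, max_depth = 3, gamma = 0,
  colsample_bytree = 0.8, min_child_weight = 1, subsample = 0.75
)

# Fix 2: the argument is tuneGrid, not tune_grid; a misspelled name is not
# matched and falls into `...`, so your grid would be silently ignored
fit <- caret::train(
  x = as.matrix(train$data),
  y = y,
  trControl = trainControl(method = "cv", number = 3,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary),
  tuneGrid = small_grid,
  metric = "ROC",
  method = "xgbTree"
)
```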
Here is the full version, which should work:
```r
library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = 'xgboost')

train <- agaricus.train
test <- agaricus.test

# convert the target to character or factor
train$label <- ifelse(train$label == 0, "no", "yes")

xgb_grid_1 = expand.grid(
  nrounds = 100,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1,
  colsample_bytree = 0.6,
  min_child_weight = 1,
  subsample = 0.75
)

xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 3,
  search = "grid",
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

xgb_train1 <- caret::train(
  x = as.matrix(train$data),
  y = train$label,
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  metric = "ROC",
  method = 'xgbTree'
)
```

Output:

```
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (3 fold) 
Summary of sample sizes: 4343, 4341, 4342 
Resampling results across tuning parameters:

  eta    max_depth  ROC        Sens       Spec     
  1e-04   2         0.9963189  0.9780604  0.9656045
  1e-04   4         0.9999604  0.9985172  0.9974527
  1e-04   6         1.0000000  1.0000000  0.9974527
  1e-04   8         1.0000000  1.0000000  0.9974527
  1e-04  10         1.0000000  1.0000000  0.9974527
  1e-03   2         0.9972687  0.9629358  0.9713391
  1e-03   4         0.9999479  0.9985172  0.9974527
  1e-03   6         1.0000000  1.0000000  0.9974527
  1e-03   8         1.0000000  1.0000000  0.9974527
  1e-03  10         1.0000000  1.0000000  0.9977714
  1e-02   2         0.9990705  0.9780604  0.9757951
  1e-02   4         0.9999674  1.0000000  0.9974527
  1e-02   6         1.0000000  1.0000000  0.9977714
  1e-02   8         1.0000000  1.0000000  0.9977714
  1e-02  10         1.0000000  1.0000000  0.9977714

Tuning parameter 'nrounds' was held constant at a value of 100
Tuning parameter 'gamma' was held constant at a value of 1
Tuning parameter 'colsample_bytree' was held constant at a value of 0.6
Tuning parameter 'min_child_weight' was held constant at a value of 1
Tuning parameter 'subsample' was held constant at a value of 0.75
ROC was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 100, max_depth = 6,
eta = 1e-04, gamma = 1, colsample_bytree = 0.6, min_child_weight = 1
and subsample = 0.75.
```
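From here you would typically evaluate on the held-out test set. A quick sketch (the test-label conversion mirrors the training one and is my addition, not part of the output above):

```r
# Encode the test labels the same way as the training labels
test_y <- factor(ifelse(test$label == 0, "no", "yes"), levels = c("no", "yes"))

# Class predictions and class probabilities from the tuned caret model
preds <- predict(xgb_train1, newdata = as.matrix(test$data))
probs <- predict(xgb_train1, newdata = as.matrix(test$data), type = "prob")

# Accuracy, sensitivity, specificity, etc. on the test set
caret::confusionMatrix(preds, test_y)
```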