Problem with grid search on the agaricus dataset

Here is my code.

library(dplyr)
library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = 'xgboost')

train <- agaricus.train
test  <- agaricus.test

xgb_grid_1 <- expand.grid(
  nrounds = c(1:10),
  eta = c(seq(0, 1, 0.1)),
  max_depth = c(2:5),
  gamman = c(seq(0, 1, 0.1))
)

xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

xgb_train1 <- train(
  x = as.matrix(train$data),
  y = train$label,
  trControl = xgb_trcontrol_1,
  tune_grid = xgb_grid_1,
  method = 'xgbTree'
)

When I run xgb_train1, I get the following error message:

Error in frankv(predicted) : x is a list, 'cols' can not be 0-length
In addition: Warning messages:
1: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.
2: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  cannnot compute class probabilities for regression

What should I do?


Answer:

There are several problems with your code.

  1. Use the correct argument name

caret::train has no tune_grid argument; the correct name is tuneGrid.

  2. You are trying to do classification, but you supplied a numeric target. That is exactly what the error message tells you:

You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.
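The fix is to recode the 0/1 labels as a two-level factor before training. A minimal base-R sketch (using a small stand-in vector rather than the full agaricus labels):

```r
# Stand-in for train$label, which is numeric 0/1 in agaricus.train.
y_num <- c(0, 1, 1, 0)

# caret with classProbs = TRUE needs a factor whose levels are valid
# R names ("0"/"1" would be rejected), so recode to "no"/"yes".
y_fac <- factor(ifelse(y_num == 0, "no", "yes"), levels = c("no", "yes"))
levels(y_fac)  # "no" "yes"
```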

  3. When posting a minimal example on SO, try to keep the computation time as low as possible. In your example this is easy to achieve by shrinking the search space.
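To see why this matters, you can count the combinations your original grid enumerates (base R only, no model fitting):

```r
# Same dimensions as the grid in the question; every row is fitted in
# every CV fold, so with 5 folds this means 5 * 4840 = 24200 model fits.
big_grid <- expand.grid(
  nrounds = 1:10,
  eta = seq(0, 1, 0.1),
  max_depth = 2:5,
  gamma = seq(0, 1, 0.1)
)
nrow(big_grid)  # 10 * 11 * 4 * 11 = 4840 combinations
```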

Here is code that should work:

library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = 'xgboost')

train <- agaricus.train
test  <- agaricus.test

train$label <- ifelse(train$label == 0, "no", "yes") # convert the target to character or factor

xgb_grid_1 = expand.grid(
  nrounds = 100,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1,
  colsample_bytree = 0.6,
  min_child_weight = 1,
  subsample = 0.75
)

xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 3,
  search = "grid",
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

xgb_train1 <- caret::train(
  x = as.matrix(train$data),
  y = train$label,
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  metric = "ROC",
  method = 'xgbTree'
)

#output
eXtreme Gradient Boosting

No pre-processing
Resampling: Cross-Validated (3 fold)
Summary of sample sizes: 4343, 4341, 4342
Resampling results across tuning parameters:

  eta    max_depth  ROC        Sens       Spec
  1e-04   2         0.9963189  0.9780604  0.9656045
  1e-04   4         0.9999604  0.9985172  0.9974527
  1e-04   6         1.0000000  1.0000000  0.9974527
  1e-04   8         1.0000000  1.0000000  0.9974527
  1e-04  10         1.0000000  1.0000000  0.9974527
  1e-03   2         0.9972687  0.9629358  0.9713391
  1e-03   4         0.9999479  0.9985172  0.9974527
  1e-03   6         1.0000000  1.0000000  0.9974527
  1e-03   8         1.0000000  1.0000000  0.9974527
  1e-03  10         1.0000000  1.0000000  0.9977714
  1e-02   2         0.9990705  0.9780604  0.9757951
  1e-02   4         0.9999674  1.0000000  0.9974527
  1e-02   6         1.0000000  1.0000000  0.9977714
  1e-02   8         1.0000000  1.0000000  0.9977714
  1e-02  10         1.0000000  1.0000000  0.9977714

Tuning parameter 'nrounds' was held constant at a value of 100
Tuning parameter 'gamma' was held constant at a value of 1
Tuning parameter 'colsample_bytree' was held constant at a value of 0.6
Tuning parameter 'min_child_weight' was held constant at a value of 1
Tuning parameter 'subsample' was held constant at a value of 0.75
ROC was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 100, max_depth = 6, eta = 1e-04, gamma = 1, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.75.
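Once training finishes, you can query the fitted object in the usual caret way. A short sketch, assuming the xgb_train1 object produced by the code above (it will not run on its own):

```r
# The winning row of the tuning grid (here: max_depth = 6, eta = 1e-04, ...).
xgb_train1$bestTune

# Class probabilities on the held-out test set; the column names match
# the factor levels ("no"/"yes") created before training.
preds <- predict(xgb_train1, newdata = as.matrix(test$data), type = "prob")
head(preds)
```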

