Hi everyone.
First, here is a sample of the data:
    > str(train)
    'data.frame': 30226 obs. of 71 variables:
     $ sal          : int 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
     $ avg          : num 2392 2474 2392 2561 2763 ...
     $ med          : num 2314 2346 2314 2535 2754 ...
     $ jt_category_1: int 1 1 1 1 1 1 1 1 1 1 ...
     $ jt_category_2: int 0 0 0 0 0 0 0 0 0 0 ...
     $ job_num_1    : int 0 0 0 0 0 0 0 0 0 0 ...
     $ job_num_2    : int 0 0 0 0 0 0 0 0 0 0 ...

plus 64 more variables (all of type int, binary 0/1 values).
The column `sal` is the label. This data frame is the training set (70% of the original data).
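I am using the `caret` package in R for regression, with the method `xgbTree`. I know it supports both classification and regression.
The problem is that I want to do regression, but I don't know how.
When I run the full code, the error is:

    Error: Metric RMSE not applicable for classification models

But I am not trying to do classification. I want to do regression.
My label (the `y` argument of `train`) is of type int, and I have checked the data types as well.
Is that wrong? Is that what makes caret think this is a classification task?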
    > str(train$sal)
     int [1:30226] 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
    > str(train_xg)
    Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:181356] 0 1 2 3 4 5 6 7 8 9 ...
      ..@ p       : int [1:71] 0 30226 60452 90504 90678 90709 90962 93875 95087 96190 ...
      ..@ Dim     : int [1:2] 30226 70
      ..@ Dimnames:List of 2
      .. ..$ : NULL
      .. ..$ : chr [1:70] "avg" "med" "jt_category_1" "jt_category_2" ...
      ..@ x       : num [1:181356] 2392 2474 2392 2561 2763 ...
      ..@ factors : list()
Why is it mistaken for classification?
Do you know how to do regression with xgboost and caret?
Thanks in advance.
The full code is here:
    library(caret)
    library(xgboost)

    xgb_grid_1 = expand.grid(
      nrounds = 1000,
      max_depth = c(2, 4, 6, 8, 10),
      eta = c(0.5, 0.1, 0.07),
      gamma = 0.01,
      colsample_bytree = 0.5,
      min_child_weight = 1,
      subsample = 0.5
    )

    xgb_trcontrol_1 = trainControl(
      method = "cv",
      number = 5,
      verboseIter = TRUE,
      returnData = FALSE,
      returnResamp = "all",                # save losses across all models
      classProbs = TRUE,                   # set to TRUE for AUC to be computed
      summaryFunction = twoClassSummary,
      allowParallel = TRUE
    )

    xgb_train_1 = train(
      x = as.matrix(train[ , 2:71]),
      y = as.matrix(train$sal),
      trControl = xgb_trcontrol_1,
      tuneGrid = xgb_grid_1,
      method = "xgbTree"
    )
Update (18.08.10)
When I remove the two arguments `classProbs = TRUE` and `summaryFunction = twoClassSummary` from the `trainControl` call, the result is still the same:
    > xgb_grid_1 = expand.grid(
    +   nrounds = 1000,
    +   max_depth = c(2, 4, 6, 8, 10),
    +   eta = c(0.5, 0.1, 0.07),
    +   gamma = 0.01,
    +   colsample_bytree = 0.5,
    +   min_child_weight = 1,
    +   subsample = 0.5
    + )
    >
    > xgb_trcontrol_1 = trainControl(
    +   method = "cv",
    +   number = 5,
    +   allowParallel = TRUE
    + )
    >
    > xgb_train_1 = train(
    +   x = as.matrix(train[ , 2:71]),
    +   y = as.matrix(train$sal),
    +   trControl = xgb_trcontrol_1,
    +   tuneGrid = xgb_grid_1,
    +   method = "xgbTree"
    + )
    Error: Metric RMSE not applicable for classification models
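Answer:
It is not surprising that caret thinks you are asking for classification, because that is effectively what these two lines of your `trainControl` call request:

    classProbs = TRUE,
    summaryFunction = twoClassSummary

Remove both lines so that they fall back to their default values (see the function documentation), and you should be fine.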
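As a minimal sketch of what that looks like, here is the question's `trainControl` call with the two classification-only arguments dropped and everything else kept. The object name `reg_trcontrol` is made up for illustration. With `summaryFunction` left at its default (`defaultSummary`), caret reports regression metrics such as RMSE and R-squared for a numeric outcome.

```r
library(caret)

# Resampling setup for regression: classProbs and twoClassSummary are
# omitted, so both arguments fall back to their defaults.
reg_trcontrol <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",   # keep the resampled metrics for every model
  allowParallel = TRUE
)

# classProbs defaults to FALSE when it is not supplied.
reg_trcontrol$classProbs
```

This object would then be passed as `trControl` in the `train` call.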
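Also note that AUC only applies to classification problems.
Update (after comments): it seems that the target variable being an integer is what causes the problem; convert it to double before running the model: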
    train$sal <- as.double(train$sal)
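A quick way to check the label before handing it to `train` is sketched below, using a tiny made-up stand-in for the question's data frame (the values are illustrative only). It also contrasts a plain vector with the one-column matrix that `as.matrix(train$sal)` produces. The caret documentation describes `y` as a numeric or factor vector, so passing the plain vector may be the safer choice; whether the matrix wrapper contributes to the error is an assumption here, not something confirmed in this thread.

```r
# Toy stand-in for the question's data (values are made up for illustration).
train <- data.frame(
  sal = c(2732L, 2328L, 2560L, 3584L),
  avg = c(2392, 2561, 2763, 2474),
  jt_category_1 = c(1L, 1L, 0L, 1L)
)

# The conversion from the answer: store the label as double, not integer.
train$sal <- as.double(train$sal)

# A plain numeric vector, with no dim attribute.
y_vec <- train$sal

# as.matrix() wraps the same values in a one-column matrix instead.
y_mat <- as.matrix(train$sal)

# Checks to run before calling train().
is.double(y_vec)      # TRUE
is.null(dim(y_vec))   # TRUE
is.matrix(y_mat)      # TRUE
dim(y_mat)            # 4 1
```

With the label in this form, the `train` call from the question would take `y = train$sal` directly.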