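I am new to R and am trying to learn how to do machine learning in it.
When running gbm through caret, I get this error: Error in { : task 1 failed - "inputs must be factors".
With the same parameters, many other algorithms run fine, for example rf, adaboost, etc.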
Code for reference:
fitCtrl_2 <- trainControl(
  method = "cv",
  # repeats = 5,
  number = 10,
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)
The code below produces the error:
set.seed(123)
system.time(
  model_gbm <- train(
    pull(y) ~ duration + nr.employed + euribor3m + pdays + emp.var.rate +
      poutcome.success + month.mar + cons.conf.idx + contact.telephone +
      contact.cellular + previous + age + cons.price.idx + month.jun +
      job.retired,
    data = train,
    method = "gbm", # Added for gbm
    distribution = "gaussian", # Added for gbm
    metric = "ROC",
    bag.fraction = 0.75, # Added for gbm
    # tuneLenth = 10,
    trControl = fitCtrl_2
  )
)
The code below runs fine on the same data.
SVM model:
set.seed(123)
system.time(
  model_svm <- train(
    pull(y) ~ duration + nr.employed + euribor3m + pdays + emp.var.rate +
      poutcome.success + month.mar + cons.conf.idx + contact.telephone +
      contact.cellular + previous + age + cons.price.idx + month.jun +
      job.retired,
    data = train,
    method = "svmRadial",
    tuneLenth = 10,
    trControl = fitCtrl_2
  )
)
I have looked at other Stack Overflow posts about this error, but it is still not clear to me what exactly I need to do to fix it.
Answer:
It looks like you are doing classification. If so, the distribution should be set to "bernoulli" rather than "gaussian". Here is an example:
library(caret)

# Simulated predictors and a random two-class outcome
set.seed(111)
df = data.frame(matrix(rnorm(1600), ncol = 16))
colnames(df) = c("duration", "nr.employed", "euribor3m", "pdays",
                 "emp.var.rate", "poutcome.success", "month.mar",
                 "cons.conf.idx", "contact.telephone", "contact.cellular",
                 "previous", "age", "cons.price.idx", "month.jun",
                 "job.retired")
df$y = ifelse(runif(100) > 0.5, "a", "b")

mod = as.formula("y ~ duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired")

# fitCtrl_2 is the trainControl object defined in the question
model_gbm <- train(mod, data = df,
                   method = "gbm",
                   distribution = "gaussian",
                   metric = "ROC",
                   bag.fraction = 0.75,
                   trControl = fitCtrl_2)
You will get an error:
Error in { : task 1 failed - "inputs must be factors"
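One plausible reading of this message is that with distribution = "gaussian" the model is fit as a regression, so its predictions are plain numbers, while a two-class summary expects factor labels. The base-R sketch below illustrates that mismatch only; require_factors is a hypothetical stand-in written for this example and is not caret's actual code.

```r
# Hypothetical stand-in for a metric that only accepts factors
# (an assumption for illustration, not caret's implementation)
require_factors <- function(pred, obs) {
  if (!is.factor(pred) || !is.factor(obs)) stop("inputs must be factors")
  mean(pred == obs)  # simple accuracy
}

obs <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))

# Regression-style output: numeric scores rather than class labels
numeric_pred <- c(0.2, 0.9, 0.4, 0.7)
msg <- tryCatch(
  require_factors(numeric_pred, obs),
  error = function(e) conditionMessage(e)
)

# Classification-style output: threshold the scores, then build a factor
# that shares the levels of the observed outcome
class_pred <- factor(ifelse(numeric_pred >= 0.5, "b", "a"), levels = levels(obs))
acc <- require_factors(class_pred, obs)
```

Here the numeric predictions trip the factor check, while the thresholded, factor-valued predictions pass it.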
Setting it to bernoulli works fine:
model_gbm <- train(mod, data = df,
                   method = "gbm",
                   distribution = "bernoulli",
                   metric = "ROC",
                   bag.fraction = 0.75,
                   trControl = fitCtrl_2)
model_gbm

Stochastic Gradient Boosting

100 samples
 15 predictor
  2 classes: 'a', 'b'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ...
Resampling results across tuning parameters:

  interaction.depth  n.trees  ROC        Sens       Spec
  1                   50      0.6338333  0.7233333  0.500
  1                  100      0.6093333  0.6533333  0.510
  1                  150      0.6193333  0.6500000  0.555
  2                   50      0.6445000  0.6900000  0.545
  2                  100      0.6138333  0.6166667  0.620
  2                  150      0.6085000  0.6700000  0.555
  3                   50      0.5770000  0.6466667  0.555
  3                  100      0.5756667  0.6066667  0.530
  3                  150      0.5808333  0.6300000  0.530
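
As a rule of thumb that follows from this answer, the distribution passed to gbm should match the outcome type. A minimal base-R sketch of that rule is below; choose_gbm_distribution is a hypothetical helper made up for illustration and is not part of caret or gbm.

```r
# Hypothetical helper: suggest a gbm distribution name from the outcome.
# Assumes classes are stored as factor or character; a numeric 0/1 outcome
# would be treated as regression here.
choose_gbm_distribution <- function(y) {
  if (is.numeric(y)) {
    return("gaussian")  # regression
  }
  n_classes <- length(unique(as.character(y)))
  if (n_classes == 2) "bernoulli" else "multinomial"
}

y_class <- factor(c("a", "b", "a", "b"))
y_num <- c(1.5, 2.0, 3.7)

dist_class <- choose_gbm_distribution(y_class)  # two classes
dist_num <- choose_gbm_distribution(y_num)      # numeric outcome
```

The returned string could then be passed as the distribution argument in the train() call above, so that a two-class outcome never ends up paired with "gaussian".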