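R – Caret train() "Error: Stopping" with "Not all variable names used in object found in newdata"

I am trying to build a simple Naive Bayes classifier for the mushroom data. I want to use all of the variables as categorical predictors to predict whether a mushroom is edible.

I am using the caret package.

Here is my complete code: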

##################################################################################
# Prepare the R and RStudio environment
##################################################################################

# Clear the RStudio console
cat("\014")

# Remove objects from the environment
rm(list = ls())

# Install and load the required packages
if (!require(tidyverse)) {
  install.packages("tidyverse")
  library(tidyverse)
}
if (!require(caret)) {
  install.packages("caret")
  library(caret)
}
if (!require(klaR)) {
  install.packages("klaR")
  library(klaR)
}

#################################

mushrooms <- read.csv("agaricus-lepiota.data", stringsAsFactors = TRUE, header = FALSE)
na.omit(mushrooms)
names(mushrooms) <- c("edibility", "capShape", "capSurface", "cap-color", "bruises", "odor",
                      "gill-attachment", "gill-spacing", "gill-size", "gill-color",
                      "stalk-shape", "stalk-root", "stalk-surface-above-ring",
                      "stalk-surface-below-ring", "stalk-color-above-ring",
                      "stalk-color-below-ring", "veil-type", "veil-color", "ring-number",
                      "ring-type", "spore-print-color", "population", "habitat")

# Convert bruises to a logical variable
mushrooms$bruises <- mushrooms$bruises == 't'

set.seed(1234)
split <- createDataPartition(mushrooms$edibility, p = 0.8, list = FALSE)
train <- mushrooms[split, ]
test <- mushrooms[-split, ]

predictors <- names(train)[2:20] # Create the response and predictor data
x <- train[,predictors] # Predictors
y <- train$edibility # Response

train_control <- trainControl(method = "cv", number = 1) # Set up 1-fold cross-validation
edibility_mod1 <- train( # Train the model
  x = x,
  y = y,
  method = "nb",
  trControl = train_control)

When the train() call runs, I get the following output:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :2     NA's   :2    
Error: Stopping
In addition: Warning messages:
1: predictions failed for Fold1: usekernel= TRUE, fL=0, adjust=1 Error in predict.NaiveBayes(modelFit, newdata) : 
  Not all variable names used in object found in newdata
2: model fit failed for Fold1: usekernel=FALSE, fL=0, adjust=1 Error in x[, 2] : subscript out of bounds
3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

x and y after the script has run:

> str(x)
'data.frame':   6500 obs. of  19 variables:
 $ capShape                : Factor w/ 6 levels "b","c","f","k",..: 6 6 1 6 6 6 1 1 6 1 ...
 $ capSurface              : Factor w/ 4 levels "f","g","s","y": 3 3 3 4 3 4 3 4 4 3 ...
 $ cap-color               : Factor w/ 10 levels "b","c","e","g",..: 5 10 9 9 4 10 9 9 9 10 ...
 $ bruises                 : logi  TRUE TRUE TRUE TRUE FALSE TRUE ...
 $ odor                    : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
 $ gill-attachment         : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
 $ gill-spacing            : Factor w/ 2 levels "c","w": 1 1 1 1 2 1 1 1 1 1 ...
 $ gill-size               : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
 $ gill-color              : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
 $ stalk-shape             : Factor w/ 2 levels "e","t": 1 1 1 1 2 1 1 1 1 1 ...
 $ stalk-root              : Factor w/ 5 levels "?","b","c","e",..: 4 3 3 4 4 3 3 3 4 3 ...
 $ stalk-surface-above-ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
 $ stalk-surface-below-ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
 $ stalk-color-above-ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ stalk-color-below-ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ veil-type               : Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
 $ veil-color              : Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
 $ ring-number             : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
 $ ring-type               : Factor w/ 5 levels "e","f","l","n",..: 5 5 5 5 1 5 5 5 5 5 ...
> str(y)
 Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...

My environment is:

> R.version
               _                           
platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          0.3                         
year           2020                        
month          10                          
day            10                          
svn rev        79318                       
language       R                           
version.string R version 4.0.3 (2020-10-10)
nickname       Bunny-Wunnies Freak Out     
> RStudio.Version()
$citation
To cite RStudio in publications use:
  RStudio Team (2020). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA URL http://www.rstudio.com/.
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {RStudio: Integrated Development Environment for R},
    author = {{RStudio Team}},
    organization = {RStudio, PBC},
    address = {Boston, MA},
    year = {2020},
    url = {http://www.rstudio.com/},
  }
$mode
[1] "desktop"
$version
[1] ‘1.3.1093’
$release_name
[1] "Apricot Nasturtium"

Answer:
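No confirmed fix is recorded here, so the following is only a diagnostic sketch, not a verified solution. The str(x) output in the question shows three properties that are worth ruling out before fitting: column names containing hyphens (non-syntactic in R), a factor with a single level (veil-type), and a logical column (bruises) among otherwise categorical predictors. The helper name diagnose_predictors and the toy data frame x_demo are hypothetical and exist only for illustration; the values in x_demo are made up. The sketch uses base R only.

```r
# Toy stand-in for the question's `x`; the values are made up for illustration.
x_demo <- data.frame(
  capShape    = factor(c("x", "b", "x", "f")),
  `cap-color` = factor(c("n", "y", "w", "w")),
  bruises     = c(TRUE, FALSE, TRUE, TRUE),
  `veil-type` = factor(c("p", "p", "p", "p")),
  check.names = FALSE  # keep the hyphenated names exactly as given
)

# Hypothetical helper: list columns that may upset model-fitting code.
diagnose_predictors <- function(df) {
  # Number of distinct non-missing values in each column
  n_distinct <- vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1))
  is_logical <- vapply(df, is.logical, logical(1))
  list(
    # Names that make.names() would change, e.g. "cap-color" -> "cap.color"
    non_syntactic = names(df)[names(df) != make.names(names(df))],
    # Columns with fewer than two distinct values carry no information
    constant      = names(df)[n_distinct < 2],
    # Logical columns, which differ in type from the factor predictors
    logical       = names(df)[is_logical]
  )
}

report <- diagnose_predictors(x_demo)
print(report)

# One possible cleanup (an assumption, not a confirmed fix):
# drop constant columns, turn logicals into factors, make names syntactic.
x_clean <- x_demo[, !(names(x_demo) %in% report$constant), drop = FALSE]
logical_cols <- vapply(x_clean, is.logical, logical(1))
x_clean[logical_cols] <- lapply(x_clean[logical_cols], factor)
names(x_clean) <- make.names(names(x_clean))
str(x_clean)
```

Applied to the question's data, the same steps would run on x before it is passed to train(). Separately, number = 1 in trainControl() requests 1-fold cross-validation, in which the single held-out fold is the whole data set and nothing is left for fitting; that is a likely contributor to the "model fit failed for Fold1" warning, and a value such as 5 or 10 is the more usual choice.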
