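I am trying to build a simple naive Bayes classifier for the mushroom data. I want to use all of the variables as categorical predictors to predict whether a mushroom is edible.
I am using the caret package.
Here is my complete code: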

    ##################################################################################
    # Prepare the R and RStudio environment
    ##################################################################################

    # Clear the RStudio console
    cat("\014")

    # Remove objects from the environment
    rm(list = ls())

    # Install and load the necessary packages
    if (!require(tidyverse)) {
      install.packages("tidyverse")
      library(tidyverse)
    }
    if (!require(caret)) {
      install.packages("caret")
      library(caret)
    }
    if (!require(klaR)) {
      install.packages("klaR")
      library(klaR)
    }
    #################################

    mushrooms <- read.csv("agaricus-lepiota.data", stringsAsFactors = TRUE, header = FALSE)
    na.omit(mushrooms)
    names(mushrooms) <- c("edibility", "capShape", "capSurface", "cap-color", "bruises", "odor",
                          "gill-attachment", "gill-spacing", "gill-size", "gill-color",
                          "stalk-shape", "stalk-root", "stalk-surface-above-ring",
                          "stalk-surface-below-ring", "stalk-color-above-ring",
                          "stalk-color-below-ring", "veil-type", "veil-color", "ring-number",
                          "ring-type", "spore-print-color", "population", "habitat")

    # Convert bruises to a logical variable
    mushrooms$bruises <- mushrooms$bruises == 't'

    set.seed(1234)
    split <- createDataPartition(mushrooms$edibility, p = 0.8, list = FALSE)
    train <- mushrooms[split, ]
    test <- mushrooms[-split, ]

    predictors <- names(train)[2:20]  # create the response and predictor data
    x <- train[, predictors]          # predictors
    y <- train$edibility              # response

    train_control <- trainControl(method = "cv", number = 1)  # set up 1-fold cross-validation
    edibility_mod1 <- train(  # train the model
      x = x,
      y = y,
      method = "nb",
      trControl = train_control
    )

When I run the train() function, I get the following output:

    Something is wrong; all the Accuracy metric values are missing:
        Accuracy       Kappa
     Min.   : NA   Min.   : NA
     1st Qu.: NA   1st Qu.: NA
     Median : NA   Median : NA
     Mean   :NaN   Mean   :NaN
     3rd Qu.: NA   3rd Qu.: NA
     Max.   : NA   Max.   : NA
     NA's   :2     NA's   :2
    Error: Stopping
    In addition: Warning messages:
    1: predictions failed for Fold1: usekernel= TRUE, fL=0, adjust=1 Error in predict.NaiveBayes(modelFit, newdata) :
      Not all variable names used in object found in newdata
    2: model fit failed for Fold1: usekernel=FALSE, fL=0, adjust=1 Error in x[, 2] : subscript out of bounds
    3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
      There were missing values in resampled performance measures.

x and y after the script has run:

    > str(x)
    'data.frame':   6500 obs. of  19 variables:
     $ capShape                : Factor w/ 6 levels "b","c","f","k",..: 6 6 1 6 6 6 1 1 6 1 ...
     $ capSurface              : Factor w/ 4 levels "f","g","s","y": 3 3 3 4 3 4 3 4 4 3 ...
     $ cap-color               : Factor w/ 10 levels "b","c","e","g",..: 5 10 9 9 4 10 9 9 9 10 ...
     $ bruises                 : logi  TRUE TRUE TRUE TRUE FALSE TRUE ...
     $ odor                    : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
     $ gill-attachment         : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
     $ gill-spacing            : Factor w/ 2 levels "c","w": 1 1 1 1 2 1 1 1 1 1 ...
     $ gill-size               : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
     $ gill-color              : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
     $ stalk-shape             : Factor w/ 2 levels "e","t": 1 1 1 1 2 1 1 1 1 1 ...
     $ stalk-root              : Factor w/ 5 levels "?","b","c","e",..: 4 3 3 4 4 3 3 3 4 3 ...
     $ stalk-surface-above-ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
     $ stalk-surface-below-ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
     $ stalk-color-above-ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
     $ stalk-color-below-ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
     $ veil-type               : Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
     $ veil-color              : Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
     $ ring-number             : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
     $ ring-type               : Factor w/ 5 levels "e","f","l","n",..: 5 5 5 5 1 5 5 5 5 5 ...
    > str(y)
     Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...

My environment is:

    > R.version
                   _
    platform       x86_64-apple-darwin17.0
    arch           x86_64
    os             darwin17.0
    system         x86_64, darwin17.0
    status
    major          4
    minor          0.3
    year           2020
    month          10
    day            10
    svn rev        79318
    language       R
    version.string R version 4.0.3 (2020-10-10)
    nickname       Bunny-Wunnies Freak Out
    > RStudio.Version()
    $citation
    To cite RStudio in publications use:
      RStudio Team (2020). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA URL http://www.rstudio.com/.
    A BibTeX entry for LaTeX users is
      @Manual{,
        title = {RStudio: Integrated Development Environment for R},
        author = {{RStudio Team}},
        organization = {RStudio, PBC},
        address = {Boston, MA},
        year = {2020},
        url = {http://www.rstudio.com/},
      }
    $mode
    [1] "desktop"
    $version
    [1] ‘1.3.1093’
    $release_name
    [1] "Apricot Nasturtium"

Answer:
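The `str(x)` output in the question shows three things that may be worth ruling out before looking further: column names containing hyphens (not syntactically valid R names, which could be related to the "variable names ... not found in newdata" warning), a logical column (`bruises`), and a factor with a single level (`veil-type`). The sketch below is a diagnostic under those assumptions, not a confirmed fix. It runs on a small made-up data frame that stands in for the predictors, and the object names `non_syntactic`, `odd_type`, `constant`, and `x_clean` are illustrative only.

```r
# Toy stand-in for a few of the mushroom predictors (values are made up).
x <- data.frame(
  capShape    = factor(c("x", "b", "x", "f")),
  `cap-color` = factor(c("n", "y", "w", "n")),
  bruises     = c(TRUE, FALSE, TRUE, FALSE),
  `veil-type` = factor(c("p", "p", "p", "p")),
  check.names = FALSE  # keep the hyphenated names exactly as given
)

# Column names that make.names() would alter, i.e. non-syntactic names.
non_syntactic <- names(x)[names(x) != make.names(names(x))]

# Columns that are neither factor nor numeric (here: the logical column).
is_expected_type <- vapply(
  x, function(col) is.factor(col) || is.numeric(col), logical(1)
)
odd_type <- names(x)[!is_expected_type]

# Columns with fewer than two distinct values carry no information.
is_constant <- vapply(x, function(col) length(unique(col)) < 2, logical(1))
constant <- names(x)[is_constant]

print(non_syntactic)
print(odd_type)
print(constant)

# One possible cleanup: convert odd-typed columns to factors, drop constant
# columns, then make the remaining names syntactically valid.
x_clean <- x
x_clean[odd_type] <- lapply(x_clean[odd_type], factor)
x_clean <- x_clean[, !names(x_clean) %in% constant, drop = FALSE]
names(x_clean) <- make.names(names(x_clean))

str(x_clean)
```

If the same checks flag columns in the real `x`, applying the cleanup to both `train` and `test` before calling `train()` keeps their column names consistent. Separately, in ordinary k-fold cross-validation a single fold leaves nothing to train on once that fold is held out, so a value such as `number = 5` or `number = 10` in `trainControl()` may be worth trying instead of `number = 1`.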