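Error when applying the C4.5 decision tree algorithm in R with the RWeka package

I am trying to detect web spam using the C4.5 decision tree algorithm with 10-fold cross-validation. After feature selection, my dataset contains 8944 observations and 36 variables.

Here is my code: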

    # dividing the dataset into train and test
    trainRowNumbers <- createDataPartition(final1$spam, p = 0.7, list = FALSE)
    # Create the training dataset
    trainData <- final1[trainRowNumbers, ]
    # Create Test data
    testData <- final1[-trainRowNumbers, ]
    # C4.5 using 10 fold cross validation
    set.seed(1958)
    train_control <- createFolds(trainData$spam, k = 10)
    C45Fit <- train(spam ~ ., method = "J48", data = trainData,
                    tuneLength = 15,
                    trControl = trainControl(
                      method = "cv", indexOut = train_control))

This is the error I get:

    > C45Fit <- train(spam ~ ., method = "J48", data = trainData,
    +                 tuneLength = 15,
    +                 trControl = trainControl(
    +                   method = "cv", indexOut = train_control))
    Error in train(spam ~ ., method = "J48", data = trainData, tuneLength = 15,  :
      unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

I have a couple of questions:

  1. How can I fix this error?

  2. How should I set the tuneLength parameter?

The head of my dataset:

    > head(trainData)
      hostid                           host      HST_4     HST_6     HST_7     HST_8     HST_9    HST_10     HST_16
    1      0         007cleaningagent.co.uk 0.03370787 1.9791304 0.1123596 0.1516854 0.2247191 0.2977528 0.07865169
    2      1           0800.loan-line.co.uk 1.39539347 2.4222020 0.2284069 0.2610365 0.3531670 0.4529750 0.02879079
    4      3 102belfast.boys-brigade.org.uk 0.29729730 1.1800000 0.2162162 0.3783784 0.5135135 0.5405405 0.21621622
    5      4  10bristol.boys-brigade.org.uk 0.28804348 1.7745267 0.1141304 0.1847826 0.2608696 0.3750000 0.08152174
    6      5  10enfield.boys-brigade.org.uk 0.00000000 0.8468468 0.0625000 0.1875000 0.1875000 0.3125000 0.06250000
    8      8             13thcoventry.co.uk 0.05797101 2.1113074 0.2318841 0.3091787 0.3961353 0.5507246 0.09178744
          HST_17    HST_18 HST_20    HMG_29     HMG_40     HMG_41    HMG_42    AVG_50    AVG_51     AVG_55    AVG_57
    1 0.15730337 0.2247191  0.070 0.2907760 0.02702703 0.07207207 0.1351351  32431.65  7.215054 0.02289305 0.2980171
    2 0.05566219 0.1094050  0.075 0.0495162 0.10641628 0.17840376 0.2410016 150592.89  2.000000 0.49661240 0.1137439
    4 0.37837838 0.4054054  0.040 0.2156130 0.03971119 0.11552347 0.1480144  16129.61  2.125000 0.12297815 0.2033877
    5 0.13043478 0.2119565  0.075 0.0405612 0.08152174 0.13043478 0.2119565  28759.75  2.870968 0.19622331 0.0673372
    6 0.18750000 0.2500000  0.005 0.1125400 0.02528090 0.12359551 0.1432584  70966.61  2.000000 0.03948338 0.2513755
    8 0.14975845 0.2512077  0.095 0.1946150 0.04382470 0.10458167 0.1633466 109388.89 11.484940 0.03547817 0.1387366
           AVG_58   AVG_59     AVG_61     AVG_63    AVG_65    AVG_67     STD_77     STD_79       STD_80     STD_81
    1 0.030079101 1.888686 0.04982536 0.07119317 0.1539772 0.2237475 0.02240051 0.04634758 0.0003248904 0.07644575
    2 0.005874481 2.423238 0.14016213 0.17484142 0.2460647 0.3279534 0.03014901 0.05352347 0.0006170884 0.09449420
    4 0.017285860 1.657795 0.08748573 0.14192639 0.2273218 0.2815660 0.03715705 0.07385004 0.0021174754 0.15725521
    5 0.007008439 1.656472 0.10088409 0.17370255 0.2791502 0.3839271 0.03382564 0.07695898 0.0011314215 0.14290420
    6 0.017145414 2.284363 0.09245673 0.14045514 0.2267635 0.2907555 0.02459505 0.06418522 0.0007756064 0.16533374
    8 0.001818059 2.300361 0.17326186 0.25910768 0.3351511 0.4479340 0.05611160 0.07531329 0.0005475770 0.15796253
         STD_83      STD_84     STD_85     STD_87    STD_94   spam
    1 0.1219990 0.001009964 0.04043011 0.04198925 0.3400028 normal
    2 0.1539489 0.001734261 0.15000000 0.16000000 0.3147682 normal
    4 0.2027374 0.006655953 0.06437500 0.06031250 0.7100778 normal
    5 0.1925378 0.002708827 0.04258065 0.05290323 0.8195509 normal
    6 0.2223814 0.005491305 0.09125000 0.08062500 1.2953592 normal
    8 0.2366591 0.002588343 0.21698795 0.14774096 0.2882247 normal

Output of sessionInfo():

    > sessionInfo()
    R version 3.4.0 (2017-04-21)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
    [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] bindrcpp_0.2        ggthemes_3.5.0      randomForest_4.6-12 Metrics_0.1.3       RWeka_0.4-37        mlr_2.12.1
     [7] ParamHelpers_1.10   rgeos_0.3-26        VIM_4.7.0           data.table_1.10.4-3 colorspace_1.3-2    mice_2.46.0
    [13] RANN_2.5.1          kernlab_0.9-25      mlbench_2.1-1       caret_6.0-79        ggplot2_2.2.1       lattice_0.20-35
    [19] dplyr_0.7.4

    loaded via a namespace (and not attached):
     [1] nlme_3.1-131       lubridate_1.7.3    bit64_0.9-7        dimRed_0.1.0       httr_1.3.1         backports_1.1.2    tools_3.4.0
     [8] R6_2.2.2           rpart_4.1-11       DBI_0.8            lazyeval_0.2.1     nnet_7.3-12        withr_2.1.0        sp_1.2-7
    [15] tidyselect_0.2.3   mnormt_1.5-5       parallelMap_1.3    bit_1.1-12         curl_3.0           compiler_3.4.0     checkmate_1.8.5
    [22] scales_0.5.0       sfsmisc_1.1-1      DEoptimR_1.0-8     lmtest_0.9-35      psych_1.7.8        robustbase_0.92-8  stringr_1.2.0
    [29] foreign_0.8-67     rio_0.5.10         pkgconfig_2.0.1    RWekajars_3.9.2-1  rlang_0.2.0        readxl_1.0.0       ddalpha_1.3.1
    [36] BBmisc_1.11        bindr_0.1          zoo_1.8-0          ModelMetrics_1.1.0 car_3.0-0          magrittr_1.5       Matrix_1.2-12
    [43] Rcpp_0.12.14       munsell_0.4.3      abind_1.4-5        stringi_1.1.6      carData_3.0-1      MASS_7.3-47        plyr_1.8.4
    [50] recipes_0.1.1      parallel_3.4.0     forcats_0.3.0      haven_1.1.1        splines_3.4.0      pillar_1.2.1       boot_1.3-19
    [57] rjson_0.2.15       reshape2_1.4.2     codetools_0.2-15   stats4_3.4.0       CVST_0.2-1         glue_1.2.0         laeken_0.4.6
    [64] vcd_1.4-4          foreach_1.4.3      twitteR_1.1.9      cellranger_1.1.0   gtable_0.2.0       purrr_0.2.4        tidyr_0.7.2
    [71] assertthat_0.2.0   DRR_0.0.2          gower_0.1.2        openxlsx_4.0.17    prodlim_1.6.1      broom_0.4.3        e1071_1.6-8
    [78] class_7.3-14       survival_2.41-3    timeDate_3042.101  RcppRoll_0.2.2     tibble_1.4.2       rJava_0.9-9        iterators_1.0.8
    [85] lava_1.5.1         ipred_0.9-6

Thanks in advance for any advice.


Answer:

I can reproduce the error message as follows:

    library(RWeka)
    library(caret)
    library(mlr)
    # Loading required package: ParamHelpers
    # Attaching package: ‘mlr’
    # The following object is masked from ‘package:caret’:
    #     train

    # dividing the dataset into train and test
    trainRowNumbers <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
    # Create the training dataset
    trainData <- iris[trainRowNumbers, ]
    # Create Test data
    testData <- iris[-trainRowNumbers, ]
    # C4.5 using 10 fold cross validation
    set.seed(1958)
    train_control <- createFolds(trainData$Species, k = 10)
    C45Fit <- train(Species ~ ., method = "J48", data = trainData,
                    tuneLength = 15,
                    trControl = trainControl(
                      method = "cv", indexOut = train_control))
    # Error in train(Species ~ ., method = "J48", data = trainData, tuneLength = 15,  :
    #   unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

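Note the message "The following object is masked from ‘package:caret’: train". If you load another package that also exports a train function after loading caret (mlr in this case), R by default uses the train function from the most recently loaded package. (This is why I asked for the sessionInfo() output: to see which packages were loaded. For the same reason, a reproducible example should include the packages you load.) Instead of running train from caret, R runs train from mlr (or whichever other package you loaded last), and that function rejects caret's arguments with the error message above.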
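The masking behaviour can be imitated in base R alone, which may help when checking which definition of a name a call will reach. The following is only a sketch: `fake_caret` and `fake_mlr` are hypothetical stand-ins attached to the search path, not the real packages, and their `train` functions merely mimic the two argument lists.

```r
# Hypothetical stand-ins for two packages that both export train().
fake_caret <- list(train = function(form, data, method, ...) "caret-style train() called")
fake_mlr   <- list(train = function(learner, task) "mlr-style train() called")

attach(fake_caret, name = "fakeCaret")
attach(fake_mlr,   name = "fakeMlr")   # attached last, so it is searched first

# find() lists every attached location that defines the name, in search order.
found <- find("train")
print(found)

# The unqualified call reaches the mlr-style function, which has no
# `data` or `method` argument, so R reports unused arguments.
msg <- tryCatch(
  train(Species ~ ., data = iris, method = "J48"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Picking the intended definition explicitly sidesteps the search order.
caret_style_train <- get("train", pos = "fakeCaret")
picked <- caret_style_train(Species ~ ., data = iris, method = "J48")
print(picked)

# Clean up the search path.
detach("fakeMlr")
detach("fakeCaret")
```

With the real packages attached, `find("train")` should list the packages in the same way, and the first entry is the one an unqualified `train(...)` call will use.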

The solution is either to load caret last, or to call the train function from caret explicitly with caret::train(...).
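A minimal sketch of the namespace-qualified call on the iris data is given here, assuming caret and RWeka (which needs a working Java installation) are installed. It deliberately swaps the question's `indexOut` folds for `trainControl(method = "cv", number = 10)` to keep the example short, and it uses a small `tuneLength` so it runs quickly. As far as I know, caret's documentation describes `tuneLength` as the number of levels generated for each tuning parameter, so larger values enlarge the grid that is searched.

```r
# Assumption: caret and RWeka are installed; the block is skipped otherwise.
C45Fit <- NULL

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("RWeka", quietly = TRUE)) {
  set.seed(1958)

  # 70/30 split, as in the question.
  rows <- caret::createDataPartition(iris$Species, p = 0.7, list = FALSE)
  trainData <- iris[rows, ]

  # Plain 10-fold cross-validation (a simplification of the question's setup).
  ctrl <- caret::trainControl(method = "cv", number = 10)

  # caret:: forces caret's train(), regardless of what mlr has masked.
  C45Fit <- caret::train(Species ~ ., data = trainData, method = "J48",
                         tuneLength = 3, trControl = ctrl)
  print(C45Fit)
}
```

Printing the fitted object shows the resampled performance for each candidate parameter combination, which is one way to judge whether a larger `tuneLength` is worth the extra run time.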
