Error when applying the C4.5 decision tree algorithm with the RWeka package in R

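I am trying to detect web spam using the C4.5 decision tree algorithm with 10-fold cross-validation. After feature selection, my dataset contains 8944 observations and 36 variables.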

Here is my code:

    #dividing the dataset into train and test
    trainRowNumbers<-createDataPartition(final1$spam,p=0.7,list=FALSE)
    #Create the training dataset
    trainData<-final1[trainRowNumbers,]
    #Create Test data
    testData<-final1[-trainRowNumbers,]
    #C4.5 using 10 fold cross validation
    set.seed(1958)
    train_control<-createFolds(trainData$spam,k=10)
    C45Fit<-train(spam~.,method="J48",data=trainData,
                  tuneLength=15,
                  trControl=trainControl(
                   method="cv",indexOut = train_control ))

This is the error I get:

    C45Fit<-train(spam~.,method="J48",data=trainData,
                  tuneLength=15,
                  trControl=trainControl(
                   method="cv",indexOut = train_control ))

    Error in train(spam ~ ., method = "J48", data = trainData, tuneLength = 15,  :
      unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

I have a couple of questions:

  1. How can I fix this error?

  2. How should I set the tuneLength parameter?

The head of my dataset:

    > head(trainData)
      hostid                           host      HST_4     HST_6     HST_7     HST_8     HST_9    HST_10     HST_16
    1      0         007cleaningagent.co.uk 0.03370787 1.9791304 0.1123596 0.1516854 0.2247191 0.2977528 0.07865169
    2      1           0800.loan-line.co.uk 1.39539347 2.4222020 0.2284069 0.2610365 0.3531670 0.4529750 0.02879079
    4      3 102belfast.boys-brigade.org.uk 0.29729730 1.1800000 0.2162162 0.3783784 0.5135135 0.5405405 0.21621622
    5      4  10bristol.boys-brigade.org.uk 0.28804348 1.7745267 0.1141304 0.1847826 0.2608696 0.3750000 0.08152174
    6      5  10enfield.boys-brigade.org.uk 0.00000000 0.8468468 0.0625000 0.1875000 0.1875000 0.3125000 0.06250000
    8      8             13thcoventry.co.uk 0.05797101 2.1113074 0.2318841 0.3091787 0.3961353 0.5507246 0.09178744
          HST_17    HST_18 HST_20    HMG_29     HMG_40     HMG_41    HMG_42    AVG_50    AVG_51     AVG_55    AVG_57
    1 0.15730337 0.2247191  0.070 0.2907760 0.02702703 0.07207207 0.1351351  32431.65  7.215054 0.02289305 0.2980171
    2 0.05566219 0.1094050  0.075 0.0495162 0.10641628 0.17840376 0.2410016 150592.89  2.000000 0.49661240 0.1137439
    4 0.37837838 0.4054054  0.040 0.2156130 0.03971119 0.11552347 0.1480144  16129.61  2.125000 0.12297815 0.2033877
    5 0.13043478 0.2119565  0.075 0.0405612 0.08152174 0.13043478 0.2119565  28759.75  2.870968 0.19622331 0.0673372
    6 0.18750000 0.2500000  0.005 0.1125400 0.02528090 0.12359551 0.1432584  70966.61  2.000000 0.03948338 0.2513755
    8 0.14975845 0.2512077  0.095 0.1946150 0.04382470 0.10458167 0.1633466 109388.89 11.484940 0.03547817 0.1387366
           AVG_58   AVG_59     AVG_61     AVG_63    AVG_65    AVG_67     STD_77     STD_79       STD_80     STD_81
    1 0.030079101 1.888686 0.04982536 0.07119317 0.1539772 0.2237475 0.02240051 0.04634758 0.0003248904 0.07644575
    2 0.005874481 2.423238 0.14016213 0.17484142 0.2460647 0.3279534 0.03014901 0.05352347 0.0006170884 0.09449420
    4 0.017285860 1.657795 0.08748573 0.14192639 0.2273218 0.2815660 0.03715705 0.07385004 0.0021174754 0.15725521
    5 0.007008439 1.656472 0.10088409 0.17370255 0.2791502 0.3839271 0.03382564 0.07695898 0.0011314215 0.14290420
    6 0.017145414 2.284363 0.09245673 0.14045514 0.2267635 0.2907555 0.02459505 0.06418522 0.0007756064 0.16533374
    8 0.001818059 2.300361 0.17326186 0.25910768 0.3351511 0.4479340 0.05611160 0.07531329 0.0005475770 0.15796253
         STD_83      STD_84     STD_85     STD_87    STD_94   spam
    1 0.1219990 0.001009964 0.04043011 0.04198925 0.3400028 normal
    2 0.1539489 0.001734261 0.15000000 0.16000000 0.3147682 normal
    4 0.2027374 0.006655953 0.06437500 0.06031250 0.7100778 normal
    5 0.1925378 0.002708827 0.04258065 0.05290323 0.8195509 normal
    6 0.2223814 0.005491305 0.09125000 0.08062500 1.2953592 normal
    8 0.2366591 0.002588343 0.21698795 0.14774096 0.2882247 normal

Output of sessionInfo():

    > sessionInfo()
    R version 3.4.0 (2017-04-21)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
    [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] bindrcpp_0.2        ggthemes_3.5.0      randomForest_4.6-12 Metrics_0.1.3       RWeka_0.4-37        mlr_2.12.1
     [7] ParamHelpers_1.10   rgeos_0.3-26        VIM_4.7.0           data.table_1.10.4-3 colorspace_1.3-2    mice_2.46.0
    [13] RANN_2.5.1          kernlab_0.9-25      mlbench_2.1-1       caret_6.0-79        ggplot2_2.2.1       lattice_0.20-35
    [19] dplyr_0.7.4

    loaded via a namespace (and not attached):
     [1] nlme_3.1-131       lubridate_1.7.3    bit64_0.9-7        dimRed_0.1.0       httr_1.3.1         backports_1.1.2    tools_3.4.0
     [8] R6_2.2.2           rpart_4.1-11       DBI_0.8            lazyeval_0.2.1     nnet_7.3-12        withr_2.1.0        sp_1.2-7
    [15] tidyselect_0.2.3   mnormt_1.5-5       parallelMap_1.3    bit_1.1-12         curl_3.0           compiler_3.4.0     checkmate_1.8.5
    [22] scales_0.5.0       sfsmisc_1.1-1      DEoptimR_1.0-8     lmtest_0.9-35      psych_1.7.8        robustbase_0.92-8  stringr_1.2.0
    [29] foreign_0.8-67     rio_0.5.10         pkgconfig_2.0.1    RWekajars_3.9.2-1  rlang_0.2.0        readxl_1.0.0       ddalpha_1.3.1
    [36] BBmisc_1.11        bindr_0.1          zoo_1.8-0          ModelMetrics_1.1.0 car_3.0-0          magrittr_1.5       Matrix_1.2-12
    [43] Rcpp_0.12.14       munsell_0.4.3      abind_1.4-5        stringi_1.1.6      carData_3.0-1      MASS_7.3-47        plyr_1.8.4
    [50] recipes_0.1.1      parallel_3.4.0     forcats_0.3.0      haven_1.1.1        splines_3.4.0      pillar_1.2.1       boot_1.3-19
    [57] rjson_0.2.15       reshape2_1.4.2     codetools_0.2-15   stats4_3.4.0       CVST_0.2-1         glue_1.2.0         laeken_0.4.6
    [64] vcd_1.4-4          foreach_1.4.3      twitteR_1.1.9      cellranger_1.1.0   gtable_0.2.0       purrr_0.2.4        tidyr_0.7.2
    [71] assertthat_0.2.0   DRR_0.0.2          gower_0.1.2        openxlsx_4.0.17    prodlim_1.6.1      broom_0.4.3        e1071_1.6-8
    [78] class_7.3-14       survival_2.41-3    timeDate_3042.101  RcppRoll_0.2.2     tibble_1.4.2       rJava_0.9-9        iterators_1.0.8
    [85] lava_1.5.1         ipred_0.9-6

Thanks in advance for any advice.


Answer:

I can reproduce the error message as follows:

    library(RWeka)
    library(caret)
    library(mlr)
    # Loading required package: ParamHelpers
    # Attaching package: ‘mlr’
    # The following object is masked from ‘package:caret’:
    #     train

    #dividing the dataset into train and test
    trainRowNumbers <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
    #Create the training dataset
    trainData <- iris[trainRowNumbers, ]
    #Create Test data
    testData <- iris[-trainRowNumbers, ]
    #C4.5 using 10 fold cross validation
    set.seed(1958)
    train_control <- createFolds(trainData$Species, k = 10)
    C45Fit <- train(Species~., method = "J48",data = trainData,
                  tuneLength = 15,
                  trControl = trainControl(
                   method = "cv",indexOut = train_control ))
    # Error in train(Species ~ ., method = "J48", data = trainData, tuneLength = 15,  :
    #   unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

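Note the message "The following object is masked from ‘package:caret’: train". If you load another package that also contains a train function (mlr in this case) after loading caret, R by default uses the train function from the most recently loaded package. (This is why I asked for the output of sessionInfo(): to see which packages were loaded. For the same reason, a reproducible example should include the packages you load.) So instead of running train from caret, R runs train from mlr (or from whichever other package you loaded last), and that function does not accept these arguments, which produces the "unused arguments" error.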
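The masking mechanism can be reproduced without caret or mlr. The following is only a minimal base-R sketch: pkg_a and pkg_b are hypothetical stand-ins (plain environments, not real packages), and their train() signatures are invented for illustration.

```r
# Two hypothetical "packages", modelled as environments that each define train().
pkg_a <- new.env()
pkg_a$train <- function(form, data, method, ...) "pkg_a train was called"
pkg_b <- new.env()
pkg_b$train <- function(learner, task) "pkg_b train was called"

# Attach them in the same order as library(caret); library(mlr).
attach(pkg_a, name = "pkg_a")
attach(pkg_b, name = "pkg_b")  # R reports that train is masked from pkg_a

# Search-path entries that define train(), in lookup order (first match wins).
hits <- find("train")
print(hits)

# A call written for pkg_a's train() is resolved to pkg_b's train() and fails,
# because pkg_b's version has no 'method' or 'data' arguments.
msg <- tryCatch(
  train(Species ~ ., method = "J48", data = iris),
  error = function(e) conditionMessage(e)
)
print(msg)

# Clean up the search path.
detach("pkg_b")
detach("pkg_a")
```

In a real session, running find("train") after loading your packages shows in the same way which package's train comes first on the search path.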

The solution is either to load caret last, or to call the train function from caret explicitly with caret::train(...).
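As a sketch of the explicit-namespace approach, the call below uses iris as in the reproduction above and assumes caret and RWeka are installed. For brevity it lets caret build the 10 folds itself (method = "cv", number = 10) rather than passing indexOut, and the tuneLength value of 3 is an arbitrary choice to keep the run short; neither is a recommendation from the original answer.

```r
library(caret)
library(RWeka)  # provides the J48 (C4.5) implementation behind method = "J48"

set.seed(1958)

# Split the data 70/30, as in the question.
trainRowNumbers <- caret::createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainData <- iris[trainRowNumbers, ]
testData  <- iris[-trainRowNumbers, ]

# 10-fold cross-validation; caret creates the folds internally here.
ctrl <- caret::trainControl(method = "cv", number = 10)

# The caret:: prefix selects caret's train() regardless of which other
# packages (for example mlr) were loaded afterwards.
C45Fit <- caret::train(
  Species ~ .,
  data       = trainData,
  method     = "J48",
  tuneLength = 3,      # arbitrary small value for this sketch
  trControl  = ctrl
)

print(C45Fit)
```

Because the prefix removes the dependence on load order, the same call keeps working if library(mlr) is later added to the script.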
