I copied the following code from Brett Lantz's textbook Machine Learning with R and entered it into the console exactly as written:
    > library(caret)
    Loading required package: lattice
    Loading required package: ggplot2
    > library(kernlab)

    Attaching package: ‘kernlab’

    The following object is masked from ‘package:ggplot2’:

        alpha

    > set.seed(300)
    > ctrl <- trainControl(method = "cv", number = 10)
    > bagctrl <- bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag$aggregate)
    > setwd("~/2148OS_code/chapter 11")
    > credit <- read.csv("credit.csv")
    > svmbag <- train(default ~ ., data = credit, "bag", trControl = ctrl, bagControl = bagctrl)
This is the response I got. What went wrong?

    Something is wrong; all the Accuracy metric values are missing:
        Accuracy       Kappa
     Min.   : NA   Min.   : NA
     1st Qu.: NA   1st Qu.: NA
     Median : NA   Median : NA
     Mean   :NaN   Mean   :NaN
     3rd Qu.: NA   3rd Qu.: NA
     Max.   : NA   Max.   : NA
     NA's   :1     NA's   :1
    Error in train.default(x, y, weights = w, ...) : Stopping
    In addition: There were 50 or more warnings (use warnings() to see the first 50)

The warning messages are as follows:
    > warnings()
    Warning messages:
    1: In data.row.names(row.names, rowsi, i) :
      some row.names duplicated: 3,6,10,13,17,19,23,24,26,27,30,32,34,36,38,41,42,45,49,54,59,60,61,64,66,69,71,72,77,80,81,90,95,102,103,106,112,114,117,118,122,125,127,132,133,137,139,141,143,146,148,151,152,155,158,161,174,176,178,181,185,187,188,189,191,194,203,208,210,212,215,216,218,219,221,223,225,229,230,235,236,239,245,246,262,266,269,271,272,276,279,282,283,285,286,287,288,296,299,305,308,309,313,314,315,317,318,319,322,323,327,328,330,332,333,335,336,338,339,343,347,349,350,352,354,358,360,361,363,366,367,368,369,371,377,379,387,389,392,394,396,397,399,400,410,412,413,414,421,425,428,437,438,441,443,445,446,448,451,453,461,467,469,471,479,481,482,484,486,487,489,491,493,503,504,506,508,511,512,514,517,519,521,522,524,529,530,532,534,537,538,545,547,550,552,555,562,570,579,582,584,588,589,590,601,606,608,610,611,614,615,618,619,623,627,628,629,630,632,634,636,638,641,642,645,653,656,659,660,661,663,667,669,672,673,676,679,681,686,687,690,693,700,701,702,707,708,721,722,724,725,728, [... truncated]
    2: In data.row.names(row.names, rowsi, i) :
      some row.names duplicated: 3,5,8,9,13,15,18,21,25,27,29,33,36,37,41,44,45,51,52,53,55,59,60,64,66,67,72,76,77,80,91,92,96,97,102,103,104,107,110,111,113,116,119,121,122,123,127,130,133,136,139,140,143,145,147,148,149,154,158,160,164,166,168,169,171,174,176,177,178,180,182,185,187,195,199,203,205,216,218,220,223,226,231,234,236,237,238,242,245,2
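Answer:
I used the code provided with the second edition.
If you set up parallel processing, these warning messages disappear. However, you will still get the error about the missing Accuracy metric values.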
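For reference, one common way to set up parallel processing for caret is to register a foreach backend before calling train(). The following is only a minimal sketch: it assumes the doParallel package is installed, and the worker count of 2 is an arbitrary choice. The answer does not say which backend was actually used.

```r
library(doParallel)  # also attaches foreach and parallel

# Assumption: 2 workers, chosen only for illustration.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# foreach reports how many workers the registered backend exposes.
n_workers <- getDoParWorkers()

# ... the train() call from the question would go here ...

# Release the workers and fall back to sequential execution.
stopCluster(cl)
registerDoSEQ()
```

With a backend registered, train() runs its resampling iterations on the workers. This only concerns the warnings; it does nothing about the missing-metrics error.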
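This error is caused by missing values in the resampled performance measures. That can happen when a resample contains no samples of one of the outcome classes (default, in this case), because sensitivity or specificity is then undefined.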
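To see why a metric becomes undefined when a class is absent, here is a small base R sketch. The helper sensitivity_manual and the toy vectors are hypothetical and written only for illustration; this is not caret's own implementation.

```r
# Hypothetical helper: sensitivity = TP / (TP + FN) for the class `positive`.
sensitivity_manual <- function(truth, pred, positive) {
  tp <- sum(truth == positive & pred == positive)  # true positives
  fn <- sum(truth == positive & pred != positive)  # false negatives
  tp / (tp + fn)
}

lv <- c("no", "yes")

# A resample in which both classes are present: 1 TP, 1 FN.
truth_ok <- factor(c("yes", "no", "yes", "no"), levels = lv)
pred_ok  <- factor(c("yes", "no", "no",  "no"), levels = lv)
sens_ok  <- sensitivity_manual(truth_ok, pred_ok, "yes")

# A resample with no "yes" cases at all: TP + FN is 0, so the ratio is 0/0.
truth_empty <- factor(c("no", "no",  "no"), levels = lv)
pred_empty  <- factor(c("no", "yes", "no"), levels = lv)
sens_empty  <- sensitivity_manual(truth_empty, pred_empty, "yes")

print(sens_ok)
print(sens_empty)
```

In the second case the denominator is zero, so R returns NaN, which is the kind of missing value the paragraph above describes.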
I also tested with the GermanCredit data included in the caret package, and it produced the same error.