I have created and tuned several models, but I run into a problem when I try to predict. I first run the following code to tune an LDA model.
library(MASS)
library(caret)
library(randomForest)
data(survey)
data <- survey
#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train <- data[split,]
test <- data[-split,]
#creating training parameters
control <- trainControl(method = "cv", number = 10, p =.8, savePredictions = TRUE, classProbs = TRUE, summaryFunction = twoClassSummary)
#fitting and tuning model
lda_tune <- train(W.Hnd ~ . , data=train, method = "glm" , metric = "ROC", trControl = control)
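However, when I run results <- predict(rf_tune, newdata=test), the output has only 32 rows, while the test set has 46 rows. This is a problem because I build a data.frame of test results that holds the predictions of several models, so that I can analyze them with a confusion matrix. For example, when I run the following code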
results<-data.frame(obs = test$W.Hnd, lda = predict(lda_tune, newdata = test))
I get the error
Error in `$<-.data.frame`(`*tmp*`, "rf_results", value = c(2L, 2L, 2L, : replacement has 32 rows, data has 46
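Can someone explain why caret returns 32 predictions when there are clearly 46 values to predict, even though I explicitly ask the model to predict the values in the test set?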
Answer:
When I run your code I get an error: twoClassSummary fails. Setting that aside, you first refer to lda_tune and later to rf_tune.
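With those issues set aside, the problem is that your test set contains missing values, and predict drops the incomplete rows. If you check nrow(test[complete.cases(test), ]), you will see that it returns 33 cases in my run (the split is random, so the exact count varies; in yours it was 32). That is exactly the number of rows predict returns.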
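The check described above can be sketched without caret. This is only an illustration: the test set here is a hypothetical deterministic split (every fifth row of survey) rather than the random createDataPartition split, so the counts will not match the 32 or 33 mentioned in this thread.

```r
library(MASS)  # recommended package shipped with R; provides `survey`
data(survey)

# Hypothetical deterministic stand-in for the random test split:
# every fifth row of the survey data.
test <- survey[seq(1, nrow(survey), by = 5), ]

# TRUE for rows that have no missing value in any column
keep <- complete.cases(test)

n_total    <- nrow(test)            # rows in the test set
n_complete <- sum(keep)             # rows without any NA
n_dropped  <- n_total - n_complete  # rows with at least one NA

# Number of missing values per column, to see which columns are responsible
na_per_col <- colSums(is.na(test))

print(c(total = n_total, complete = n_complete, dropped = n_dropped))
print(na_per_col[na_per_col > 0])
```

If n_dropped is greater than zero, any observed-values column has to be subset with the same keep vector before it is combined with the predictions.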
I have added the code below for reference. It includes both rf_tune and lda_tune and their results.
library(MASS)
library(caret)
library(randomForest)
data(survey)
data <- survey
#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train <- data[split,]
test <- data[-split,]
#creating training parameters (twoClassSummary removed because it raised an error)
control <- trainControl(method = "cv", number = 10, p =.8, savePredictions = TRUE, classProbs = TRUE)
#fitting and tuning model
lda_tune <- train(W.Hnd ~ . , data=train, method = "glm" , metric = "ROC", trControl = control)
rf_tune <- train(W.Hnd ~ . , data=train, method = "rf" , metric = "ROC", trControl = control)
#predict() drops test rows with missing values, so subset obs the same way
lda_results <- data.frame(obs = test$W.Hnd[complete.cases(test)], lda = predict(lda_tune, newdata = test))
#column renamed from lda to rf: these are the random forest predictions
rf_results <- data.frame(obs = test$W.Hnd[complete.cases(test)], rf = predict(rf_tune, newdata = test))
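If you would rather keep one row per test observation, for example to put several models side by side in one data.frame, the shortened predictions can be padded back to the full length with NA. The sketch below is an assumption-laden illustration, not the caret models above: it uses a hypothetical deterministic split and a plain base-R glm with two predictors as a stand-in, and it predicts only on the complete rows to mimic a predict call that drops rows with missing values.

```r
library(MASS)  # recommended package shipped with R; provides `survey`
data(survey)

# Hypothetical deterministic split (the code above uses createDataPartition)
test_idx <- seq(1, nrow(survey), by = 5)
train <- survey[-test_idx, ]
test  <- survey[test_idx, ]

# Base-R stand-in model, not the caret fit: handedness from the two hand spans.
# glm() silently omits training rows with missing values.
fit <- glm(W.Hnd ~ Wr.Hnd + NW.Hnd, data = train, family = binomial)

# Rows of the test set whose predictors are all present
vars <- c("Wr.Hnd", "NW.Hnd")
keep <- complete.cases(test[, vars])

# Predict on complete rows only, mimicking a predict() that drops NA rows.
# With a factor response, glm models the probability of the second level ("Right").
prob_right <- predict(fit, newdata = test[keep, ], type = "response")
pred_short <- ifelse(prob_right > 0.5, "Right", "Left")

# Pad back to full length; rows with missing predictors stay NA
pred_full <- rep(NA_character_, nrow(test))
pred_full[keep] <- pred_short

results <- data.frame(
  obs  = test$W.Hnd,
  pred = factor(pred_full, levels = levels(test$W.Hnd))
)
```

The resulting data.frame has as many rows as the test set, so further model columns built the same way can be added without a length mismatch. Rows with NA predictions still need to be handled before computing a confusion matrix.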