Unable to generate a confusion matrix for one-class classification in R

I am trying to understand and implement one-class classification in R on a Kaggle dataset (https://www.kaggle.com/uciml/breast-cancer-wisconsin-data).

When I try to print the confusion matrix, I get the following error:

```
Error in !all.equal(nrow(data), ncol(data)) : invalid argument type
```

What am I doing wrong?

```r
library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)
library(data.table)

ds = read.csv('C:/Users/hugos/Desktop/FS Dataset/Health/data_cancer.csv',
              header = TRUE)

mycols <- c("id","diagnosis","radius_mean","texture_mean","perimeter_mean","area_mean",
            "smoothness_mean","compactness_mean","concavity_mean",
            "concave.points_mean","symmetry_mean","fractal_dimension_mean",
            "radius_se","texture_se","perimeter_se",
            "area_se","smoothness_se","compactness_se",
            "concavity_se","concave.points_se","symmetry_se",
            "fractal_dimension_se","radius_worst","texture_worst",
            "perimeter_worst","area_worst","smoothness_worst",
            "compactness_worst","concavity_worst","concave.points_worst",
            "symmetry_worst","fractal_dimension_worst")

#Convert to numeric
setDT(ds)[, (mycols) := lapply(.SD, as.numeric), .SDcols = mycols]

#Convert classification to logical
data <- ds[, .(id, radius_mean, texture_mean, perimeter_mean, area_mean,
               smoothness_mean, compactness_mean, concavity_mean, concave.points_mean,
               symmetry_mean, fractal_dimension_mean, radius_se, texture_se,
               perimeter_se, area_se, smoothness_se, compactness_se, concavity_se,
               concave.points_se, symmetry_se, fractal_dimension_se, radius_worst,
               texture_worst, perimeter_worst, area_worst, smoothness_worst,
               compactness_worst, concavity_worst, concave.points_worst,
               symmetry_worst, fractal_dimension_worst,
               diagnosis = ds$diagnosis == "TRUE")]

dataclean <- na.omit(data)

#Separating train and test
inTrain <- createDataPartition(1:nrow(dataclean), p = 0.7, list = FALSE)
train <- dataclean[inTrain]
test <- dataclean[-inTrain]

svm.model <- svm(diagnosis ~ id + radius_mean + texture_mean + perimeter_mean + area_mean +
                   smoothness_mean + compactness_mean + concavity_mean + concave.points_mean +
                   symmetry_mean + fractal_dimension_mean + radius_se + texture_se +
                   perimeter_se + area_se + smoothness_se + compactness_se + concavity_se +
                   concave.points_se + symmetry_se + fractal_dimension_se + radius_worst +
                   texture_worst + perimeter_worst + area_worst + smoothness_worst +
                   compactness_worst + concavity_worst + concave.points_worst +
                   symmetry_worst + fractal_dimension_worst,
                 data = train,
                 type = 'one-classification',
                 trControl = fitControl,
                 nu = 0.10,
                 scale = TRUE,
                 kernel = "radial",
                 metric = "ROC")

#Perform predictions
svm.predtrain <- predict(svm.model, train)
svm.predtest <- predict(svm.model, test)

confTrain <- table(Predicted = svm.predtrain,
                   Reference = train$diagnosis[as.integer(names(svm.predtrain))])
confTest <- table(Predicted = svm.predtest,
                  Reference = test$diagnosis[as.integer(names(svm.predtest))])

confusionMatrix(confTest, positive = 'TRUE')

print(confTrain)
print(confTest)
```

Answer:

Your problem is in this line:

```r
#Convert classification to logical
data <- ds[, .(id, radius_mean, ..., diagnosis = ds$diagnosis == "TRUE")]
```

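I'm assuming you are using R 4.0 or later, where read.csv no longer converts character columns to factors by default (the default of stringsAsFactors changed to FALSE). This command: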

```r
#Convert to numeric
setDT(ds)[, (mycols) := lapply(.SD, as.numeric), .SDcols = mycols]
```

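turns every diagnosis value into NA, because the values are "M" or "B" (malignant and benign, respectively), which as.numeric cannot parse as numbers.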
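To see the difference between the two column types, here is a small base-R sketch on a hypothetical label vector (labels is made up for illustration, not read from the dataset):

```r
# Hypothetical diagnosis labels in the same "M"/"B" format as the dataset
labels <- c("M", "B", "B", "M")

# Character column (read.csv default in R >= 4.0): as.numeric() cannot
# parse "M" or "B", so every value becomes NA (with a coercion warning)
as_chr <- suppressWarnings(as.numeric(labels))
print(as_chr)  # [1] NA NA NA NA

# Factor column (stringsAsFactors = TRUE): as.numeric() returns the integer
# level codes; levels are sorted, so "B" = 1 and "M" = 2
as_fct <- as.numeric(factor(labels))
print(as_fct)  # [1] 2 1 1 2
```

The factor case is why the fix below compares against the integer code 2.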

So make sure strings are converted to factors when you import the data:

```r
ds = read.csv('.../data_cancer.csv', header = TRUE, stringsAsFactors = TRUE)
str(ds)
```

```
'data.frame':   569 obs. of  33 variables:
 $ id                     : int  842302 842517 84300903 84348301 84358402 843786 844359 ...
 $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
```
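The effect of stringsAsFactors can be checked without the Kaggle file. This sketch feeds read.csv a tiny inline CSV through its text argument; the rows and values are made up and only mimic the column layout:

```r
# Made-up inline CSV standing in for data_cancer.csv
csv_text <- "id,diagnosis,radius_mean
1,M,10.0
2,M,11.5
3,B,9.8"

# Default in R >= 4.0: diagnosis stays a character column
ds_chr <- read.csv(text = csv_text, header = TRUE)
str(ds_chr$diagnosis)

# With stringsAsFactors = TRUE: diagnosis becomes a factor with levels "B", "M"
ds_fct <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
str(ds_fct$diagnosis)
```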

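I suppose it will take some people a while to get used to R's new behavior. Your command that converts the classification to logical should then become: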

```r
data <- ds[, .(id, radius_mean, ..., diagnosis = diagnosis == 2)] # levels are "B" = 1, "M" = 2, so this flags malignant; use == 1 to flag benign instead
```
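An alternative (not what the answer above uses) is to compare against the label itself, which does not depend on how the factor levels are ordered. A base-R sketch on a hypothetical toy data frame:

```r
# Toy data frame standing in for ds (values are made up)
ds <- data.frame(id = 1:4,
                 diagnosis = factor(c("M", "B", "B", "M")))

# Code-based comparison, as in the answer: level 2 is "M" because levels are sorted
by_code <- as.numeric(ds$diagnosis) == 2

# Label-based comparison: independent of level ordering, and it works
# whether diagnosis is a factor or a plain character column
by_label <- ds$diagnosis == "M"

print(by_code)   # [1]  TRUE FALSE FALSE  TRUE
print(by_label)  # [1]  TRUE FALSE FALSE  TRUE
```

In the question's pipeline the label-based version would also require removing "diagnosis" from mycols, because that step converts the column to numeric before the comparison runs.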

With diagnosis converted this way, the rest of your commands should work:

```r
confusionMatrix(confTest, positive = 'TRUE')
```

```
Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE    10    8  # Note these numbers may change
    TRUE    100   50

               Accuracy : 0.3571
                 95% CI : (0.2848, 0.4346)
    No Information Rate : 0.6548
    P-Value [Acc > NIR] : 1

                  Kappa : -0.0342

 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.86207
            Specificity : 0.09091
         Pos Pred Value : 0.33333
         Neg Pred Value : 0.55556
             Prevalence : 0.34524
         Detection Rate : 0.29762
   Detection Prevalence : 0.89286
      Balanced Accuracy : 0.47649

       'Positive' Class : TRUE
```
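As for where the original error message comes from: in the question's code, ds$diagnosis == "TRUE" compares a numeric (or NA) column with the string "TRUE", which is never TRUE, so the Reference margin of the table has at most one level and the table is not square. Judging from the error text, caret checks squareness with !all.equal(nrow(data), ncol(data)); all.equal() returns a character string rather than FALSE when the values differ, and negating a string fails. The sketch below reproduces this in base R with hypothetical vectors pred and ref, and shows one way to force a square table by fixing the factor levels:

```r
# Hypothetical one-class predictions, and a reference column in which every
# value is FALSE (what diagnosis == "TRUE" yields on a numeric column)
pred <- c(TRUE, FALSE, TRUE, TRUE)
ref  <- c(FALSE, FALSE, FALSE, FALSE)

bad <- table(Predicted = pred, Reference = ref)
print(dim(bad))  # 2 rows, 1 column: not square

# all.equal() returns "Mean relative difference: ..." (a string) here,
# so applying ! to it raises "invalid argument type"
msg <- tryCatch(!all.equal(nrow(bad), ncol(bad)),
                error = function(e) conditionMessage(e))
print(msg)

# Declaring both levels up front keeps FALSE and TRUE in both margins,
# so the table is 2 x 2 even when one class never occurs
lv <- c(FALSE, TRUE)
good <- table(Predicted = factor(pred, levels = lv),
              Reference = factor(ref, levels = lv))
print(dim(good))
```

A table built like good should pass the dimension check in caret::confusionMatrix, although statistics such as sensitivity are not meaningful when the reference contains only one class.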
