I've run into a problem: three different classifiers, all trained on the same dataset (sklearn's iris dataset), are producing exactly the same accuracy score and confusion matrix. I emailed my professor asking whether this is normal, and if not, what she would suggest, and her response was essentially "this is not normal, go back and check your code."
I've gone over my code quite a bit since then, but I can't see what's wrong. I'm hoping someone here can shed some light on this so I can learn something from the experience.
Here is my code:
```python
# Dataset
from sklearn import datasets
# Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Classifiers
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
# Performance Metrics
from sklearn.metrics import confusion_matrix, accuracy_score

if __name__ == '__main__':
    # Read dataset into memory.
    iris = datasets.load_iris()

    # Extract independent and dependent variables into variables.
    X = iris.data
    y = iris.target

    # Split training and test sets (70/30).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    # Fit the scaler to the training set, and transform the feature columns of both the training
    # and test sets -- all of them, since none of the features contain categorical data.
    ss = StandardScaler()
    X_train = ss.fit_transform(X_train)
    X_test = ss.transform(X_test)

    # Create the classifiers.
    dt_classifier = DecisionTreeClassifier(random_state=0)
    svm_classifier = SVC(kernel='rbf', random_state=0)
    lr_classifier = LogisticRegression(random_state=0)

    # Fit the classifiers to the training data.
    dt_classifier.fit(X_train, y_train)
    svm_classifier.fit(X_train, y_train)
    lr_classifier.fit(X_train, y_train)

    # Predict using the now trained classifiers.
    dt_y_pred = dt_classifier.predict(X_test)
    svm_y_pred = svm_classifier.predict(X_test)
    lr_y_pred = lr_classifier.predict(X_test)

    # Create confusion matrices using the predicted results and the actual results from the test set.
    dt_cm = confusion_matrix(y_test, dt_y_pred)
    svm_cm = confusion_matrix(y_test, svm_y_pred)
    lr_cm = confusion_matrix(y_test, lr_y_pred)

    # Calculate accuracy scores using the predicted results and the actual results from the test set.
    dt_score = accuracy_score(y_test, dt_y_pred)
    svm_score = accuracy_score(y_test, svm_y_pred)
    lr_score = accuracy_score(y_test, lr_y_pred)

    # Print confusion matrices and accuracy scores for each classifier.
    print('--- Decision Tree Classifier ---')
    print(f'Confusion Matrix:\n{dt_cm}')
    print(f'Accuracy Score:{dt_score}\n')

    print('--- Support Vector Machine Classifier ---')
    print(f'Confusion Matrix:\n{svm_cm}')
    print(f'Accuracy Score:{svm_score}\n')

    print('--- Logistic Regression Classifier ---')
    print(f'Confusion Matrix:\n{lr_cm}')
    print(f'Accuracy Score:{lr_score}')
```
The output is as follows:
```
--- Decision Tree Classifier ---
Confusion Matrix:
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Accuracy Score:0.9777777777777777

--- Support Vector Machine Classifier ---
Confusion Matrix:
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Accuracy Score:0.9777777777777777

--- Logistic Regression Classifier ---
Confusion Matrix:
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Accuracy Score:0.9777777777777777
```
As you can see, the output is identical for each of the different classifiers. Any help at all would be greatly appreciated.
Answer:
There is nothing wrong with your code.
Similar results are not unexpected when:

- the data is relatively "easy", and
- the sample size is small.
Both of these conditions hold here. The iris dataset is famously easy for modern machine-learning algorithms (including the ones you are using) to classify; combine that with the very small size of your test set (only 45 samples), and results like these are not surprising.
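The 45 comes straight from your split: iris has 150 samples, and 0.30 × 150 = 45 of them end up in the test set. A minimal sketch, reusing your split parameters, to confirm the test-set size and its per-class breakdown (the rows of your confusion matrices each sum to one of these counts):

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# The exact same split as in your script: 70/30, random_state=0.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print(len(y_test))          # 45 test samples out of 150
print(np.bincount(y_test))  # number of test samples per class
```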
In fact, just change the data split to use test_size=0.20 and you will get a perfect accuracy of 1.0 from all 3 models.
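For reference, here is a condensed sketch of your pipeline with only that one parameter changed; the loop is just a compact way of fitting and scoring all three classifiers and does the same thing as your script:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# The only change from your script: test_size=0.20 instead of 0.30.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Scale exactly as before: fit on the training set, transform both sets.
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

classifiers = [
    ('Decision Tree', DecisionTreeClassifier(random_state=0)),
    ('SVM (RBF)', SVC(kernel='rbf', random_state=0)),
    ('Logistic Regression', LogisticRegression(random_state=0)),
]

# Fit and score each classifier on the smaller test set.
for name, clf in classifiers:
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```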
Nothing to worry about.