Classification using PCA components

I ran a PCA on my dataset as follows:

    import pandas as pd
    from sklearn.decomposition import PCA

    # Project the scaled features onto the first three principal components
    pca = PCA(n_components=3)
    principalComponents = pca.fit_transform(scale_x)
    principalDf = pd.DataFrame(data=principalComponents, columns=['PC1', 'PC2', 'PC3'])

When I then visualize the result with Matplotlib, I can see a separation between my two classes:

    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    # Color each point by class: red for class 0, green otherwise
    ax.scatter(principalDf['PC1'].values, principalDf['PC2'].values, principalDf['PC3'].values,
               c=['red' if m == 0 else 'green' for m in y], marker='o')
    ax.set_xlabel('PC1')
    ax.set_ylabel('PC2')
    ax.set_zlabel('PC3')
    plt.show()

However, when I use a classification model such as SVM or Logistic Regression, it fails to learn this relationship:

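    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    lg = LogisticRegression(solver='lbfgs')
    lg.fit(principalDf.values, y)
    # Note: predictions are evaluated on the same data the model was fit on
    lg_p = lg.predict(principalDf.values)
    print(classification_report(y, lg_p, target_names=['Failure', 'Success']))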

                  precision    recall  f1-score   support

         Failure       1.00      0.03      0.06        67
         Success       0.77      1.00      0.87       219

        accuracy                           0.77       286
       macro avg       0.89      0.51      0.46       286
    weighted avg       0.82      0.77      0.68       286

What could be the reason for this?

Answer:

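First, you are using three features: PC1, PC2, and PC3. Additional components that do not appear in the plot (PC4 to PC6) may also affect the classification result.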
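One way to check how much the components outside the plot matter is to look at the explained variance and to compare a classifier trained on three components with one trained on all of them. This is only a rough sketch: the data from the question is not available, so a synthetic six-feature dataset from `make_classification` stands in for `scale_x` and `y`, and the class weights and the balanced-accuracy metric are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (assumption: 6 features, 2 imbalanced classes).
X, y = make_classification(n_samples=286, n_features=6, n_informative=6,
                           n_redundant=0, weights=[0.23, 0.77], random_state=0)
scale_x = StandardScaler().fit_transform(X)

# Keep every component so the variance left out of a 3D plot becomes visible.
pca = PCA(n_components=6)
components = pca.fit_transform(scale_x)
cum_var = np.cumsum(pca.explained_variance_ratio_)
print("cumulative explained variance:", np.round(cum_var, 3))

# Compare a model trained on the first 3 components with one trained on all 6.
for k in (3, 6):
    scores = cross_val_score(LogisticRegression(solver="lbfgs"),
                             components[:, :k], y,
                             cv=5, scoring="balanced_accuracy")
    print(f"first {k} components: balanced accuracy = {scores.mean():.3f}")
```

If the first three components explain only a modest share of the variance, or the score changes noticeably between the two runs, the components beyond PC3 probably carry information that is relevant to the classes.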

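Second, a classifier sometimes does not train as well as you expect. I suggest trying a decision tree in place of the classifiers you used: a tree splits the feature space with axis-aligned thresholds (a series of "horizontal" cuts rather than a single linear boundary), so it may produce the result you expect.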
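The decision tree suggestion could be tried along these lines. Again this is a minimal sketch under assumptions: the synthetic dataset replaces the data from the question, `max_depth=4` is an arbitrary illustrative setting, and a held-out test split is used so that the report does not just reflect the training data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption, not the original dataset).
X, y = make_classification(n_samples=286, n_features=6, n_informative=6,
                           n_redundant=0, weights=[0.23, 0.77], random_state=0)
scale_x = StandardScaler().fit_transform(X)
principal_components = PCA(n_components=3).fit_transform(scale_x)

# Hold out part of the data so the report reflects unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    principal_components, y, test_size=0.3, stratify=y, random_state=0)

# max_depth=4 is an illustrative choice that limits overfitting on a small dataset.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
tree_pred = tree.predict(X_test)
print(classification_report(y_test, tree_pred, target_names=["Failure", "Success"]))
```

Whether the tree actually does better than logistic regression depends on the shape of the class boundary in the real data, so it is worth comparing both on the same held-out split.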
