Predicting on training data in sklearn

IT Technology · xiaolong · April 12, 2025

I'm using scikit-learn's SVM like this:

clf = svm.SVC()
clf.fit(td_X, td_y)

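When I use the classifier to predict the class of a sample that is itself in the training set, can the classifier ever be wrong, even in scikit-learn's implementation? In other words, can clf.predict(td_X[[a]])[0] != td_y[a] happen?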
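A quick way to check this on any dataset is to compare predictions with labels over the whole training set. The sketch below is a minimal illustration, assuming random stand-ins for the question's td_X and td_y; the data, the index a, and the names single_pred and wrong are hypothetical. Note that predict() expects a 2-D array, so a single row is selected with a list index.

```python
import numpy as np
from sklearn import svm

# Hypothetical stand-ins for the question's td_X / td_y:
# 100 random 2-D points with random 0/1 labels.
rng = np.random.RandomState(42)
td_X = rng.normal(size=(100, 2))
td_y = rng.randint(2, size=100)

clf = svm.SVC()
clf.fit(td_X, td_y)

# predict() needs shape (n_samples, n_features); td_X[[a]] keeps one row as shape (1, 2).
a = 0
single_pred = clf.predict(td_X[[a]])[0]
print("sample", a, "predicted", single_pred, "true", td_y[a])

# Vectorized check: indices of every training sample the fitted model gets wrong.
wrong = np.flatnonzero(clf.predict(td_X) != td_y)
train_acc = clf.score(td_X, td_y)
print(len(wrong), "of", len(td_y), "training samples misclassified")
```

If wrong is non-empty, the classifier is misclassifying points it was trained on.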

Answer:

Yes, it is entirely possible. For example, run the following code:

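from sklearn import svm
import numpy as np

clf = svm.SVC()  # defaults: kernel='rbf', C=1.0
np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])  # 100 random 2-D points
y = np.random.randint(2, size=100)                        # random 0/1 labels
clf.fit(x, y)
print(clf.score(x, y))  # accuracy on the training data itself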

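The score is 0.61, so nearly 40% of the training data is misclassified (the labels here are drawn at random, so there is no real pattern to learn). Part of the reason is regularization. The default kernel is 'rbf', which in theory can classify any training set perfectly, as long as no two identical training points have different labels. However, SVC also applies regularization to reduce overfitting, and the default regularization parameter is C=1.0.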
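For reference, this is the standard soft-margin (C-SVC) objective given in scikit-learn's SVM documentation, which shows where C enters. The notation (w, b, ξ, φ, γ) is the conventional one and is not taken from the original answer:

```latex
\min_{w,\, b,\, \xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,
\qquad\text{with}\quad \phi(x)^{\top}\phi(x') = K(x, x') = \exp\!\bigl(-\gamma \lVert x - x' \rVert^{2}\bigr)
```

Here y_i ∈ {−1, +1}, and a training point with ξ_i > 1 lies on the wrong side of the decision boundary. A small C makes such violations cheap, so the optimizer favors a smoother boundary (small ‖w‖) over fitting every point. A large C makes each violation expensive, which tends to push training accuracy up.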

If you run the same code but change clf = svm.SVC() to clf = svm.SVC(C=200000), the training accuracy rises to 0.94.
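The effect of C can be seen by sweeping it on the same random data and printing the training accuracy for each value. In this sketch, the extra C values and the final gamma=1e6 run are illustrative choices that are not part of the original answer. With such a narrow RBF kernel, the kernel matrix is nearly the identity, so the model can fit every training point. This is the "perfect classification" case described above.

```python
import numpy as np
from sklearn import svm

# Same random data as in the answer's example.
np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)

# Training accuracy as the regularization parameter C grows (default gamma='scale').
train_acc = {}
for C in [0.01, 1.0, 100.0, 200000.0]:
    train_acc[C] = svm.SVC(C=C).fit(x, y).score(x, y)
    print(f"C={C:g}: training accuracy {train_acc[C]:.2f}")

# Illustrative extreme (arbitrary values): a very narrow kernel plus a large C
# makes K(x_i, x_j) close to 0 for i != j, so the model memorizes every point.
memorizer = svm.SVC(C=1e6, gamma=1e6).fit(x, y)
memorize_acc = memorizer.score(x, y)
print(f"C=1e6, gamma=1e6: training accuracy {memorize_acc:.2f}")
```

High training accuracy here measures memorization, not generalization. Because the labels are random, none of these settings should do meaningfully better than chance on new data.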

libsvm machine-learning python scikit-learn

发表回复取消回复