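Unexpected results when using scikit-learn's SVM classification algorithm (RBF kernel)

I followed the example on this page, http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html, and created my own plots, replacing the iris data with normally distributed data that has a standard deviation of 10.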

My plots came out like this: (image not included)

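Note that the RBF kernel plot looks very different from the one in the example. Apart from the red and blue parts, the whole area is classified as yellow. In other words, there are too many support vectors. I tried changing C and degree, but it did not help. The code that generates the plot is shown below.

Note also that I need to use the RBF kernel, because the polynomial kernel runs significantly slower than the RBF kernel.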

    import numpy as np
    import pylab as pl
    from sklearn import svm, datasets

    FP_SIZE = 50
    STD = 10

    def gen(fp):
        data = []
        target = []
        fp_count = len(fp)
        # generate rssi reading for monitors / fingerprint points
        # using scikit-learn data structure
        for i in range(0, fp_count):
            for j in range(0, FP_SIZE):
                target.append(i)
                data.append(np.around(np.random.normal(fp[i], STD)))
        data = np.array(data)
        target = np.array(target)
        return data, target

    fp = [[-30, -70], [-58, -30], [-60, -60]]
    data, target = gen(fp)

    # import some data to play with
    # iris = datasets.load_iris()
    X = data[:, :2]  # we only take the first two features. We could
                     # avoid this ugly slicing by using a two-dim dataset
    Y = target

    h = .02  # step size in the mesh

    # we create an instance of SVM and fit our data. We do not scale our
    # data since we want to plot the support vectors
    C = 1.0  # SVM regularization parameter
    svc = svm.SVC(kernel='linear', C=C).fit(X, Y)
    rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, Y)
    poly_svc = svm.SVC(kernel='poly', degree=3, C=C).fit(X, Y)
    lin_svc = svm.LinearSVC(C=C).fit(X, Y)

    # create a mesh to plot in
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # title for the plots
    titles = ['SVC with linear kernel',
              'SVC with RBF kernel',
              'SVC with polynomial (degree 3) kernel',
              'LinearSVC (linear kernel)']

    for i, clf in enumerate((svc, rbf_svc, poly_svc, lin_svc)):
        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        pl.subplot(2, 2, i + 1)
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        pl.contourf(xx, yy, Z, cmap=pl.cm.Paired)
        pl.axis('off')

        # Plot also the training points
        pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)

        pl.title(titles[i])

    pl.show()

Answer:

Have you used any measure of correctness other than how the results look on the plot?
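As a rough illustration of what such a measure could look like, the sketch below scores the RBF classifier with 5-fold cross-validated accuracy instead of judging it by the plot. The data is regenerated with the same centers, sample count, and standard deviation as in the question. The fixed random seed, the 5 folds, and the second gamma value (0.001) are my own assumptions for the sake of comparison, not something from the question.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data mirroring the question: 3 centers, 50 points each, std 10.
# The seed is an assumption, added only so the run is repeatable.
rng = np.random.default_rng(0)
centers = np.array([[-30, -70], [-58, -30], [-60, -60]])
X = np.vstack([np.around(rng.normal(c, 10, size=(50, 2))) for c in centers])
y = np.repeat(np.arange(len(centers)), 50)

mean_scores = {}
for gamma in (0.7, 0.001):  # 0.7 is from the question; 0.001 is illustrative
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0)
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy on 5 held-out folds
    mean_scores[gamma] = scores.mean()
    print(f"gamma={gamma}: mean CV accuracy = {mean_scores[gamma]:.2f}")
```

A held-out score like this says how the classifier does on points it has not seen, which the scatter of training points on the plot cannot show.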

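SVMs generally need to be run with a grid search, especially if you use an RBF kernel. C only takes care of regularization, which will play a small role if your data is not sparse to begin with.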
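One way to see why tuning matters for this particular data: the RBF kernel is exp(-gamma * d^2), so with gamma=0.7 two points that are one standard deviation (10 units) apart have a similarity of about exp(-70), which is effectively zero. The sketch below prints that value for a few gamma settings and counts the support vectors each fitted model keeps. Only gamma=0.7 and C=1.0 come from the question; the other gamma values, the seed, and the helper name `rbf_similarity` are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_similarity(distance, gamma):
    """RBF kernel value exp(-gamma * d^2) for two points a given distance apart."""
    return np.exp(-gamma * distance ** 2)

# Synthetic data mirroring the question (the fixed seed is an assumption).
rng = np.random.default_rng(0)
centers = np.array([[-30, -70], [-58, -30], [-60, -60]])
X = np.vstack([np.around(rng.normal(c, 10, size=(50, 2))) for c in centers])
y = np.repeat(np.arange(len(centers)), 50)

sv_counts = {}
for gamma in (0.7, 0.01, 0.001):  # 0.7 is from the question; the rest are illustrative
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    sv_counts[gamma] = int(clf.n_support_.sum())  # support vectors over all classes
    print(f"gamma={gamma}: K at distance 10 = {rbf_similarity(10.0, gamma):.3g}, "
          f"support vectors = {sv_counts[gamma]} of {len(X)}")
```

When the kernel value between neighbouring training points is this close to zero, each point only influences a tiny region around itself, which would be consistent with the plot in the question where everything away from the training points falls into a single class.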

You need to run a grid search over gamma and C. They have a good example here:

http://scikit-learn.org/0.13/auto_examples/grid_search_digits.html#example-grid-search-digits-py

Also, their library already takes care of the cross-validation for you.
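A minimal sketch of such a grid search on data like the question's might look as follows. It assumes a recent scikit-learn, where `GridSearchCV` lives in `sklearn.model_selection` (the linked 0.13 example imports it from a different module). The parameter grid, the seed, and the 5 folds are arbitrary choices of mine and would need adjusting for real data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data mirroring the question (the fixed seed is an assumption).
rng = np.random.default_rng(0)
centers = np.array([[-30, -70], [-58, -30], [-60, -60]])
X = np.vstack([np.around(rng.normal(c, 10, size=(50, 2))) for c in centers])
y = np.repeat(np.arange(len(centers)), 50)

# Illustrative grid: it includes the question's gamma=0.7 and C=1
# alongside much smaller gamma values.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 0.7],
}

# GridSearchCV fits every combination with 5-fold cross-validation
# and then refits the best one on the full data.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.2f}")
```

After fitting, `search.best_estimator_` can be passed to the plotting loop in place of `rbf_svc` to see the decision regions for the selected parameters.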

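Keep in mind that those examples work well for toy datasets, but once you use a new dataset there is no reason to believe it will behave like the dataset in the example.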
