I am using this example to create a ROC plot from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html
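However, each data point effectively consists of 4 feature vectors of length d, combined using a custom kernel function that does not conform to the specific K(X, X) paradigm. As such, I have to supply a precomputed kernel to scikit-learn in order to do classification. It looks something like this: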
    K = numpy.zeros(shape = (n, n))

    # w1 + w2 + w3 + w4 = 1.0

    # v1: array, shape (n, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v1, v1)
    mu = 1.0 / numpy.mean(chi)
    K += w1 * numpy.exp(-mu * chi)

    # v2: array, shape (n, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v2, v2)
    mu = 1.0 / numpy.mean(chi)
    K += w2 * numpy.exp(-mu * chi)

    # v3: array, shape (n, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v3, v3)
    mu = 1.0 / numpy.mean(chi)
    K += w3 * numpy.exp(-mu * chi)

    # v4: array, shape (n, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v4, v4)
    mu = 1.0 / numpy.mean(chi)
    K += w4 * numpy.exp(-mu * chi)

    return K
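The main hurdle to generating a ROC plot (as in the link above) seems to be the process of splitting the data into two sets and then calling predict_proba() on the test set. Is it possible to do this in scikit-learn using a precomputed kernel?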
Answer:
The short answer is "probably not". Have you tried something like the following?
Based on the example at http://scikit-learn.org/stable/modules/svm.html, you need something like this:
    import numpy
    import sklearn.metrics.pairwise
    from sklearn import svm

    # The SVM docs example uses X = [[0, 0], [1, 1]] and y = [0, 1].
    # Here y must hold the n training labels, one per row of K.
    # probability=True is needed for predict_proba() to be available.
    clf = svm.SVC(kernel='precomputed', probability=True)

    # Kernel computation
    K = numpy.zeros(shape = (n, n))

    # "At the moment, the kernel values between all training vectors
    # and the test vectors must be provided."
    # According to the scikit-learn web page.
    # -- This is the problem!

    # v1: array, shape (n, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v1, v1)
    mu = 1.0 / numpy.mean(chi)
    K += w1 * numpy.exp(-mu * chi)

    # v2: array, shape (n, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v2, v2)
    mu = 1.0 / numpy.mean(chi)
    K += w2 * numpy.exp(-mu * chi)

    # v3: array, shape (n, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v3, v3)
    mu = 1.0 / numpy.mean(chi)
    K += w3 * numpy.exp(-mu * chi)

    # v4: array, shape (n, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v4, v4)
    mu = 1.0 / numpy.mean(chi)
    K += w4 * numpy.exp(-mu * chi)

    # scikit-learn is a wrapper around LIBSVM, and judging by the LIBSVM
    # README it seems you need to supply kernel values for the test data,
    # like this:
    Kt = numpy.zeros(shape = (nt, n))

    # Note: mu is re-estimated below from each test/train block. Reusing
    # the mu values from training would keep the kernel identical at fit
    # and predict time.

    # t1: array, shape (nt, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t1, v1)
    mu = 1.0 / numpy.mean(chi)
    Kt += w1 * numpy.exp(-mu * chi)

    # t2: array, shape (nt, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t2, v2)
    mu = 1.0 / numpy.mean(chi)
    Kt += w2 * numpy.exp(-mu * chi)

    # t3: array, shape (nt, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t3, v3)
    mu = 1.0 / numpy.mean(chi)
    Kt += w3 * numpy.exp(-mu * chi)

    # t4: array, shape (nt, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t4, v4)
    mu = 1.0 / numpy.mean(chi)
    Kt += w4 * numpy.exp(-mu * chi)

    clf.fit(K, y)

    # Predict on the test samples
    probas_ = clf.predict_proba(Kt)
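The four near-identical blocks could probably be folded into one helper that builds both the training kernel and the test kernel. The sketch below is only an illustration: `combined_kernel`, its arguments, and the random stand-in data are hypothetical names, not part of scikit-learn or of the question. It mirrors the question's formula as written (note that `chi2_kernel` already returns an exponentiated chi-squared kernel, so this may not be the quantity ultimately intended), and it assumes the per-channel mu values estimated on the training block should be reused for the test block.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel


def combined_kernel(rows, cols, weights, mus=None):
    """Weighted sum of per-channel kernels between `rows` and `cols`.

    rows, cols: lists of arrays, one per channel, with shapes
                (n_rows, d) and (n_cols, d); values must be non-negative.
    weights:    one float per channel, assumed to sum to 1.0.
    mus:        per-channel scale factors; estimated from this block
                when None (hypothetical convention for this sketch).
    """
    K = np.zeros((rows[0].shape[0], cols[0].shape[0]))
    used_mus = []
    for i, (a, b, w) in enumerate(zip(rows, cols, weights)):
        chi = chi2_kernel(a, b)
        mu = 1.0 / np.mean(chi) if mus is None else mus[i]
        used_mus.append(mu)
        K += w * np.exp(-mu * chi)
    return K, used_mus


# Random non-negative stand-ins for v1..v4 (training) and t1..t4 (test).
rng = np.random.RandomState(0)
n, nt, d = 20, 5, 8
train = [rng.rand(n, d) for _ in range(4)]
test = [rng.rand(nt, d) for _ in range(4)]
weights = [0.4, 0.3, 0.2, 0.1]

# Training kernel: shape (n, n); mu is estimated per channel here.
K, mus = combined_kernel(train, train, weights)
# Test kernel: shape (nt, n); rows are test samples, columns are
# training samples, and the training mus are reused.
Kt, _ = combined_kernel(test, train, weights, mus=mus)
```

With `K` and `Kt` built this way, `clf.fit(K, y)` and `clf.predict_proba(Kt)` would be called exactly as in the code above.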
From here, just copy the bottom part of http://scikit-learn.org/0.13/auto_examples/plot_roc.html
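For reference, here is a minimal end-to-end sketch of that last step. It uses toy data and a plain `chi2_kernel` in place of the custom four-channel kernel, both of which are assumptions made only to keep the example short. The `roc_curve` and `auc` calls follow the pattern of the linked example; the plotting itself is left out.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import roc_curve, auc
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.RandomState(0)

# Toy non-negative data: class 1 is shifted by 0.5 in every feature.
n_per_class, d = 40, 6
X = np.vstack([rng.rand(n_per_class, d), rng.rand(n_per_class, d) + 0.5])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle, then split in half into training and test sets.
order = rng.permutation(len(y))
X, y = X[order], y[order]
half = len(y) // 2
X_train, X_test = X[:half], X[half:]
y_train, y_test = y[:half], y[half:]

# Precomputed kernels: train-vs-train for fitting,
# test-vs-train (rows = test samples) for prediction.
K = chi2_kernel(X_train, X_train)
Kt = chi2_kernel(X_test, X_train)

clf = svm.SVC(kernel="precomputed", probability=True, random_state=0)
clf.fit(K, y_train)
probas_ = clf.predict_proba(Kt)

# ROC curve and area under it, using the probability of class 1.
fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1])
roc_auc = auc(fpr, tpr)
print("Area under the ROC curve: %f" % roc_auc)
```

The `fpr` and `tpr` arrays can then be passed to a plotting call such as matplotlib's `plot(fpr, tpr)` to draw the curve.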