User-defined SVM kernel with scikit-learn

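I'm having trouble defining my own kernel function in scikit-learn. I defined a Gaussian kernel and can fit an SVM with it, but I can't use the fitted model to predict.

More specifically, my code is as follows: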

from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.utils import shuffle
import scipy.sparse as sparse
import numpy as np

digits = load_digits(n_class=2)
X, y = shuffle(digits.data, digits.target)
gamma = 1.0

X_train, X_test = X[:100, :], X[100:, :]
y_train, y_test = y[:100], y[100:]

# Built-in RBF kernel
m1 = SVC(kernel='rbf', gamma=1)
m1.fit(X_train, y_train)
m1.predict(X_test)

# Hand-written Gaussian kernel
def my_kernel(x, y):
    d = x - y
    c = np.dot(d, d.T)
    return np.exp(-gamma * c)

m2 = SVC(kernel=my_kernel)
m2.fit(X_train, y_train)
m2.predict(X_test)

m1 and m2 should be identical, but m2.predict(X_test) raises the following error:

operands could not be broadcast together with shapes (260,64) (100,64)

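I don't understand what the problem is.

Also, if x is a single data point, m1.predict(x) returns a single +1/-1 label, as expected, but m2.predict(x) returns an array of +1/-1 values. I don't know why this happens.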


Answer:

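The error is in the line d = x - y. scikit-learn calls a callable kernel with two matrices, X of shape (n_samples_X, n_features) and Y of shape (n_samples_Y, n_features), and expects the Gram matrix of shape (n_samples_X, n_samples_Y) back. You cannot simply subtract the two, because their first dimensions generally differ. During fit both arguments are X_train, so x - y happens to work (it is all zeros, so every kernel value is 1 and the fitted model is meaningless); during predict the kernel receives X_test (260 rows) and the 100 training rows, and the subtraction cannot broadcast. The same mistake explains your single-point case: one row broadcasts against all 100 training rows, so the kernel returns a 100 x 100 matrix instead of a 1 x 100 one, and predict returns 100 labels. Here is how scikit-learn implements the rbf kernel, taken from its source code (only the key parts are kept):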

def row_norms(X, squared=False):
    if issparse(X):
        norms = csr_row_norms(X)
    else:
        norms = np.einsum('ij,ij->i', X, X)

    if not squared:
        np.sqrt(norms, norms)
    return norms


def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False):
    """
    Considering the rows of X (and Y=X) as vectors, compute the
    distance matrix between each pair of vectors.

    [...]

    Returns
    -------
    distances : {array, sparse matrix}, shape (n_samples_1, n_samples_2)
    """
    X, Y = check_pairwise_arrays(X, Y)

    if Y_norm_squared is not None:
        YY = check_array(Y_norm_squared)
        if YY.shape != (1, Y.shape[0]):
            raise ValueError(
                "Incompatible dimensions for Y and Y_norm_squared")
    else:
        YY = row_norms(Y, squared=True)[np.newaxis, :]

    if X is Y:  # shortcut in the common case euclidean_distances(X, X)
        XX = YY.T
    else:
        XX = row_norms(X, squared=True)[:, np.newaxis]

    distances = safe_sparse_dot(X, Y.T, dense_output=True)
    distances *= -2
    distances += XX
    distances += YY
    np.maximum(distances, 0, out=distances)

    if X is Y:
        # Ensure that distances between vectors and themselves are set to 0.0.
        # This may not be the case due to floating point rounding errors.
        distances.flat[::distances.shape[0] + 1] = 0.0

    return distances if squared else np.sqrt(distances, out=distances)


def rbf_kernel(X, Y=None, gamma=None):
    X, Y = check_pairwise_arrays(X, Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]

    K = euclidean_distances(X, Y, squared=True)
    K *= -gamma
    np.exp(K, K)    # exponentiate K in-place
    return K
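The expansion used by euclidean_distances above, ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, can be sketched in a few lines of plain NumPy. This is a minimal illustration, not scikit-learn's code: the function name rbf_gram and the random toy arrays A and B are hypothetical, and sparse input is not handled. It shows that the result always has one row per row of X and one column per row of Y, and that the question's x - y fails as soon as the row counts differ.

```python
import numpy as np


def rbf_gram(X, Y, gamma=1.0):
    """Hypothetical helper: dense RBF Gram matrix of shape (X.shape[0], Y.shape[0])."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared row norms as a column (n_x, 1) and as a row (1, n_y)
    XX = np.einsum('ij,ij->i', X, X)[:, np.newaxis]
    YY = np.einsum('ij,ij->i', Y, Y)[np.newaxis, :]
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, broadcast to (n_x, n_y)
    sq_dists = XX - 2.0 * X @ Y.T + YY
    # Rounding can push tiny distances slightly below zero; clip them
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # toy "test" rows
B = rng.normal(size=(4, 3))  # toy "training" rows

K = rbf_gram(A, B, gamma=0.5)
print(K.shape)  # (5, 4)

# The question's elementwise x - y only works when both inputs have the same number of rows
try:
    A - B
except ValueError as err:
    print("x - y fails:", err)
```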

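You may need to dig into the code, but look at the docstring of euclidean_distances: it returns a matrix of shape (n_samples_1, n_samples_2), with one entry for every pair of rows. A simple (if slow) implementation of what you are trying to do could look like this: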

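import numpy as np

gamma = 1.0

def my_kernel(x, y):
    # Gram matrix: one row per sample in x, one column per sample in y
    d = np.zeros((x.shape[0], y.shape[0]))
    for i, row_x in enumerate(x):
        for j, row_y in enumerate(y):
            # The RBF kernel uses the squared Euclidean distance
            d[i, j] = np.exp(-gamma * np.linalg.norm(row_x - row_y) ** 2)
    return d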
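To check a correctly shaped kernel end to end, the sketch below plugs it into SVC and compares its Gram matrix with scikit-learn's own rbf_kernel. Assumptions that are not in the original thread: a recent scikit-learn (so load_digits takes n_class as a keyword), random_state=0 in shuffle for reproducibility, and sklearn.metrics.pairwise.euclidean_distances(..., squared=True) used in place of the double loop for speed. With this kernel, a single point passed as a 2-D array with one row yields exactly one prediction.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel
from sklearn.svm import SVC
from sklearn.utils import shuffle

gamma = 1.0  # same value as in the question


def my_kernel(X, Y):
    # Pairwise squared distances, shape (n_samples_X, n_samples_Y)
    sq_dists = euclidean_distances(X, Y, squared=True)
    return np.exp(-gamma * sq_dists)


digits = load_digits(n_class=2)
# random_state=0 is an assumption added for reproducibility
X, y = shuffle(digits.data, digits.target, random_state=0)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

m1 = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
m2 = SVC(kernel=my_kernel).fit(X_train, y_train)

pred1 = m1.predict(X_test)
pred2 = m2.predict(X_test)
print(pred2.shape)  # (260,)
print("agreement with built-in rbf:", (pred1 == pred2).mean())

# A single sample must be passed as a 2-D array with one row
print(m2.predict(X_test[:1]).shape)  # (1,)
```

Since scikit-learn already ships rbf_kernel, passing kernel=lambda X, Y: rbf_kernel(X, Y, gamma=gamma) would work as well; the explicit version above only mirrors the structure of the question's kernel.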
