k-NN from scikit-learn with a custom distance function

I want to use the k-NN algorithm from scikit-learn. For the distance function I want to use my own, which should compute the Tanimoto coefficient.

I wrote the tanimoto function and pass it to the metric parameter in scikit-learn.

My data consists only of 1s and 0s (so every feature is either 1 or 0).

In tanimoto I count the number of 1s in x and y and return a scalar, the coefficient. The KNN classifier is called like this: KNeighborsClassifier(metric=tanimoto).fit(X_train, y_train)

import numpy as np

def tanimoto(x, y):
    print(x)
    print(y)
    a = x.tolist()
    b = y.tolist()
    # c must count the positions where BOTH vectors have a 1 set;
    # np.count_nonzero(x == y) would also count shared zeros.
    c = np.count_nonzero((x == 1) & (y == 1))
    a1 = a.count(1.0)  # number of 1s in x
    b1 = b.count(1.0)  # number of 1s in y
    return float(c) / (a1 + b1 - c)
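For illustration, a tiny worked example of what the function computes on genuinely binary input (made-up 4-bit vectors, reusing the tanimoto function defined above):

import numpy as np

x = np.array([1., 0., 1., 1.])
y = np.array([1., 1., 0., 1.])
# both vectors have a 1 at positions 0 and 3, so c = 2;
# x has a1 = 3 ones and y has b1 = 3 ones, so
# Tanimoto = c / (a1 + b1 - c) = 2 / (3 + 3 - 2) = 0.5
print(tanimoto(x, y))  # prints x and y, then returns 0.5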

If I print x and y inside the tanimoto function, they should really contain only 1s and 0s, right?

The output of printing x and y inside the tanimoto function is:

X: [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
     0.52445514  0.63379164  0.71873681  0.55008567]
Y: [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
     0.52445514  0.63379164  0.71873681  0.55008567]
X: [ 0.          0.          0.          0.02358491  0.00471698  0.
     0.          0.          0.          0.00471698  0.00471698  0.00471698
     0.02830189  0.00943396  0.          ...         0.52358491  0.53773585
     0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
     0.63679245  0.8490566   0.71698113  0.02358491]
Y: [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
     0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
     0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
     1.  0.  0.  0.  0.  0.  1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
     0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.
     0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
     1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  1.
     0.  1.  1.  1.  1.  0.]

# and so on... X is always a vector of fractional scalars, while y is the way it should be (only 1s and 0s).

My X_train array:

[[ 0.  0.  0. ...,  1.  1.  0.]
 [ 0.  0.  0. ...,  1.  1.  0.]
 [ 0.  0.  0. ...,  1.  1.  0.]
 ...,
 [ 0.  0.  0. ...,  1.  1.  0.]
 [ 0.  0.  0. ...,  0.  1.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

Here is a code sample:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def tanimoto(x, b):
    print("X  OUTPUT\n  ", x, "B OUTPUT\n", b)
    # c must count the positions where BOTH fingerprints have a 1 set;
    # np.sum(x == b) would also count shared zeros.
    c = np.count_nonzero((x == 1) & (b == 1))
    a1 = np.sum(x)
    b1 = np.sum(b)
    if (a1 + b1 - c) == 0:
        return 0
    else:
        return float(c) / (a1 + b1 - c)

tests = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
          1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
          1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0,
          0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

classifiers = NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric=tanimoto).fit(tests)  # example

If I print the whole output of x and b inside the tanimoto function:

------------ new Fingerprint ------------
fingerprint:  macc
-----------------------------------------
X  OUTPUT
 [ 0.86899132  0.85534082  0.21453329  0.24435568  0.32321695  0.6926369
   0.5124301   0.98725159  0.01685611  0.58985301]
B OUTPUT
 [ 0.86899132  0.85534082  0.21453329  0.24435568  0.32321695  0.6926369
   0.5124301   0.98725159  0.01685611  0.58985301]
X  OUTPUT
 [ 0.          0.          0.          0.09090909  0.          0.
   0.          0.          0.          0.09090909  0.          0.
   0.09090909  0.          0.          0.          0.09090909  0.
   ...         0.45454545  0.72727273  0.63636364  0.54545455  0.54545455
   0.63636364  0.90909091  0.63636364  0.18181818]
B OUTPUT
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  ...  1.  1.  1.  1.  1.  0.]

and so on... the same 166-element X vector, every entry a multiple of 1/11, is printed again for every further call, each time paired with a different purely binary B row.

In this data I used only 11 samples.

I simply copy-pasted the test vectors above from my sample and feature extraction, and I run into the same problem with that data too. The problem is that X and B are not 1s and 0s, even though X should also contain only 1s and 0s (a quick sanity check on the data itself is sketched below).
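A minimal sanity-check sketch (reusing the tests list from the snippet above) confirming that the training data really is binary, so the fractional vectors must be produced inside the neighbor search rather than coming from the data:

import numpy as np

X_train = np.array(tests)  # 'tests' as defined in the snippet above
# every entry should be exactly 0.0 or 1.0
print(np.isin(X_train, [0.0, 1.0]).all())  # expected: True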


Answer:

You are using a ball tree. As described in the documentation:

The ball tree recursively divides the data into nodes defined by a centroid C and radius r [...] With this setup, a single distance calculation between a test point and the centroid is sufficient to determine a lower and upper bound on the distance to all points within the node.

In other words, the ball tree does not only compute distances between your points: it frequently computes the distance between a point and the centroid of some set of points. Even though all of your points have coordinates that are 0 or 1, the centroid of such a set usually does not. (Indeed, the fractional values in your output are all multiples of 1/11, exactly what you would expect from the coordinate-wise mean of your 11 binary samples.)
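If the goal is simply to keep the metric from ever seeing anything other than your raw 0/1 rows, one option is to force brute-force neighbor search instead of the tree. Below is a minimal sketch under two assumptions that go beyond the question: the data is made up for illustration, and the callable is written as a distance (scikit-learn treats a custom metric as a distance, smaller meaning closer, so the sketch returns 1 minus the Tanimoto coefficient rather than the coefficient itself):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def tanimoto_distance(x, y):
    # number of positions where both fingerprints have a 1 set
    c = np.count_nonzero((x == 1) & (y == 1))
    a1 = np.count_nonzero(x == 1)  # ones in x
    b1 = np.count_nonzero(y == 1)  # ones in y
    denom = a1 + b1 - c
    if denom == 0:                 # both vectors all-zero: treat as identical
        return 0.0
    return 1.0 - float(c) / denom  # distance = 1 - Tanimoto similarity

# Made-up binary fingerprints, standing in for X_train / y_train.
X = np.array([[1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
y = np.array([0, 1, 0, 1])

# algorithm='brute' computes all pairwise distances directly, so
# tanimoto_distance is only ever called on actual rows of X, never
# on ball-tree centroids.
knn = KNeighborsClassifier(n_neighbors=3, algorithm='brute', metric=tanimoto_distance)
knn.fit(X, y)
print(knn.predict(np.array([[1., 0., 1., 0.]])))

The trade-off is cost: brute force evaluates the metric for every pair of points, which is exactly the quadratic work the ball tree's centroid trick is designed to avoid.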
