I want to use the k-NN algorithm from scikit-learn with my own distance function, which should compute the Tanimoto coefficient.
I wrote the tanimoto function and passed it to scikit-learn's metric parameter.
My data contains only 1s and 0s (so every feature is either 1 or 0).
For tanimoto, I count the number of 1s in x and in y and return a scalar, the coefficient. The KNN classifier is called like this:

    KNeighborsClassifier(metric=tanimoto).fit(X_train, y_train)
    import numpy as np

    def tanimoto(x, y):
        print x
        print y
        a = x.tolist()
        b = y.tolist()
        # number of positions where x and y agree (including shared zeros)
        c = np.count_nonzero(x == y)
        a1 = a.count(1.0)   # ones in x
        b1 = b.count(1.0)   # ones in y
        return float(c) / (a1 + b1 - c)
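For reference, here is a minimal sketch of that formula on two toy binary vectors (the vectors are my own, not from the question): the Tanimoto coefficient is c / (a1 + b1 - c), where a1 and b1 count the ones in each vector and c counts the positions at which both are 1.

    import numpy as np

    # Toy binary vectors, for illustration only.
    x = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
    y = np.array([1.0, 0.0, 0.0, 1.0, 1.0])

    a1 = np.sum(x)                     # ones in x -> 3
    b1 = np.sum(y)                     # ones in y -> 3
    c = np.sum((x == 1) & (y == 1))    # positions where both are 1 -> 2
    print(float(c) / (a1 + b1 - c))    # 2 / (3 + 3 - 2) = 0.5

Note that np.count_nonzero(x == y) in the function above counts every position where the vectors agree, including shared zeros, which is a different quantity from c as defined here.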
If I print x and y inside the tanimoto function, they should really contain only 1s and 0s, right?
The output of printing x and y in the tanimoto function is:
    X: [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
         0.52445514  0.63379164  0.71873681  0.55008567]
    Y: [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
         0.52445514  0.63379164  0.71873681  0.55008567]
    X: [ 0.          0.          0.          0.02358491  0.00471698  0.
         0.          0.          0.          0.00471698  0.00471698  0.00471698
         0.02830189  0.00943396  0.         ...         0.52358491  0.53773585
         0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
         0.63679245  0.8490566   0.71698113  0.02358491]
    Y: [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  ...  1.  0.  1.  0.  0.  1.
         0.  1.  1.  0.  1.  1.  1.  1.  0.]

And so on... X is always a vector of fractional scalars, while y is what it should be (only 1s and 0s).
My X_train vectors:
    [[ 0.  0.  0. ...,  1.  1.  0.]
     [ 0.  0.  0. ...,  1.  1.  0.]
     [ 0.  0.  0. ...,  1.  1.  0.]
     ...,
     [ 0.  0.  0. ...,  1.  1.  0.]
     [ 0.  0.  0. ...,  0.  1.  0.]
     [ 0.  0.  0. ...,  0.  0.  0.]]
Here is a code example:
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def tanimoto(x, b):
        print "X OUTPUT\n ", x, "B OUTPUT\n", b
        c = np.sum(x == b)   # number of positions where x and b agree
        a1 = np.sum(x)       # ones in x
        b1 = np.sum(b)       # ones in b
        if (a1 + b1 - c) == 0:
            return 0
        else:
            return float(c) / (a1 + b1 - c)

    tests = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

    # example
    classifiers = NearestNeighbors(
        n_neighbors=4, algorithm='ball_tree', metric=tanimoto).fit(tests)
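A hedged side note: with algorithm='ball_tree', calling .fit() alone already builds the tree and evaluates the metric while doing so, so the prints shown next appear without any query being issued. An explicit query would look like this (my own example call, not from the original post):

    # n_neighbors must not exceed the number of fitted samples (two here).
    distances, indices = classifiers.kneighbors(tests, n_neighbors=2)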
If I print the whole output of x and b inside the tanimoto function:
    ------------ new Fingerprint ------------
    fingerprint: macc
    -----------------------------------------
    X OUTPUT
     [ 0.86899132  0.85534082  0.21453329  0.24435568  0.32321695  0.6926369
       0.5124301   0.98725159  0.01685611  0.58985301]
    B OUTPUT
     [ 0.86899132  0.85534082  0.21453329  0.24435568  0.32321695  0.6926369
       0.5124301   0.98725159  0.01685611  0.58985301]
    X OUTPUT
     [ 0.          0.          0.          0.09090909  0.          0.
       0.          0.          0.          0.09090909  0.          0.
       0.09090909  0.          0.          0.          0.09090909  0.
       ...
       0.63636364  0.45454545  0.72727273  0.63636364  0.54545455  0.54545455
       0.63636364  0.90909091  0.63636364  0.18181818]
    B OUTPUT
     [ 0.  0.  0.  0.  0.  0.  ...  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.]

After the first pair (which is identical on both sides), the exact same X vector, full of fractional values such as 0.09090909, 0.18181818, 0.36363636, and so on, is printed again and again, each time paired with a different B vector that contains only 1s and 0s.
In this data set I used only 11 samples.
I simply copy-pasted the test vectors from my sample and attribute extraction, but with this data I run into the same problem: X and B are not all 1s and 0s, even though X should also contain only 1s and 0s.
Answer:
You are using a ball tree. As described in the documentation:

A ball tree recursively divides the data into nodes defined by a centroid C and radius r [...] With this setup, a single distance calculation between a test point and the centroid is sufficient to determine a lower and upper bound on the distance to all points within the node.

In other words, the ball tree does not just compute distances between your points; it frequently computes the distance between a point and the centroid of some set of points. Even though all of your points have coordinates that are 0 or 1, the centroids of such point sets generally do not.
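That also explains the numbers above: a node's centroid is the mean of its points, so with 11 binary samples every centroid coordinate is a multiple of 1/11, i.e. 0.09090909, 0.18181818, and so on. A minimal sketch (variable and function names are mine, not from the post), including the common workaround of forcing brute-force search so the metric only ever receives actual data rows:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def tanimoto(x, y):
        # Same idea as the question's metric, without the prints.
        c = np.count_nonzero(x == y)
        a1 = np.sum(x)
        b1 = np.sum(y)
        return float(c) / (a1 + b1 - c) if (a1 + b1 - c) != 0 else 0.0

    # 11 random binary samples, mirroring the question's setup.
    X = np.random.randint(0, 2, size=(11, 166)).astype(float)

    # A ball-tree node centroid is just the mean of its points: every
    # coordinate is a multiple of 1/11 (0.0909..., 0.1818..., ...),
    # not strictly 0 or 1.
    print(X.mean(axis=0)[:10])

    # Workaround: brute-force search computes plain pairwise distances
    # between data points, so the metric only ever sees 0/1 vectors.
    nn = NearestNeighbors(n_neighbors=4, algorithm='brute',
                          metric=tanimoto).fit(X)

Incidentally, the very first pair printed above (two identical length-10 vectors of random floats) is consistent with scikit-learn sanity-checking a user-supplied metric on random data before using it.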