I have two variables, X and Y.
Structure of X (an np.array):
    [[26777 24918 26821 ... -1 -1 -1]
     [26777 26831 26832 ... -1 -1 -1]
     [26777 24918 26821 ... -1 -1 -1]
     ...
     [26811 26832 26813 ... -1 -1 -1]
     [26830 26831 26832 ... -1 -1 -1]
     [26830 26831 26832 ... -1 -1 -1]]
Structure of Y:
[[1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [25197, 26777, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [26764, 25803, 26781], [26764, 25803, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [1252, 26777, 16172], [1252, 26777, 16172]]
Each inner list in Y, for example [1252, 26777, 26831], holds three separate features.
I am using the KNN classifier from the scikit-learn module:
    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier

    classifier = KNeighborsClassifier(n_neighbors=3)
    classifier.fit(X, Y)
    predictions = classifier.predict(X)
    print(accuracy_score(Y, predictions))
But I get this error:
ValueError: multiclass-multioutput is not supported
I suspect the structure of Y is not supported. What should I change so that the program runs?
Input:
Deluxe Single room with sea view
Expected output:
    c_class = Deluxe
    c_occ = single
    c_view = sea
Answer:
As the error message says, multiclass-multioutput targets are not supported in this setup.
For your problem you need MultiOutputClassifier().
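    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.neighbors import KNeighborsClassifier

    # Wrap the KNN estimator so that one classifier is fitted per column of Y.
    knn = KNeighborsClassifier(n_neighbors=3)
    classifier = MultiOutputClassifier(knn, n_jobs=-1)
    classifier.fit(X, Y)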
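Once the wrapper is fitted, the predictions still need a metric that accepts a 2-D label array. The sketch below shows one possible way to score them on made-up toy data (the six samples and their feature values are assumptions, not the asker's X). It computes exact-match accuracy, where a row counts only if all three labels are right, and per-column accuracy by calling accuracy_score on one column at a time. As far as I know, passing the full 2-D Y to accuracy_score is what triggers the multiclass-multioutput error, so that call is avoided here.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy data, invented for illustration: two well-separated clusters of 3 points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
# Three label columns per sample, reusing label values from the question.
Y = np.array([[1252, 26777, 26831]] * 3 + [[25197, 26777, 26781]] * 3)

classifier = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=3))
classifier.fit(X, Y)
predictions = classifier.predict(X)  # shape (n_samples, 3)

# Exact-match accuracy: a row is correct only if every column matches.
exact_match = float(np.mean(np.all(predictions == Y, axis=1)))

# Per-column accuracy: accuracy_score receives 1-D arrays, one output at a time.
per_output = [float(accuracy_score(Y[:, i], predictions[:, i]))
              for i in range(Y.shape[1])]

print("exact match:", exact_match)
print("per output:", per_output)
```

Exact-match accuracy is the stricter of the two numbers; the per-column values help show which of the three outputs is dragging it down.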
Working example:
    >>> import numpy as np
    >>> from sklearn.feature_extraction.text import TfidfVectorizer
    >>> corpus = [
    ...     'This is the first document.',
    ...     'This document is the second document.',
    ...     'And this is the third one.',
    ...     'Is this the first document?',
    ... ]
    >>> vectorizer = TfidfVectorizer()
    >>> X = vectorizer.fit_transform(corpus)
    >>> Y = [[124323,1234132,1234],[124323,4132,14],[1,4132,1234],[1,4132,14]]
    >>> from sklearn.multioutput import MultiOutputClassifier
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> knn = KNeighborsClassifier(n_neighbors=3)
    >>> classifier = MultiOutputClassifier(knn, n_jobs=-1)
    >>> classifier.fit(X,Y)
    >>> predictions = classifier.predict(X)
    >>> predictions
    array([[124323, 4132, 14], [124323, 4132, 14], [ 1, 4132, 1234], [124323, 4132, 14]])
    >>> classifier.score(X,np.array(Y))
    0.5
    >>> test_data = ['I want to test this']
    >>> classifier.predict(vectorizer.transform(test_data))
    array([[124323, 4132, 14]])
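The same pattern can be applied to the room description from the question. The sketch below is only an illustration under stated assumptions: the four training descriptions and their labels are hypothetical (only the first one comes from the question), string labels are used instead of integer ids, and n_neighbors=1 is chosen because the toy set is so small.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set; only the first description appears in the question.
rooms = [
    "Deluxe Single room with sea view",
    "Deluxe Double room with garden view",
    "Standard Single room with garden view",
    "Standard Double room with sea view",
]
# One row per description; columns are c_class, c_occ, c_view.
labels = np.array([
    ["Deluxe", "single", "sea"],
    ["Deluxe", "double", "garden"],
    ["Standard", "single", "garden"],
    ["Standard", "double", "sea"],
])

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(rooms)

# One KNN classifier is fitted per label column.
room_classifier = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=1))
room_classifier.fit(X_text, labels)

query = vectorizer.transform(["Deluxe Single room with sea view"])
c_class, c_occ, c_view = room_classifier.predict(query)[0]
print("c_class =", c_class)
print("c_occ =", c_occ)
print("c_view =", c_view)
```

With real data the label columns could just as well be the integer ids shown in Y; the wrapper treats each column as an independent classification target either way.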