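I am trying to do feature selection for multi-label classification. I have extracted the features the model will be trained on into X, and the model is tested on that same X. I am using a Pipeline and selecting the best 100 features: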
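    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # arrFinal holds all features and labels. The last 16 columns are the labels,
    # and the features run from column 1 to column 521.
    # The 17th column from the end is not used.
    X = np.array(arrFinal[:, 1:-17])
    X_test = np.array(X)
    Y = np.array(arrFinal[:, 522:]).astype(int)
    clf = Pipeline([('chi2', SelectKBest(chi2, k=100)), ('rbf', SVC())])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(X_test)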
But I get the following error:
    Traceback (most recent call last):
      File "C:\Users\50004182\Documents\\callee.py", line 10, in <module>
        combine.combine_main(dict_ids,inv_dict_ids,noOfIDs)
      File "C:\Users\50004182\Documents\combine.py", line 201, in combine_main
        clf.fit(X, Y)
      File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 287, in fit
        for i, column in enumerate(columns))
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 804, in __call__
        while self.dispatch_one_batch(iterator):
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 662, in dispatch_one_batch
        self._dispatch(tasks)
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 570, in _dispatch
        job = ImmediateComputeBatch(batch)
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 183, in __init__
        self.results = batch()
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in <listcomp>
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 74, in _fit_binary
        estimator.fit(X, y)
      File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 164, in fit
        Xt, fit_params = self._pre_transform(X, y, **fit_params)
      File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 145, in _pre_transform
        Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
      File "C:\Python34\lib\site-packages\sklearn\base.py", line 458, in fit_transform
        return self.fit(X, y, **fit_params).transform(X)
      File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 331, in fit
        self.scores_, self.pvalues_ = self.score_func(X, y)
      File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 213, in chi2
        if np.any((X.data if issparse(X) else X) < 0):
    TypeError: unorderable types: numpy.ndarray() < int()
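Answer:
So, after debugging with @JamieBull and @Joker in the comments above, the solution we arrived at is:
Make sure the types are correct (the values were originally strings):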
    # Cast the features to float; they were originally loaded as strings.
    X = np.array(arrFinal[:, 1:-17]).astype(np.float64)
    X_test = np.array(X)
    Y = np.array(arrFinal[:, 522:]).astype(int)
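To illustrate why the cast matters, here is a minimal sketch. It assumes the array was loaded as strings; the raw array below is made-up data, not the asker's arrFinal. The last frame of the traceback shows chi2 checking for negative values with X < 0, and that comparison only makes sense once the elements are numeric.

```python
import numpy as np

# Hypothetical stand-in for arrFinal: an ID column plus values stored as strings.
raw = np.array([["id1", "0", "3", "1"],
                ["id2", "2", "0", "0"]])

# Slicing keeps the string dtype, so the features are still text at this point.
X_str = raw[:, 1:]
print(X_str.dtype.kind)   # 'U' means unicode strings, not numbers

# Casting gives a numeric array that supports the comparison chi2 performs.
X_num = X_str.astype(np.float64)
print(X_num.dtype)        # float64
print(np.any(X_num < 0))  # False: no negative values, so chi2 can proceed
```

Checking X.dtype right after loading the data is a cheap way to catch this before the array reaches the pipeline.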
Before applying chi2, use VarianceThreshold to remove the constant (0) columns:
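    from sklearn.feature_selection import VarianceThreshold

    # Drop constant columns first, then score the remaining features with chi2.
    clf = Pipeline([('vt', VarianceThreshold()), ('chi2', SelectKBest(chi2, k=100)), ('rbf', SVC())])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(X_test)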
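The same pipeline can be tried end to end on synthetic data. This is only a sketch under stated assumptions: the array sizes, the random data, and k=10 are invented for illustration and are not the asker's values. A column that is zero everywhere gives chi2 nothing to score, which is a plausible reason for dropping such columns first.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Made-up data: 60 samples, 20 non-negative features, 3 binary labels.
X = rng.randint(0, 5, size=(60, 20)).astype(np.float64)
X[:, [2, 7]] = 0.0  # force two constant (all-zero) columns
Y = rng.randint(0, 2, size=(60, 3))

# With its default threshold of 0.0, VarianceThreshold drops constant columns.
vt = VarianceThreshold()
X_reduced = vt.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # (60, 20) -> (60, 18)

# One pipeline is fitted per label by OneVsRestClassifier.
clf = OneVsRestClassifier(Pipeline([
    ('vt', VarianceThreshold()),
    ('chi2', SelectKBest(chi2, k=10)),
    ('rbf', SVC()),
]))
clf.fit(X, Y)
pred = clf.predict(X)
print(pred.shape)  # (60, 3): one column of predictions per label
```

Because the selection steps sit inside the pipeline that OneVsRestClassifier wraps, each label gets its own set of selected features.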