Error during feature selection

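I am trying to do feature selection for multi-label classification. I extracted the features the model will be trained on into X. The model is also tested on the same X. I am using a Pipeline and selecting the best 100 features: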

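    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # arrFinal contains all features and labels. The last 16 columns are the labels;
    # the features run from column 1 to column 521. The 17th column from the end is unused.
    X = np.array(arrFinal[:, 1:-17])
    Xtest = np.array(X)  # testing is done on the same data as training
    Y = np.array(arrFinal[:, 522:]).astype(int)

    clf = Pipeline([('chi2', SelectKBest(chi2, k=100)), ('rbf', SVC())])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(Xtest)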

But I get the following error:

    Traceback (most recent call last):
      File "C:\Users\50004182\Documents\\callee.py", line 10, in <module>
        combine.combine_main(dict_ids,inv_dict_ids,noOfIDs)
      File "C:\Users\50004182\Documents\combine.py", line 201, in combine_main
        clf.fit(X, Y)
      File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 287, in fit
        for i, column in enumerate(columns))
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 804, in __call__
        while self.dispatch_one_batch(iterator):
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 662, in dispatch_one_batch
        self._dispatch(tasks)
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 570, in _dispatch
        job = ImmediateComputeBatch(batch)
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 183, in __init__
        self.results = batch()
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in <listcomp>
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 74, in _fit_binary
        estimator.fit(X, y)
      File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 164, in fit
        Xt, fit_params = self._pre_transform(X, y, **fit_params)
      File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 145, in _pre_transform
        Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
      File "C:\Python34\lib\site-packages\sklearn\base.py", line 458, in fit_transform
        return self.fit(X, y, **fit_params).transform(X)
      File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 331, in fit
        self.scores_, self.pvalues_ = self.score_func(X, y)
      File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 213, in chi2
        if np.any((X.data if issparse(X) else X) < 0):
    TypeError: unorderable types: numpy.ndarray() < int()

Answer:

So, after a debugging session with @JamieBull and @Joker in the comments above, the solution we arrived at was:

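Make sure the types are correct (the values were originally strings). The traceback points at the `X < 0` check inside chi2, which cannot compare an array of strings with an integer, so the features need to be converted to a numeric dtype first.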

    X = np.array(arrFinal[:, 1:-17]).astype(np.float64)
    Xtest = np.array(X)
    Y = np.array(arrFinal[:, 522:]).astype(int)
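To illustrate the type fix, here is a minimal sketch using a tiny made-up array in place of arrFinal. The names `raw`, `features` and `labels` and the column layout are assumptions for the example only, not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for arrFinal: everything was read in as text,
# so NumPy stores it with a string dtype (kind 'U').
raw = np.array([
    ["id1", "0.5", "2", "1"],
    ["id2", "1.5", "0", "0"],
])
print(raw.dtype.kind)

# Convert the feature columns to float and the label column to int
# before handing them to scikit-learn.
features = raw[:, 1:3].astype(np.float64)
labels = raw[:, 3:].astype(int)

# With a numeric dtype, the non-negativity check that chi2 performs is well defined.
print((features < 0).any())
```

Checking `X.dtype` right after building the array is a cheap way to catch this kind of problem before it surfaces deep inside a pipeline.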

Before applying chi2, use VarianceThreshold to remove the constant (all-zero) columns.

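    from sklearn.feature_selection import VarianceThreshold

    clf = Pipeline([
        ('vt', VarianceThreshold()),          # drop zero-variance (constant) columns
        ('chi2', SelectKBest(chi2, k=100)),   # keep the 100 best-scoring features
        ('rbf', SVC())
    ])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(Xtest)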
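For anyone who wants to try the whole thing end to end, the following is a rough, self-contained sketch on synthetic data. The data shape, the label rules, the random seed and `k=10` are all assumptions chosen so that the example runs quickly; they are not taken from the original question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Synthetic, non-negative features (chi2 requires X >= 0): 60 samples, 30 columns.
X = rng.randint(0, 5, size=(60, 30)).astype(np.float64)
X[:, :4] = 0.0  # four constant columns, similar to the problem described above

# Three made-up binary labels derived from a few of the features.
Y = np.column_stack([
    X[:, 4] > 2,
    X[:, 5] > 2,
    X[:, 6] + X[:, 7] > 4,
]).astype(int)

clf = OneVsRestClassifier(Pipeline([
    ("vt", VarianceThreshold()),        # removes the zero-variance columns
    ("chi2", SelectKBest(chi2, k=10)),  # k is smaller here because X is small
    ("rbf", SVC()),
]))
clf.fit(X, Y)
pred = clf.predict(X)  # predicting on the training data, as in the question
print(pred.shape)
```

Because OneVsRestClassifier fits one copy of the pipeline per label, each label gets its own feature selection; the fitted copies are available in `clf.estimators_`.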
