Error during feature selection

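I am trying to do feature selection for multi-label classification. I have extracted the features the model will be trained on into X. The model is also tested on the same X. I am using a Pipeline and selecting the best 100 features: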

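    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # arrFinal contains all features and labels. The last 16 columns are the labels;
    # the features run from column 1 to column 521. The 17th column from the end is not used.
    X = np.array(arrFinal[:, 1:-17])
    X_test = np.array(X)
    Y = np.array(arrFinal[:, 522:]).astype(int)
    clf = Pipeline([('chi2', SelectKBest(chi2, k=100)), ('rbf', SVC())])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(X_test)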

But I get the following error:

Traceback (most recent call last):
  File "C:\Users\50004182\Documents\\callee.py", line 10, in <module>
    combine.combine_main(dict_ids,inv_dict_ids,noOfIDs)
  File "C:\Users\50004182\Documents\combine.py", line 201, in combine_main
    clf.fit(X, Y)
  File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 287, in fit
    for i, column in enumerate(columns))
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 183, in __init__
    self.results = batch()
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 74, in _fit_binary
    estimator.fit(X, y)
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 164, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 145, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "C:\Python34\lib\site-packages\sklearn\base.py", line 458, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 331, in fit
    self.scores_, self.pvalues_ = self.score_func(X, y)
  File "C:\Python34\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 213, in chi2
    if np.any((X.data if issparse(X) else X) < 0):
TypeError: unorderable types: numpy.ndarray() < int()

Answer:

After debugging with @JamieBull and @Joker in the comments above, this is the solution we arrived at:

Make sure the data has the correct type (the values were originally strings):

    # Convert the string-valued feature columns to floats so chi2 can compare them with 0
    X = np.array(arrFinal[:, 1:-17]).astype(np.float64)
    X_test = np.array(X)
    Y = np.array(arrFinal[:, 522:]).astype(int)
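To see why the cast matters, here is a minimal sketch of the dtype check. The small `raw` array is a hypothetical stand-in for `arrFinal` (the real data is not shown in the question); it only assumes that the numeric values were stored as strings, as described above.

```python
import numpy as np

# Hypothetical stand-in for arrFinal: an ID column followed by
# numeric values that were read in as strings.
raw = np.array([["id0", "1.5", "0", "2"],
                ["id1", "0.0", "3", "1"]])

features = raw[:, 1:]
# dtype.kind is 'U' for unicode strings, so "< 0" is not a numeric comparison yet
print(features.dtype.kind)

# Parse the strings into floats; after this the non-negativity
# check that chi2 performs is an ordinary numeric comparison.
X = features.astype(np.float64)
print(X.dtype, bool((X < 0).any()))
```

Checking `dtype.kind` before fitting is a cheap way to catch this kind of problem early, since the traceback from inside the pipeline does not mention strings at all.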

Before applying chi2, use VarianceThreshold to remove the constant (0) columns:

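    from sklearn.feature_selection import VarianceThreshold

    clf = Pipeline([
        ('vt', VarianceThreshold()),         # drop constant columns first
        ('chi2', SelectKBest(chi2, k=100)),  # then keep the 100 best features
        ('rbf', SVC())
    ])
    clf = OneVsRestClassifier(clf)
    clf.fit(X, Y)
    ans = clf.predict(X_test)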
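For anyone who wants to try the two steps from this answer without the original data, the following is a self-contained sketch on synthetic data. The array shapes, the random values, and `k=10` are assumptions chosen only to keep the example small; they are not taken from the question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Synthetic, non-negative float features (chi2 rejects negative values).
X = rng.randint(0, 5, size=(60, 30)).astype(np.float64)
X[:, [3, 7]] = 0.0  # two constant columns for VarianceThreshold to drop

# Synthetic multi-label targets: 4 binary label columns.
Y = rng.randint(0, 2, size=(60, 4))

pipe = Pipeline([
    ("vt", VarianceThreshold()),        # default threshold removes zero-variance columns
    ("chi2", SelectKBest(chi2, k=10)),  # k=10 only because the toy data is small
    ("rbf", SVC()),
])
clf = OneVsRestClassifier(pipe)  # one pipeline is fitted per label column
clf.fit(X, Y)

pred = clf.predict(X)
# Number of columns that survived VarianceThreshold in the first label's pipeline
kept = clf.estimators_[0].named_steps["vt"].get_support().sum()
print(pred.shape, kept)
```

Because the pipeline sits inside OneVsRestClassifier, feature selection runs separately for each label, so different labels may end up with different selected features.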
