Error when feeding OneHotEncoded features into a classifier

I'm trying to prepare data for a Decision Tree classifier and a Multinomial Naive Bayes classifier.

This is what my data looks like (a pandas DataFrame):

```
   Label  Feat1  Feat2  Feat3  Feat4
0      1      3      2      1
1      0      1      1      2
2      2      2      1      1
3      3      3      2      3
```

I have split the data into dataLabel and dataFeatures, and prepared dataLabel with dataLabel.ravel().
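A minimal sketch of that split, assuming the label column is named `Label` and every other column is a feature. The DataFrame below is made-up toy data, not the asker's. The point is the shapes scikit-learn expects: features as a 2-D array `X` of shape (n_samples, n_features), labels as a 1-D array `y` of shape (n_samples,).

```python
import pandas as pd

# Hypothetical toy data using the column names from the question
df = pd.DataFrame({
    "Label": [1, 0, 2, 3],
    "Feat1": [3, 1, 2, 3],
    "Feat2": [2, 1, 1, 2],
    "Feat3": [1, 2, 1, 3],
    "Feat4": [1, 1, 2, 2],
})

# A single column name gives a Series, which converts to a 1-D array
y = df["Label"].to_numpy()

# A list of names gives a one-column DataFrame (2-D); ravel() flattens it to 1-D
y_from_frame = df[["Label"]].to_numpy().ravel()

# Features: every column except the label, as a 2-D array
X = df.drop(columns="Label").to_numpy()

print(X.shape, y.shape)  # (4, 4) (4,)
```

Either form of `y` is 1-D; a 2-D label array is exactly what `check_X_y` rejects with "bad input shape".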

I need to discretize the features so that the classifiers treat them as categorical rather than numerical data.

I tried to do this with OneHotEncoder:

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit(dataFeatures)
chk = enc.transform(dataFeatures)

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()

from sklearn import metrics
from sklearn.cross_validation import cross_val_score
scores = cross_val_score(mnb, Y, chk, cv=10, scoring='accuracy')
```

I get this error: bad input shape (64, 16)

These are the shapes of the labels and the input:

dataLabel.shape = 72
chk.shape = 72,16
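The 16 columns of chk follow from how OneHotEncoder works: it creates one output column per distinct value of each input feature, so the output width is the sum of the per-feature category counts. With four features, 16 columns would be consistent with, for example, four distinct values per feature, though the full data isn't shown. A small sketch on made-up data, written against the current scikit-learn API (`categories_` is the fitted attribute listing each feature's categories):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 4 samples, 3 categorical features
X = np.array([
    [3, 2, 1],
    [1, 1, 2],
    [2, 1, 1],
    [3, 2, 3],
])

enc = OneHotEncoder()
chk = enc.fit_transform(X)  # a SciPy sparse matrix by default

# One column per distinct value of each feature: 3 + 2 + 3 = 8
n_categories = [len(c) for c in enc.categories_]
print(n_categories, chk.shape)  # [3, 2, 3] (4, 8)
```

Each encoded row contains exactly one 1 per original feature, so every row of `chk` here sums to 3.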

Why won't the classifier accept the one-hot encoded features?

Edit: full stack trace

```
/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1433, in cross_val_score
    for train, test in cv)
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
    self.results = batch()
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 527, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 515, in check_X_y
    y = column_or_1d(y, warn=True)
  File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 551, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (64, 16)
```


Answer:

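First, you need to swap chk and Y. As the cross_val_score documentation shows, the signature is cross_val_score(estimator, X, y, ...), so the features come before the labels. Your call passes chk as the labels, and with 72 rows and cv=10 each training split holds 64 or 65 rows, so the classifier receives a (64, 16) label array, hence the error. Second, you haven't said what Y is, so I'm assuming it is a one-dimensional array. Finally, it is better to chain the transformers and the classifier in a Pipeline rather than applying them separately. Like this: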
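The 64 in the error message can be reproduced with plain arithmetic. The sketch below assumes plain (non-shuffled) K-fold splitting and follows the rule documented for scikit-learn's KFold: the first n_samples % n_splits folds get one extra sample. It is plain Python, not a call into scikit-learn, and `kfold_train_sizes` is a hypothetical helper name.

```python
def kfold_train_sizes(n_samples, n_splits):
    """Training-set size of each fold, assuming the first
    n_samples % n_splits test folds are one sample larger."""
    base, extra = divmod(n_samples, n_splits)
    test_sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    return [n_samples - t for t in test_sizes]

sizes = kfold_train_sizes(72, 10)
print(sizes)  # [64, 64, 65, 65, 65, 65, 65, 65, 65, 65]
```

The first fold therefore trains on 64 rows. With the arguments swapped, those 64 rows of the 16-column chk arrive at MultinomialNB.fit as y, which must be 1-D, and `column_or_1d` raises "bad input shape (64, 16)".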

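```python
from sklearn import metrics
# sklearn.cross_validation was removed in scikit-learn 0.20;
# newer versions provide cross_val_score in sklearn.model_selection
from sklearn.cross_validation import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Encoding and classification run together, so each CV fold is encoded inside the loop
clf = Pipeline([
    ('transformer', OneHotEncoder()),
    ('estimator', MultinomialNB()),
])
# Features (2-D) first, 1-D labels second
scores = cross_val_score(clf, dataFeatures.values, Y, cv=10, scoring='accuracy')
```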
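For newer scikit-learn releases, here is a sketch of the same pipeline using `sklearn.model_selection`, run on randomly generated stand-in data because the asker's data isn't available. `handle_unknown="ignore"` is an addition not present in the answer above: inside cross-validation a test fold can contain a category value that never appeared in its training fold, and the encoder's default setting raises an error in that case.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: 72 samples, 4 categorical features with values 1..4,
# and class labels 0..3, mirroring the shapes reported in the question
rng = np.random.default_rng(0)
X = rng.integers(1, 5, size=(72, 4))
y = rng.integers(0, 4, size=72)

clf = Pipeline([
    # Encode values seen only in a test fold as all zeros instead of raising
    ("transformer", OneHotEncoder(handle_unknown="ignore")),
    ("estimator", MultinomialNB()),
])

# Argument order is (estimator, X, y): 2-D features first, 1-D labels second
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(scores.shape)  # (10,)
```

Because the encoder sits inside the Pipeline, it is refit on each training fold only, so no category information leaks from the test fold.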
