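I have a pipeline for doing machine learning on a text corpus. It first extracts the text, then uses TfidfVectorizer to extract n-grams, and finally selects the best features. Without the feature selection step the pipeline runs fine. However, once I add feature selection, I get the following error: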
```
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.py", line 90, in __init__
    names, estimators = zip(*steps)
TypeError: zip argument #1 must support iteration
```
The error is raised at `SGDClassifier()`. Here is the pipeline:
```python
pipeline = Pipeline([
    # Combine features with FeatureUnion
    ('features', FeatureUnion(
        transformer_list=[
            # N-GRAMS
            ('ngrams', Pipeline([
                ('extractor', TextExtractor(normalized=True)),  # returns a list of strings
                ('vectorizer', TfidfVectorizer(analyzer='word', strip_accents='ascii',
                                               use_idf=True, norm="l2", min_df=3, max_df=0.90)),
                ('feature_selection', SelectPercentile(score_func=chi2, percentile=70)),
            ])),
        ],
    )),
    ('clf', Pipeline([
        SGDClassifier(n_jobs=-1, verbose=0)
    ])),
])
```
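Answer:

It looks like you are missing a step name in the inner `Pipeline`. `Pipeline` expects a list of `(name, estimator)` tuples and unpacks them with `zip(*steps)`; a bare `SGDClassifier` is not a tuple and cannot be iterated, which is what produces the `TypeError`. This line: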
```python
('clf', Pipeline([SGDClassifier(n_jobs=-1, verbose=0)])),
```

should be changed to:

```python
('clf', Pipeline([('sgd', SGDClassifier(n_jobs=-1, verbose=0))])),
```
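To see why the name matters, here is a minimal, stdlib-only sketch of the unpacking on line 90 of the traceback (`names, estimators = zip(*steps)`). It mirrors only that one line, not scikit-learn's full validation; `DummyEstimator`, `unpack_steps` and `try_unpack` are hypothetical names used purely for illustration, with `DummyEstimator` standing in for `SGDClassifier`.

```python
class DummyEstimator(object):
    """Hypothetical stand-in for SGDClassifier: any non-iterable object behaves the same."""
    pass


def unpack_steps(steps):
    """Split a list of (name, estimator) pairs the way line 90 of the traceback does."""
    names, estimators = zip(*steps)
    return list(names), list(estimators)


def try_unpack(steps):
    """Return the unpacked steps, or the TypeError text if unpacking fails."""
    try:
        return unpack_steps(steps)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Bare estimator: zip(*[est]) becomes zip(est), and est is not iterable -> TypeError.
bad = try_unpack([DummyEstimator()])

# Named step: zip(*[("sgd", est)]) yields ("sgd",) and (est,), so unpacking succeeds.
good_names, good_estimators = unpack_steps([("sgd", DummyEstimator())])

print(bad)         # exact message wording depends on the Python version
print(good_names)  # the step names recovered from the tuples
```

Since the `clf` step wraps a single estimator, the inner `Pipeline` is not really needed: `('clf', SGDClassifier(n_jobs=-1, verbose=0))` also works as the final step of the outer pipeline.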