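I am trying to run a model that combines text and numeric features, but I get the error ValueError: Invalid parameter tfidf for estimator. Is the problem in the syntax of parameters? Links that may help: FeatureUnion usage, FeatureUnion documentation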
tknzr = tokenize.word_tokenize
vect = CountVectorizer(tokenizer=tknzr, stop_words={'english'}, max_df=0.9, min_df=2)
scl = StandardScaler(with_mean=False)
tfidf = TfidfTransformer(norm=None)

parameters = {
    'vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'tfidf__use_idf': (True, False),
    'clf__alpha': tuple(10 ** (np.arange(-4, 4, dtype='float'))),
    'clf__loss': ('hinge', 'squared_hinge', 'log', 'modified_huber', 'perceptron'),
    'clf__penalty': ('l1', 'l2'),
    'clf__tol': (1e07, 1e-6, 1e-5, 1e-4, 1e-3)
}

combined_clf = Pipeline([
    ('features', FeatureUnion([
        ('numeric_features', Pipeline([
            ('selector', transfomer_numeric)
        ])),
        ('text_features', Pipeline([
            ('selector', transformer_text),
            ('vect', vect),
            ('tfidf', tfidf),
            ('scaler', scl),
        ]))
    ])),
    ('clf', SGDClassifier(random_state=42,
                          max_iter=int(10 ** 6 / len(X_train)), shuffle=True))
])
Answer:
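Nested parameters must be accessed with the __ (double underscore) syntax. The rule applies recursively: each level of nesting between the outer estimator and the parameter adds one more step name to the key. The parameter use_idf sits at the following path:
features > text_features > tfidf > use_idf
The key tfidf__use_idf fails because the outer Pipeline has no step named tfidf; its only steps are features and clf. So the parameter in your grid should be: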
'features__text_features__tfidf__use_idf': [True, False]
Similarly, the syntax for ngram_range should be:
'features__text_features__vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
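One way to check grid keys before starting a search is to compare them with the names returned by get_params() on the outer pipeline. The following is a minimal sketch, assuming scikit-learn is installed. The FunctionTransformer instances are hypothetical stand-ins for the question's transfomer_numeric and transformer_text, which are not shown, and the vectorizer, classifier settings, and grid values are simplified for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical stand-ins for the selectors in the question (identity transforms).
select_numeric = FunctionTransformer()
select_text = FunctionTransformer()

combined_clf = Pipeline([
    ('features', FeatureUnion([
        ('numeric_features', Pipeline([('selector', select_numeric)])),
        ('text_features', Pipeline([
            ('selector', select_text),
            ('vect', CountVectorizer()),
            ('tfidf', TfidfTransformer(norm=None)),
            ('scaler', StandardScaler(with_mean=False)),
        ])),
    ])),
    ('clf', SGDClassifier(random_state=42)),
])

# get_params() returns every parameter name the pipeline accepts,
# including the fully qualified nested ones.
valid_names = set(combined_clf.get_params().keys())

parameters = {
    'features__text_features__tfidf__use_idf': [True, False],
    'features__text_features__vect__ngram_range': [(1, 1), (1, 2)],
    'clf__alpha': [1e-4, 1e-3],
}

# Any key listed here would be rejected by GridSearchCV at fit time.
unknown = [name for name in parameters if name not in valid_names]
print('unknown keys:', unknown)

# The short key from the question is rejected, because the outer
# pipeline has no step called 'tfidf'.
try:
    combined_clf.set_params(tfidf__use_idf=False)
    short_name_error = None
except ValueError as exc:
    short_name_error = str(exc)
print('short key rejected:', short_name_error is not None)
```

Printing sorted(valid_names) is also a quick way to look up the exact key for any nested parameter instead of assembling the path by hand.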