
How do I get the most informative features from a scikit-learn classifier?

xiaolong · April 7, 2025

Classifiers in machine learning packages such as liblinear and nltk offer a method, show_most_informative_features(), which is very helpful for debugging features:

viagra = None          ok : spam     =      4.5 : 1.0
hello = True           ok : spam     =      4.5 : 1.0
hello = None           spam : ok     =      3.3 : 1.0
viagra = True          spam : ok     =      3.3 : 1.0
casino = True          spam : ok     =      2.0 : 1.0
casino = None          ok : spam     =      1.5 : 1.0

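My question is whether the classifiers in scikit-learn implement something similar. I searched the documentation but could not find anything like it.

If no such feature exists yet, does anyone know a workaround for getting these values?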

Answer:

With the help of larsmans' code, I wrote the following for the binary case:

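def show_most_informative_features(vectorizer, clf, n=20):
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    feature_names = vectorizer.get_feature_names_out()
    # Pair each coefficient with its feature name and sort ascending by coefficient
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    # Left column: the n most negative weights; right column: the n most positive
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))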
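To make the function above easier to try out, here is a minimal sketch that returns the ranked features instead of printing them, plus a small demo. The helper name most_informative_features, the demo() function, and the toy corpus are all made up for illustration and are not part of scikit-learn. The demo assumes a linear model that exposes coef_ (LogisticRegression here) and a binary problem, where positive weights point toward clf.classes_[1].

```python
def most_informative_features(feature_names, coefs, n=20):
    """Return (most_negative, most_positive) lists of (coef, name) pairs.

    Hypothetical helper: it only needs a sequence of names and a matching
    sequence of coefficients, so it does not depend on scikit-learn itself.
    """
    # Sort ascending by coefficient (ties are broken by feature name).
    ranked = sorted(zip(coefs, feature_names))
    most_negative = ranked[:n]            # strongest evidence for the first class
    most_positive = ranked[:-(n + 1):-1]  # strongest evidence for the second class
    return most_negative, most_positive


def demo():
    # Imports are local so the helper above stays usable without scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy corpus, invented purely for illustration.
    docs = [
        "cheap viagra casino",
        "hello friend",
        "casino bonus viagra",
        "hello meeting tomorrow",
    ]
    labels = ["spam", "ok", "spam", "ok"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)
    clf = LogisticRegression().fit(X, labels)

    # In the binary case coef_ has shape (1, n_features).
    negative, positive = most_informative_features(
        vectorizer.get_feature_names_out(), clf.coef_[0], n=3
    )
    print("toward", clf.classes_[0], negative)
    print("toward", clf.classes_[1], positive)
```

Calling demo() prints the three strongest features for each class. Note that these are raw model weights, not the likelihood ratios that nltk reports, so the numbers are not directly comparable to the output shown in the question.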

