
How do I get the ID of a word in a bag-of-words vocabulary, given the word?

IT Technology · xiaolong · April 10, 2025

I have applied a bag-of-words model to a set of messages, as follows:

    bow_transformer = CountVectorizer(analyzer=split_into_lemmas).fit(messages['message'])
    B4 = bow_transformer.transform([msg4])
    print B4
    print bow_transformer.get_feature_names()[6736]
    print bow_transformer.get_feature_names()[8013]

(0, 1158) 1
(0, 1899) 1
(0, 2897) 1
(0, 2927) 1
(0, 4021) 1
(0, 6736) 2
(0, 7111) 1
(0, 7698) 1
(0, 8013) 2

say

u

What I need is the reverse of bow_transformer.get_feature_names()[6736]: given a word such as "say", how do I retrieve its ID, 6736?

Answer:

Use the vocabulary_ attribute, a dict that maps each word to its column index:

>>> bow_transformer.vocabulary_.get('say')
6736
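The lookup can be tried out on a small corpus. The following is a minimal sketch, not the asker's actual setup: the corpus, the default analyzer (instead of split_into_lemmas), and the variable names are assumptions made for illustration, and it is written for Python 3 with a recent scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical; stands in for messages['message'])
corpus = ["you say hello", "they say goodbye"]

# Default analyzer is used here instead of the custom split_into_lemmas
vectorizer = CountVectorizer().fit(corpus)

# vocabulary_ maps each term to its column index in the transformed matrix
say_id = vectorizer.vocabulary_.get("say")

# dict.get returns None for a word that was not seen during fit
missing_id = vectorizer.vocabulary_.get("farewell")

# Inverting the dict gives the index -> word direction
id_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}

# The count for "say" sits in column say_id of the transformed row
counts = vectorizer.transform(["say say"])

print(say_id, missing_id, id_to_word[say_id], counts[0, say_id])
```

Because vocabulary_ is an ordinary dict, .get returns None for an unknown word rather than raising, so it is worth checking the result before using it as an index.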

machine-learning python-2.7 scikit-learn
