Gensim FastText: getting the vocabulary or word index

I'm trying out gensim's FastText by testing the sample code from the gensim docs, with only one small change: the parameter is replaced with corpus_iterable.

https://radimrehurek.com/gensim/models/fasttext.html

gensim_version == 4.0.1

from gensim.models import FastText
from gensim.test.utils import common_texts  # some example sentences

print(common_texts[0])
# ['human', 'interface', 'computer']
print(len(common_texts))
# 9

model = FastText(vector_size=4, window=3, min_count=1)  # instantiate
model.build_vocab(corpus_iterable=common_texts)
model.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)

It works, but is there a way to get the model's vocabulary? For example, the Tensorflow Tokenizer has a word_index that returns all the words. Is there something similar here?

Answer:

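The model stores its word vectors in the .wv object. I don't know which gensim version you are using, but in Gensim 4 you can get the vocabulary through model.wv.key_to_index. It is a dictionary that maps each word to its index.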

from gensim.models import FastText
from gensim.test.utils import common_texts  # some example sentences

print(common_texts[0])
# ['human', 'interface', 'computer']
print(len(common_texts))
# 9

model = FastText(vector_size=4, window=3, min_count=1)  # instantiate
model.build_vocab(corpus_iterable=common_texts)
model.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)

# Get the vocabulary keys and their indices
vocab = model.wv.key_to_index
print(vocab)
# Output
# {'system': 0, 'graph': 1, 'trees': 2, 'user': 3, 'minors': 4, 'eps': 5, 'time': 6,
#  'response': 7, 'survey': 8, 'computer': 9, 'interface': 10, 'human': 11}
