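If I have a set of words and want to find a pattern among them, then look for that pattern in a long text, should I use machine learning, text analytics, or pattern recognition?
Answer:
I would build n-grams from all of the words.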
```python
from collections import Counter

from nltk import ngrams

words = ["aim", "aid", "bail", "bait"]


def build_ngrams(words, from_size, to_size):
    """Collect the character n-grams of every word for each size in the range."""
    word_ngrams = []
    for word in words:
        for ngram_size in range(from_size, to_size + 1):
            # ngrams() on a string yields tuples of characters
            ng = ngrams(word, ngram_size)
            word_ngrams.extend(ng)
    return word_ngrams


# Build all bigrams and trigrams
word_ngrams = build_ngrams(words, 2, 3)

# Find the most common n-grams
counter = Counter(word_ngrams)
print(counter.most_common(3))
```
This gives you the most common character patterns across the words, which you can later search for in the longer text.
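The search step is not spelled out above, so here is one way it might look. It is a minimal sketch using only the standard library, and it assumes that "searching" means finding the whitespace-separated words of the text that contain one of the top n-grams as a substring. The helper names `char_ngrams`, `top_patterns`, and `find_matches`, and the sample text, are made up for illustration.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word as plain strings."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def top_patterns(words, from_size, to_size, k):
    """Count n-grams across all words and return the k most common ones."""
    counter = Counter()
    for word in words:
        for n in range(from_size, to_size + 1):
            counter.update(char_ngrams(word, n))
    return [pattern for pattern, _ in counter.most_common(k)]


def find_matches(text, patterns):
    """Return the words of the text that contain at least one pattern.

    Assumption: words are split on whitespace and compared in lower case.
    """
    return [tok for tok in text.lower().split()
            if any(p in tok for p in patterns)]


patterns = top_patterns(["aim", "aid", "bail", "bait"], 2, 3, 3)
text = "The maid paid the bailiff and went sailing"  # hypothetical sample text
matches = find_matches(text, patterns)
print(patterns)
print(matches)
```

Substring matching is the simplest option. For a large text or many patterns, a compiled regular expression or an Aho-Corasick automaton would likely scale better.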