Porter Stemmer algorithm not returning the expected output when modified into a def?

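I am using the Python version of PorterStemmer:

The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.

For the following…

The other thing you need to do is reduce each word to its stem. For example, the words sing, sings, singing all have the same stem, which is sing. There is a reasonably well accepted way to do this, called Porter's algorithm. You can download something that performs it from http://tartarus.org/martin/PorterStemmer/.

I have modified the code…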

    if __name__ == '__main__':
        p = PorterStemmer()
        if len(sys.argv) > 1:
            for f in sys.argv[1:]:
                infile = open(f, 'r')
                while 1:
                    output = ''
                    word = ''
                    line = infile.readline()
                    if line == '':
                        break
                    for c in line:
                        if c.isalpha():
                            word += c.lower()
                        else:
                            if word:
                                output += p.stem(word, 0,len(word)-1)
                                word = ''
                            output += c.lower()
                    print output,
                infile.close()

so that it reads from an input string instead of from a file of preprocessed strings, and returns the output:

    def algorithm(input):
        p = PorterStemmer()
        while 1:
            output = ''
            word = ''
            if input == '':
                break
            for c in input:
                if c.isalpha():
                    word += c.lower()
                else:
                    if word:
                        output += p.stem(word, 0,len(word)-1)
                        word = ''
                    output += c.lower()
            return output

Note that if I indent return output to the same level as while 1:, it turns into an infinite loop.

Example usage

    import PorterStemmer as ps
    ps.algorithm("Michael is Singing")

Output

    Michael is

Expected output

    Michael is Sing

What am I doing wrong?

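Answer:

It looks like the culprit is that the final part of the input is never written to output. A word is only stemmed and appended when a non-letter character follows it, so a word at the very end of the string is left sitting in word. Try "Michael is Singing stuff", for example: everything is written correctly except 'stuff', which is omitted. There is probably a more elegant way to handle this, but one thing you could try is adding an else clause to the for loop. Since the problem is that the last word is not included in output, the else clause makes sure it is added once the for loop completes: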

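    def algorithm(input):
        print input
        p = PorterStemmer()
        while 1:
            output = ''
            word = ''
            if input == '':
                break
            for c in input:
                if c.isalpha():
                    word += c.lower()
                else:
                    if word:
                        output += p.stem(word, 0,len(word)-1)
                        word = ''
                    # always copy the non-letter character, even when no word precedes it
                    output += c.lower()
            else:
                # for/else: runs when the loop finishes without a break,
                # so the word still held in `word` is flushed here
                output += p.stem(word, 0, len(word)-1)
            print output
            return output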

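This has been extensively tested with two test cases, so clearly it is bulletproof :) There are probably some edge cases still crawling around in there, but hopefully it will get you started.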
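
As a possible cleanup, here is a minimal sketch of the same idea without the while 1 loop, which never does any work here because the function returns on its first pass. It is written for Python 3, whereas the code above is Python 2. The names stem_text and toy_stem are hypothetical. toy_stem is only a stand-in so the sketch runs on its own; it is not the Porter algorithm.

```python
def toy_stem(word):
    """Hypothetical stand-in for the real stemmer. NOT the Porter algorithm:
    it only strips a trailing 'ing' or 's' from words that are long enough."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def stem_text(text, stem=toy_stem):
    """Lowercase `text`, stem every run of letters, and copy all other
    characters through unchanged."""
    pieces = []
    word = ''
    for c in text:
        if c.isalpha():
            word += c.lower()
        else:
            if word:
                pieces.append(stem(word))
                word = ''
            pieces.append(c.lower())
    if word:
        # Flush the last word: nothing follows it, so the loop never emitted it.
        pieces.append(stem(word))
    return ''.join(pieces)


print(stem_text("Michael is Singing"))  # michael is sing
```

To use the real stemmer, and assuming the PorterStemmer class from the module linked in the question, something like stem_text(s, lambda w: p.stem(w, 0, len(w) - 1)) with p = PorterStemmer() should work. Note that the result is all lowercase, because every character is lowercased as it is read.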
