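NLTK: Corpus-level BLEU score vs. sentence-level BLEU score

I imported nltk in Python on Ubuntu to compute BLEU scores. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.

Here is my code for computing the corpus-level BLEU score:

For some reason, the code above gives a BLEU score of 0. I expected the corpus-level BLEU score to be at least 0.5.

Here is my code for computing the sentence-level BLEU score: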

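Here the sentence-level BLEU score is 0.71, which is what I expected given the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.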
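The 0.71 can be reproduced by hand. The question's sentence-level code is not shown, so this sketch assumes it used unigram-only weights (i.e. weights=(1,)); with the default 4-gram weights the missing trigram and 4-gram matches would change the result. The helper name unigram_bleu is hypothetical, not part of NLTK.

```python
import math
from collections import Counter


def unigram_bleu(reference, hypothesis):
    """Hypothetical helper: single-sentence BLEU with unigram-only weights."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Clipped matches: a hypothesis word counts at most as often as it appears in the reference.
    matches = sum(min(n, ref_counts[w]) for w, n in hyp_counts.items())
    precision = matches / len(hypothesis)  # 3 / 3 = 1.0 for this example
    c, r = len(hypothesis), len(reference)
    # Brevity penalty: exp(1 - r/c) when the hypothesis is not longer than the reference.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
print(unigram_bleu(reference, hypothesis))  # → about 0.7165, i.e. exp(-1/3)
```

All three hypothesis words occur in the reference, so the whole drop from 1.0 comes from the brevity penalty caused by the missing "a".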

Any help would be appreciated.

Answer:

TL;DR:

>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
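>>> references = [reference]  # list of references for one sentence.
>>> list_of_references = [references]  # list of references for every sentence in the corpus.
>>> list_of_hypotheses = [hypothesis]  # list of hypotheses, aligned with list_of_references.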
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453

(Note: you need to pull the latest NLTK from the develop branch to get a stable version of the BLEU score implementation.)
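The TL;DR value can be checked by hand. With the default weights (0.25, 0.25, 0.25, 0.25), "This is cat" has unigram precision 3/3 and bigram precision 1/2 against "This is a cat", and no trigram or 4-gram matches. The printed 0.6025... is reproduced exactly if we assume that the NLTK version used here simply dropped n-gram orders with zero matches. That is an inference from the number, not documented behaviour. As far as I know, more recent NLTK releases instead warn about the zero counts and return a value that is essentially 0 unless a smoothing_function is passed, so the output may differ on your installation.

```python
import math

# Clipped n-gram precisions for "This is cat" against "This is a cat".
p1 = 3 / 3  # This, is, cat all occur in the reference
p2 = 1 / 2  # "This is" matches, "is cat" does not
# Trigram and 4-gram precisions are 0; ASSUMPTION: those orders are simply dropped.
weights = (0.25, 0.25)

bp = math.exp(1 - 4 / 3)  # brevity penalty: hypothesis length 3, reference length 4
score = bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, (p1, p2))))
print(score)  # ≈ 0.6025286
```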

In detail:

In fact, if your whole corpus contains only one reference and one hypothesis, corpus_bleu() and sentence_bleu() should return the same value, as in the example above.

Looking at the code, sentence_bleu is just a thin wrapper around corpus_bleu that scores a one-sentence corpus:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
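To make the corpus-level computation concrete, here is a simplified from-scratch sketch. It is not NLTK's actual code: the function names are hypothetical and smoothing is omitted. The key idea is that corpus BLEU first sums the clipped n-gram matches, the n-gram totals, and the hypothesis and reference lengths over all sentences, and only then computes the precisions and the brevity penalty. It is not an average of sentence-level scores, which is why a one-sentence corpus reproduces sentence BLEU while a larger corpus generally does not.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu_sketch(list_of_references, hypotheses,
                       weights=(0.25, 0.25, 0.25, 0.25)):
    """Simplified corpus BLEU (no smoothing): pool counts over all sentences first."""
    numerators = Counter()    # clipped n-gram matches, summed over the corpus
    denominators = Counter()  # hypothesis n-gram totals, summed over the corpus
    hyp_len = ref_len = 0
    for references, hypothesis in zip(list_of_references, hypotheses):
        hyp_len += len(hypothesis)
        # Closest reference length (ties go to the shorter one) for the brevity penalty.
        ref_len += min((len(r) for r in references),
                       key=lambda length: (abs(length - len(hypothesis)), length))
        for n in range(1, len(weights) + 1):
            hyp_counts = Counter(ngrams(hypothesis, n))
            # Clip each n-gram to its maximum count in any single reference.
            max_ref_counts = Counter()
            for r in references:
                for gram, count in Counter(ngrams(r, n)).items():
                    max_ref_counts[gram] = max(max_ref_counts[gram], count)
            numerators[n] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            denominators[n] += max(1, sum(hyp_counts.values()))
    # Without smoothing, any order with zero matches makes the geometric mean 0.
    if any(numerators[n] == 0 for n in range(1, len(weights) + 1)):
        return 0.0
    log_mean = sum(w * math.log(numerators[n] / denominators[n])
                   for n, w in enumerate(weights, start=1))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_mean)


def sentence_bleu_sketch(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25)):
    """Same wrapping idea as sentence_bleu: score a one-sentence corpus."""
    return corpus_bleu_sketch([references], [hypothesis], weights)


refs1, hyp1 = [['This', 'is', 'a', 'cat']], ['This', 'is', 'a', 'cat']
refs2, hyp2 = [['the', 'dog', 'sat', 'down']], ['the', 'dog', 'ran', 'away']
w = (0.5, 0.5)  # unigrams + bigrams only, so the toy example stays non-zero

s1 = sentence_bleu_sketch(refs1, hyp1, w)
s2 = sentence_bleu_sketch(refs2, hyp2, w)
pooled = corpus_bleu_sketch([refs1, refs2], [hyp1, hyp2], w)
print(s1, s2, (s1 + s2) / 2)  # sentence scores and their plain average
print(pooled)                 # pooled corpus score
```

Here s1 is 1.0 and s2 is about 0.408, so their average is about 0.704, while the pooled corpus score is about 0.707: close, but not the same number. Because this sketch deliberately returns 0 whenever some n-gram order has no matches, it gives 0 rather than 0.6025 for the TL;DR example under the default 4-gram weights.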

If we look at the parameters of sentence_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """

The references argument of sentence_bleu is a list(list(str)).

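So if you have a sentence string, e.g. "This is a cat", you need to tokenize it into a list of strings, ["This", "is", "a", "cat"]. And since multiple references are allowed, references must be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: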

from nltk.translate.bleu_score import sentence_bleu

references = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
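With several references, each hypothesis n-gram is not matched against one reference at a time: its count is clipped to the largest number of times it occurs in any single reference. A minimal unigram-only sketch of that clipping, with a hypothetical helper name (clipped_unigram_precision is not an NLTK function):

```python
from collections import Counter


def clipped_unigram_precision(references, hypothesis):
    """Hypothetical helper: return (clipped matches, hypothesis length) for unigrams."""
    hyp_counts = Counter(hypothesis)
    # For each word, the maximum count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[w]) for w, count in hyp_counts.items())
    return clipped, len(hypothesis)


references = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
print(clipped_unigram_precision(references, ["This", "is", "cat"]))  # → (3, 3)
# Clipping in action: "the" occurs at most twice in any one reference.
print(clipped_unigram_precision([["the", "cat"], ["the", "the", "dog"]],
                                ["the", "the", "the"]))  # → (2, 3)
```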

As for the list_of_references argument of corpus_bleu(), it is basically a list of whatever sentence_bleu() accepts as references:

def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type list_of_references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """

Besides the doctests in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how each component of bleu_score.py is used.

By the way, since sentence_bleu is imported as bleu in nltk.translate.__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using

from nltk.translate import bleu

is the same as:

from nltk.translate.bleu_score import sentence_bleu

In code:

>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
