Python NLTK Naive Bayes classifier: what underlying computation does the classifier use to classify an input?

I am using the Naive Bayes classifier in Python NLTK to compute the probability distribution for the following example:

import nltk


def main():
    # Two training inputs per class, each with a single binary feature.
    train = [
        (dict(feature=1), 'class_x'),
        (dict(feature=0), 'class_x'),
        (dict(feature=0), 'class_y'),
        (dict(feature=0), 'class_y'),
    ]
    test = [dict(feature=1)]
    classifier = nltk.classify.NaiveBayesClassifier.train(train)
    print("classes available: ", sorted(classifier.labels()))
    print("input assigned to: ", classifier.classify_many(test))
    # Print the posterior probability of each class for every test input.
    for pdist in classifier.prob_classify_many(test):
        print("probability distribution: ")
        print('%.4f %.4f' % (pdist.prob('class_x'), pdist.prob('class_y')))


if __name__ == '__main__':
    main()

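The training dataset contains two classes (class_x and class_y), each with two inputs. For class_x, the feature value is 1 for the first input and 0 for the second. For class_y, the feature value is 0 for both inputs. The test dataset consists of a single input whose feature value is 1.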

Running the code produces this output:

classes available:  ['class_x', 'class_y']
input assigned to:  ['class_x']
0.7500 0.2500

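To obtain the probability (or likelihood) of each class, the classifier should multiply the class prior (0.5 in this example) by the probability of each feature given the class. Smoothing has to be taken into account.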

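I usually use a formula like the following (or a similar variant):

P(feature|class) = class prior * (frequency of the feature in the class + 1) / (total number of features in the class + vocabulary size). The smoothing may vary, which slightly changes the result.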
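As a rough illustration of the add-one formula above, the following sketch applies it by hand to the training data from the question. It is a plain-Python reading of that formula, not NLTK's code: the helper name `add_one_posterior` is hypothetical, and it assumes that "total features in the class" means the number of training inputs in the class and that "vocabulary" means the number of distinct values the feature takes.

```python
from collections import Counter

# Training data from the question: one binary feature, two classes.
train = [
    ({"feature": 1}, "class_x"),
    ({"feature": 0}, "class_x"),
    ({"feature": 0}, "class_y"),
    ({"feature": 0}, "class_y"),
]


def add_one_posterior(train, test):
    """Hypothetical helper: naive Bayes with add-one (Laplace) smoothing."""
    label_counts = Counter(label for _, label in train)

    # Distinct values observed for each feature name (the "vocabulary").
    values = {}
    for featureset, _ in train:
        for name, value in featureset.items():
            values.setdefault(name, set()).add(value)

    scores = {}
    for label in sorted(label_counts):
        # Unsmoothed class prior.
        score = label_counts[label] / len(train)
        for name, value in test.items():
            count = sum(
                1
                for featureset, lab in train
                if lab == label and featureset.get(name) == value
            )
            # Add-one smoothed estimate of P(feature value | class).
            score *= (count + 1) / (label_counts[label] + len(values[name]))
        scores[label] = score

    # Normalize so the class scores sum to 1.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = add_one_posterior(train, {"feature": 1})
print("%.4f %.4f" % (posterior["class_x"], posterior["class_y"]))
```

Under these assumptions the split comes out near 0.6667 / 0.3333 rather than the 0.7500 / 0.2500 printed by NLTK, which is consistent with NLTK using a different smoothing scheme.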

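In the example code above, how exactly does the classifier compute the probability distribution? What formula does it use?

I checked here and here, but found no specific information on how the computation is done.

Thanks in advance.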

Answer:

From the source code:

https://github.com/nltk/nltk/blob/develop/nltk/classify/naivebayes.py#L9yo

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)
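To make the formula concrete, here is a minimal plain-Python sketch that applies it to the example from the question. It assumes that the classifier's default estimator is expected likelihood estimation (NLTK's `ELEProbDist`, which adds 0.5 to every count) and that it is applied both to the label prior and to each feature's value distribution; that default is worth verifying against the linked source. The function name `lidstone_posterior` and its `gamma` parameter are hypothetical, and the sketch ignores NLTK's handling of features that are missing from some inputs.

```python
from collections import Counter

# Training data from the question: one binary feature, two classes.
train = [
    ({"feature": 1}, "class_x"),
    ({"feature": 0}, "class_x"),
    ({"feature": 0}, "class_y"),
    ({"feature": 0}, "class_y"),
]


def lidstone_posterior(train, test, gamma=0.5):
    """Hypothetical helper: naive Bayes with add-gamma (Lidstone) smoothing.

    gamma=0.5 corresponds to expected likelihood estimation, which is
    assumed here to be NLTK's default for this classifier.
    """
    label_counts = Counter(label for _, label in train)
    n_labels = len(label_counts)

    # Distinct values observed for each feature name (number of bins).
    values = {}
    for featureset, _ in train:
        for name, value in featureset.items():
            values.setdefault(name, set()).add(value)

    scores = {}
    for label in sorted(label_counts):
        # Smoothed prior P(label).
        score = (label_counts[label] + gamma) / (len(train) + gamma * n_labels)
        for name, value in test.items():
            count = sum(
                1
                for featureset, lab in train
                if lab == label and featureset.get(name) == value
            )
            bins = len(values[name])
            # Smoothed likelihood P(feature value | label).
            score *= (count + gamma) / (label_counts[label] + gamma * bins)
        scores[label] = score

    # Dividing by the sum of the scores plays the role of P(features).
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = lidstone_posterior(train, {"feature": 1})
print("%.4f %.4f" % (posterior["class_x"], posterior["class_y"]))
```

Worked by hand: both priors are (2 + 0.5) / (4 + 1) = 0.5, P(feature=1|class_x) = (1 + 0.5) / (2 + 1) = 0.5, and P(feature=1|class_y) = (0 + 0.5) / (2 + 1) ≈ 0.1667. The unnormalized scores 0.25 and about 0.0833 normalize to 0.75 and 0.25, matching the output reported in the question.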
