Naive Bayes classifier makes decisions based only on a-priori probabilities

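I am trying to classify tweets by sentiment into three categories (Buy, Hold, Sell). I am using R and the e1071 package.

I have two data frames: one is the training set, the other is a set of new tweets whose sentiment needs to be predicted.

The training set data frame looks like this: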

+--------------------------------------------------+
text                                      | sentiment
This stock is worth buying                | Buy
The Tokyo market crashes                  | Sell
Everyone is excited about the new product | Hold
+--------------------------------------------------+

Now I want to train the model using the tweet text trainingset[,2] and the sentiment category trainingset[,4].

classifier<-naiveBayes(trainingset[,2],as.factor(trainingset[,4]), laplace=1)

Looking at the elements of the classifier

classifier$tables$x

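I can see that the conditional probabilities have been computed: each tweet has its own probabilities for Buy, Hold, and Sell. So far so good.

However, when I predict on the training set with the following code: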

predict(classifier, trainingset[,2], type="raw")

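the classification I get is based only on the a-priori probabilities, which means every tweet is classified as Hold (because "Hold" has the largest share among the sentiments). As a result, every tweet gets the same probabilities for Buy, Hold, and Sell: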

+--------------------------------------------------+
Id | Buy  | Hold | Sell
1  | 0.25 | 0.5  | 0.25
2  | 0.25 | 0.5  | 0.25
3  | 0.25 | 0.5  | 0.25
.. | .... | .... | ....
N  | 0.25 | 0.5  | 0.25
+--------------------------------------------------+

Any ideas about what I am doing wrong? I would really appreciate your help!

Thanks

Answer:

It looks like you trained the model with whole sentences as input, while you seem to want individual words as the input features.

Usage:

## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)
## S3 method for class 'naiveBayes'
predict(object, newdata, type = c("class", "raw"), threshold = 0.001, ...)

Arguments:

  x: A numeric matrix, or a data frame of categorical and/or numeric variables.
  y: Class vector.

In particular, if you train naiveBayes like this:

library(e1071)
x <- c("john likes cake", "marry likes cats and john")
y <- as.factor(c("good", "bad"))
bayes <- naiveBayes(x, y)

you get a classifier that can only recognize these two exact sentences:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good
 0.5  0.5

Conditional probabilities:
      x
y      john likes cake marry likes cats and john
  bad                0                         1
  good               1                         0
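To make the point concrete, here is a minimal base-R check (it does not call e1071). It shows that a vector of whole sentences behaves like a categorical variable with one level per distinct sentence. The sentence "john likes cats" is a made-up example, not something from the question; it shares words with the training data but matches neither level, so the table above has no column for it.

```r
# Whole sentences as a categorical predictor: one level per distinct sentence.
x <- c("john likes cake", "marry likes cats and john")
sentence_levels <- levels(factor(x))
print(sentence_levels)

# A new sentence (hypothetical) shares words with the training data,
# but as a whole string it matches none of the known levels.
new_sentence <- "john likes cats"
matched <- match(new_sentence, sentence_levels)
print(matched)  # NA: there is no column in the table to look up
```

With nothing to look up for the feature, a prediction would presumably have only the a-priori probabilities left to go on, which is consistent with the behaviour described in the question.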

To build a word-level classifier, you need to run it with individual words as input:

x <- c("john", "likes", "cake", "marry", "likes", "cats", "and", "john")
y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
bayes <- naiveBayes(x, y)

You get

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
  bad  good
0.625 0.375

Conditional probabilities:
      x
y            and      cake      cats      john     likes     marry
  bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
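As a rough illustration of how these numbers combine, the sketch below applies Bayes' rule by hand to a single word, using the priors and conditional probabilities copied from the output above. The helper name posterior_for is made up for this example. It does not reproduce everything predict() does (for instance, the threshold argument shown in the signature is ignored), so treat it as an illustration, not as the package's implementation.

```r
# Priors and conditional probabilities copied from the output above.
prior <- c(bad = 0.625, good = 0.375)
cond <- rbind(
  bad  = c(and = 0.2, cake = 0,   cats = 0.2, john = 0.2, likes = 0.2, marry = 0.2),
  good = c(and = 0,   cake = 1/3, cats = 0,   john = 1/3, likes = 1/3, marry = 0)
)

# Posterior for one word: prior * P(word | class), normalised to sum to 1.
# posterior_for is a hypothetical helper, not part of e1071.
posterior_for <- function(word) {
  unnormalised <- prior * cond[, word]
  unnormalised / sum(unnormalised)
}

print(posterior_for("cats"))  # only "bad" has a non-zero probability
print(posterior_for("john"))  # both classes end up equally likely
```

Unlike the sentence-level model, the result now depends on the word itself and not only on the priors.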

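In general, R is not well suited to processing NLP data; Python (or at least Java) would be a better choice.

To convert a sentence into words, you can use the strsplit function: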

unlist(strsplit("john likes cake", " "))
[1] "john"  "likes" "cake"
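Building on that, here is a minimal sketch of how the word vector and the per-word labels from the word-level example could be generated from the labelled sentences instead of being typed by hand. The variable names sentences, labels, and tokens are chosen for this example, and the split is on single spaces only; punctuation and letter case are not handled.

```r
# Sentences and labels from the example above.
sentences <- c("john likes cake", "marry likes cats and john")
labels <- c("good", "bad")

# Split each sentence on single spaces; the result is a list with one
# character vector of words per sentence.
tokens <- strsplit(sentences, " ")

# Flatten to one word per element, repeating each label once per word.
x <- unlist(tokens)
y <- as.factor(rep(labels, times = lengths(tokens)))

print(data.frame(word = x, label = y))
```

The resulting x and y have the same contents as the hand-written vectors in the word-level example, so they can be passed to naiveBayes(x, y) in the same way.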
