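Does anyone have a way to split a paragraph into sentences, put each sentence into a pandas DataFrame, and run sentiment analysis on each sentence?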

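Beginner-level NLP/Python programmer here. The title says it all. I basically need code that splits a paragraph into sentences, runs sentiment analysis on each sentence, and puts each sentence together with its score into a pandas DataFrame. I already have code that tokenizes a paragraph and code that does sentiment analysis, but I'm struggling to combine the two in a DataFrame. So far I have: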

I use newspaper3k to download the URL and extract the article text.

from newspaper import fulltext
import requests

url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
# Download the page HTML and extract the article body text
text = fulltext(requests.get(url).text)

Then I use the BERT extractive summarizer to summarize the article text.

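from summarizer import Summarizer  # provided by the bert-extractive-summarizer package

models = Summarizer()
# Keep only sentences of at least 30 characters in the extractive summary
result = models(text, min_length=30)
full = "".join(result)
type(full)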

Next, I use nltk to split the summary into sentences.

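import nltk
import numpy as np
from nltk.tokenize import sent_tokenize

# One-time download of the Punkt sentence model (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')
tokens = sent_tokenize(full)
print(type(np.array(tokens)[0]))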

Finally, I put the sentences into a basic DataFrame.

import pandas as pd

df = pd.DataFrame(np.array(tokens), columns=['sentences'])

The only thing missing is the sentiment analysis. I just need a sentiment score for each sentence in the DataFrame (preferably from a BERT model).
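A minimal sketch of the missing step, assuming the classifier behaves like a Hugging Face `pipeline("sentiment-analysis")` (which by default loads a DistilBERT model fine-tuned on SST-2): called on a list of strings, it returns a list of `{"label": ..., "score": ...}` dicts in the same order. The helper name `score_sentences` and the `fake_classifier` stand-in are hypothetical, used only so the sketch runs without downloading a model; with the real pipeline you would call `score_sentences(tokens, pipeline("sentiment-analysis"))`.

```python
import pandas as pd
from typing import Callable, Dict, List


def score_sentences(
    sentences: List[str],
    classifier: Callable[[List[str]], List[Dict]],
) -> pd.DataFrame:
    """Return one row per sentence with its sentiment label and score.

    `classifier` is assumed to work like a Hugging Face sentiment pipeline:
    list of strings in, list of {"label": ..., "score": ...} dicts out.
    """
    df = pd.DataFrame({"sentences": sentences})
    # Score all sentences in one batched call instead of one call per row
    results = classifier(list(sentences)) if len(sentences) > 0 else []
    df["label"] = [r["label"] for r in results]
    df["score"] = [r["score"] for r in results]
    return df


# Hypothetical stand-in for pipeline("sentiment-analysis"), for offline testing only
def fake_classifier(texts: List[str]) -> List[Dict]:
    return [
        {"label": "POSITIVE" if "good" in t else "NEGATIVE", "score": 0.9}
        for t in texts
    ]


df = score_sentences(["The plan is good.", "Rates will rise."], fake_classifier)
print(df)
```

Passing the whole list at once lets the pipeline batch its work, which is usually faster than calling it row by row through `apply`.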

Answer:

Hugging Face's transformers library can do what you want through its `pipeline` API:

from transformers import pipeline
from newspaper import fulltext
import requests
import pandas as pd

url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
text = fulltext(requests.get(url).text)
# Keep the non-empty lines among the first 10 lines of the article
texts = [item.strip() for item in text.split('\n')[:10] if item.strip()]

summarizer = pipeline("summarization")
sentiment_analyser = pipeline('sentiment-analysis')

# Each pipeline call returns a list of dicts, so take the first result
summarize = lambda t: summarizer(t, min_length=5, max_length=30)[0]['summary_text']
sentiment_analyse = lambda t: sentiment_analyser(t)[0]

df = pd.DataFrame(texts, columns=['lines'])
df['Summarized'] = df.lines.apply(summarize)
results = df.lines.apply(sentiment_analyse)  # run the sentiment model once per line
df['Sentiment'] = results.apply(lambda r: r['label'])
df['Score'] = results.apply(lambda r: r['score'])
print(df.head())
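If the pipeline is applied directly, as in `df['Sentiment'] = df.lines.apply(sentiment_analyser)`, each cell holds the raw output, a one-element list such as `[{"label": "POSITIVE", "score": 0.99}]`. A minimal sketch for splitting such a column into separate `label` and `score` columns follows; the helper `expand_sentiment` and the sample rows are hypothetical and only mimic that output shape.

```python
import pandas as pd


def expand_sentiment(df: pd.DataFrame, col: str = "Sentiment") -> pd.DataFrame:
    """Replace a column of raw pipeline outputs with label/score columns."""
    # Each cell is assumed to be a list like [{"label": ..., "score": ...}];
    # keep its first (and only) dict.
    first = df[col].apply(lambda res: res[0])
    expanded = pd.DataFrame(first.tolist(), index=df.index)
    return df.drop(columns=[col]).join(expanded)


# Hypothetical rows mimicking df.lines.apply(sentiment_analyser)
df = pd.DataFrame(
    {
        "lines": ["Water rates may go up.", "The mayor explains why."],
        "Sentiment": [
            [{"label": "NEGATIVE", "score": 0.97}],
            [{"label": "POSITIVE", "score": 0.88}],
        ],
    }
)
out = expand_sentiment(df)
print(out)
```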
