Can we find the sentences around an entity tagged by named entity recognition?

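We have a model that recognizes custom named entities. The problem is that it does not perform as expected when given a whole document, but gives excellent results when given only a few sentences.

I want to select the two sentences before and after each tagged entity.

For example, if part of the document contains the city Colombo (tagged as GPE), I need to select the two sentences before and the two sentences after that tag. I have tried a few approaches, but their complexity is too high.

Is there a built-in way to do this in spaCy?

I am using Python and spaCy.

I tried parsing the document by looking up the index of each tag, but that approach is very slow.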

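Answer:

It is worth trying to improve the custom named entity recognizer, because extra context usually should not hurt performance. If you can fix that problem, the overall results will probably be better.

That said, on your specific question about surrounding sentences:

A Token or a Span (an entity is a Span) has a .sent attribute that returns the sentence containing it, also as a Span. A sentence Span has .start and .end token indices, where .end is exclusive. The token just before .start therefore belongs to the previous sentence, and the token at .end belongs to the next sentence, so calling .sent on those tokens gives you the previous or next sentence for any token in the document.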

import spacy


def get_previous_sentence(doc, token_index):
    # The token just before the current sentence belongs to the previous one.
    sent = doc[token_index].sent
    if sent.start - 1 < 0:
        return None
    return doc[sent.start - 1].sent


def get_next_sentence(doc, token_index):
    # sent.end is exclusive, so doc[sent.end] is the first token of the next sentence.
    sent = doc[token_index].sent
    if sent.end >= len(doc):
        return None
    return doc[sent.end].sent


nlp = spacy.load('en_core_web_lg')
text = "Jane is a name. Here is a sentence. Here is another sentence. Jane was the mayor of Colombo in 2010. Here is another filler sentence. And here is yet another padding sentence without entities. Someone else is the mayor of Colombo right now."
doc = nlp(text)

for ent in doc.ents:
    print(ent, ent.label_, ent.sent)
    print("Prev:", get_previous_sentence(doc, ent.start))
    print("Next:", get_next_sentence(doc, ent.start))
    print("----")

Output:

Jane PERSON Jane is a name.
Prev: None
Next: Here is a sentence.
----
Jane PERSON Jane was the mayor of Colombo in 2010.
Prev: Here is another sentence.
Next: Here is another filler sentence.
----
Colombo GPE Jane was the mayor of Colombo in 2010.
Prev: Here is another sentence.
Next: Here is another filler sentence.
----
2010 DATE Jane was the mayor of Colombo in 2010.
Prev: Here is another sentence.
Next: Here is another filler sentence.
----
Colombo GPE Someone else is the mayor of Colombo right now.
Prev: And here is yet another padding sentence without entities.
Next: None
----
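Since the question asks for two sentences on each side rather than one, here is a minimal sketch of one way to generalize the idea. It collects the sentences once and uses a binary search over their start indices, so each entity lookup stays cheap. The helper name context_windows and its parameter n are hypothetical, not part of spaCy. To keep the example self-contained, it assumes spaCy v3, uses a blank English pipeline with the rule-based sentencizer, and tags one entity by hand; in practice you would use the doc produced by your own NER model.

```python
from bisect import bisect_right

import spacy


def context_windows(doc, n=2):
    """Yield (entity, sentences_before, sentences_after) for each entity.

    Hypothetical helper: returns up to n sentences on each side of the
    sentence that contains the entity's first token.
    """
    sents = list(doc.sents)             # materialize the sentences once
    starts = [s.start for s in sents]   # first token index of each sentence
    for ent in doc.ents:
        # Index of the last sentence whose start is <= the entity's start.
        i = bisect_right(starts, ent.start) - 1
        yield ent, sents[max(0, i - n):i], sents[i + 1:i + 1 + n]


# Assumption for the demo: a blank pipeline with rule-based sentence splitting.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

text = (
    "Jane is a name. Here is a sentence. Here is another sentence. "
    "Jane was the mayor of Colombo in 2010. Here is another filler sentence. "
    "And here is yet another padding sentence without entities. "
    "Someone else is the mayor of Colombo right now."
)
doc = nlp(text)

# Tag the first "Colombo" by hand; a trained NER model would set doc.ents itself.
begin = text.index("Colombo")
doc.ents = [doc.char_span(begin, begin + len("Colombo"), label="GPE")]

results = list(context_windows(doc, n=2))
for ent, before, after in results:
    print(ent.text, ent.label_)
    print("Before:", [s.text for s in before])
    print("After:", [s.text for s in after])
```

Near the start or end of the document the slices simply return fewer sentences, so no special None handling is needed.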
