
NLP – Separating punctuation only at the start and end of words

IT Tech · xiaolong · May 26, 2025

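I'm learning NLP and working through the basic preprocessing steps. I want to separate punctuation from the start and end of words so the text can be used for embeddings. While doing that, I don't want to break words like can't and I'm, because I will handle those separately.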

s = "This is what I'm trying to do, but I can't figure out how."

Expected output:

s_separated = "This is what I'm trying to do , but I can't figure out how ."

Answer:

You can try the following:

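import re

# Named `text` rather than `str` so the built-in type is not shadowed.
text = "This is what I'm trying to do, but I can't figure out how."
# Insert a space wherever a word character is immediately followed by punctuation.
res = re.sub(r'(?<=\w)(?=[,.!;:])', ' ', text)
print(res)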

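Explanation: the pattern matches a zero-width position rather than any characters. The lookbehind (?<=\w) requires a word character immediately before that position, and the lookahead (?=[,.!;:]) requires one of the listed punctuation marks immediately after it, so re.sub inserts a space there without consuming anything. The apostrophe is not in the character class, so can't and I'm stay intact. Note that this only covers punctuation that follows a word: it does not separate punctuation at the start of a word, and it would also split a number such as 3.14.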
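Since the question asks about both the start and the end of a word, here is a minimal sketch of a token-based alternative. It is not the answerer's method. The function name separate_edge_punctuation and the punctuation set EDGE_PUNCT are hypothetical choices made for illustration; adjust the set to your data. The sketch assumes words are separated by whitespace, and it collapses runs of whitespace into single spaces.

```python
import re

# Hypothetical punctuation set (an assumption, not from the original answer).
# The apostrophe is deliberately left out so contractions stay intact.
EDGE_PUNCT = r'[,.!?;:"()\[\]]'

# Leading punctuation run, then the shortest possible core, then trailing run.
TOKEN_RE = re.compile(rf"^({EDGE_PUNCT}*)(.*?)({EDGE_PUNCT}*)$")


def separate_edge_punctuation(text):
    """Split punctuation off the edges of each whitespace-separated token."""
    pieces = []
    for token in text.split():
        lead, core, trail = TOKEN_RE.match(token).groups()
        pieces.extend(lead)        # each leading mark becomes its own token
        if core:
            pieces.append(core)    # interior characters are left untouched
        pieces.extend(trail)       # each trailing mark becomes its own token
    return " ".join(pieces)


s = "This is what I'm trying to do, but I can't figure out how."
print(separate_edge_punctuation(s))
```

Because only the edges of each token are examined, punctuation inside a token (the apostrophe in can't, the period in 3.14) is never touched.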

