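I want to apply a function containing a loop to a column of a dataset that holds text articles. The articles are written in Arabic, so I want to remove all symbols and the entire English alphabet. This is a text-cleaning step for Arabic articles.
I wrote a loop that replaces specific characters with an empty string. When I try to apply it, I get this error: AttributeError: 'float' object has no attribute 'replace'
Here is the code: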
    var = d['Text']
    def cleaning(var):
        to_delete_characters = "1234567890abcdefghijklmnopqrstuvwxyz“”ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+-""=\|/?.,><;]:[؟،"
        for character in to_delete_characters:
            var = var.replace(character, "")
        return var
    var.apply(cleaning)
Answer:
Why not simply use this:
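    import re
    to_delete_characters = "1234567890abcdefghijklmnopqrstuvwxyz“”ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+-""=\\|/?.,><;]:[؟،"
    # Character class of everything to delete; re.escape keeps regex metacharacters literal.
    pattern = '[' + re.escape(to_delete_characters) + ']'
    var = d['Text'].astype(str).str.replace(pattern, '', regex=True)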
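The character list has to be escaped before it is used as a pattern, because several of the characters to delete (such as `(`, `[`, `*`, `+` and `?`) are regex metacharacters. Here is a minimal standard-library sketch of that idea. The shortened character list and the sample string are made up for illustration and are not the asker's data.

```python
import re

# Shortened, illustrative character list (an assumption, not the full list above).
to_delete_characters = "0123456789abcxyz()[]*+?.|\\"

# re.escape turns metacharacters into literals, so the class matches them as-is.
pattern = re.compile("[" + re.escape(to_delete_characters) + "]")

sample = "نص (abc) 2024 [x]*"
cleaned = pattern.sub("", sample)

# Only the Arabic word and the spaces that separated the removed tokens remain.
print(repr(cleaned))
```

The same escaped pattern string is what gets passed to `Series.str.replace(..., regex=True)` when working on a whole column.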
Or try:
    var = d['Text'].astype(str)
    def cleaning(text):
        to_delete_characters = "1234567890abcdefghijklmnopqrstuvwxyz“”ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+-""=\\|/?.,><;]:[؟،"
        for character in to_delete_characters:
            text = text.replace(character, "")
        return text
    var = var.apply(cleaning)
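For context on the error itself: a missing cell in a pandas text column is typically stored as NaN, which is a Python float, and floats have no `replace` method. The sketch below reproduces that on plain Python values and shows a guard that skips non-string values. `clean_text`, its shortened default character list, and the sample values are hypothetical names chosen for illustration.

```python
def clean_text(value, to_delete="0123456789abcdefghijklmnopqrstuvwxyz!?"):
    """Remove unwanted characters; non-string values (such as NaN) become ''."""
    if not isinstance(value, str):
        return ""
    for character in to_delete:
        value = value.replace(character, "")
    return value


nan = float("nan")

# Calling .replace on a float reproduces the error from the question.
try:
    nan.replace("a", "")
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(repr(clean_text("سلام abc 12!")))
print(repr(clean_text(nan)))
```

A function guarded like this could be passed to `d['Text'].apply(clean_text)` directly, as an alternative to converting the column with `astype(str)` first.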
Edit:
To also remove extra whitespace, try:
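    import re
    to_delete_characters = "1234567890abcdefghijklmnopqrstuvwxyz“”ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+-""=\\|/?.,><;]:[؟،"
    pattern = '[' + re.escape(to_delete_characters) + ']'
    var = (d['Text'].astype(str)
           .str.replace(pattern, '', regex=True)    # delete the unwanted characters
           .str.replace(r'\s+', ' ', regex=True))   # collapse runs of whitespace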
Or try:
    import re
    d['Text'] = d['Text'].astype(str)
    def cleaning(text):
        to_delete_characters = "1234567890abcdefghijklmnopqrstuvwxyz“”ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()_+-""=\\|/?.,><;]:[؟،"
        for character in to_delete_characters:
            text = text.replace(character, "")
        # Collapse runs of whitespace once, after all characters are removed.
        return re.sub(r'\s+', ' ', text)
    d['Text'] = d['Text'].apply(cleaning)
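As an alternative sketch (a swapped-in technique, not what the code above uses), `str.translate` can delete every listed character in a single pass, and `" ".join(text.split())` collapses whitespace while also trimming the ends. The shortened character list, the name `clean_article`, and the sample sentence are assumptions for illustration.

```python
# Shortened, illustrative list of characters to drop (an assumption).
to_delete_characters = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "!.,؟،"
)

# The third argument of str.maketrans lists characters that translate() deletes.
delete_table = str.maketrans("", "", to_delete_characters)


def clean_article(text):
    stripped = text.translate(delete_table)
    # split() with no arguments splits on any whitespace run and drops empty pieces.
    return " ".join(stripped.split())


sample = "  هذا   نص, Test 42 ؟  تجريبي. "
print(repr(clean_article(sample)))
```

Because the translation table is built once, this avoids rescanning the text for each character; whether that matters depends on the size of the dataset.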