Suppose I have already tokenized the sentences in a data frame, as shown below:
+-----------------------------------------+-----------+
| sentence                                | sentiment |
+-----------------------------------------+-----------+
| [i, like, this, app, it, s, awesome]    | positive  |
| [way, to, many, ads, pop, up, hate, it] | negative  |
| [ye]                                    | negative  |
| [p]                                     | positive  |
| [niceeeee]                              | positive  |
| [i, do, not, like, the, design]         | negative  |
| [very, useful, recommended]             | positive  |
| [ugly]                                  | negative  |
| [xxx]                                   | negative  |
| [yes]                                   | positive  |
+-----------------------------------------+-----------+
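I want to clean unnecessary data out of the data frame by deleting the rows whose sentence column contains fewer than 4 characters in total, so that the final result looks like this: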
+-----------------------------------------+-----------+
| sentence                                | sentiment |
+-----------------------------------------+-----------+
| [i, like, this, app, it, s, awesome]    | positive  |
| [way, to, many, ads, pop, up, hate, it] | negative  |
| [niceeeee]                              | positive  |
| [i, do, not, like, the, design]         | negative  |
| [very, useful, recommended]             | positive  |
| [ugly]                                  | negative  |
+-----------------------------------------+-----------+
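Can anyone provide code that solves this problem? I would really appreciate the help, since it will support my thesis work. Thank you for your attention.
Answer: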
You can use the apply function to do this:
char_limit = 4
# Join each token list into one string and keep rows with at least char_limit characters
df = df[df['sentence'].apply(lambda x: len("".join(x)) >= char_limit)]
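To check the filter against the data in the question, here is a minimal self-contained sketch. It assumes the sentence column holds Python lists of string tokens (rather than strings that merely look like lists), and the names total_chars and cleaned are introduced here only for illustration. It sums the token lengths instead of joining them, which gives the same count.

```python
import pandas as pd

# Rebuild the example data frame from the question.
# Assumption: each cell in "sentence" is a real list of string tokens.
df = pd.DataFrame({
    "sentence": [
        ["i", "like", "this", "app", "it", "s", "awesome"],
        ["way", "to", "many", "ads", "pop", "up", "hate", "it"],
        ["ye"],
        ["p"],
        ["niceeeee"],
        ["i", "do", "not", "like", "the", "design"],
        ["very", "useful", "recommended"],
        ["ugly"],
        ["xxx"],
        ["yes"],
    ],
    "sentiment": [
        "positive", "negative", "negative", "positive", "positive",
        "negative", "positive", "negative", "negative", "positive",
    ],
})

char_limit = 4

# Total number of characters across all tokens in each row.
total_chars = df["sentence"].apply(lambda tokens: sum(len(t) for t in tokens))

# Keep rows with at least char_limit characters and renumber the index.
cleaned = df[total_chars >= char_limit].reset_index(drop=True)
print(cleaned)
```

If the column actually contains strings such as "[ye]", the lists would need to be parsed first (for example with ast.literal_eval), otherwise the brackets and commas are counted as characters too.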