AttributeError: 'int' object has no attribute 'lower' in TFIDF and CountVectorizer

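I am trying to predict the category of incoming messages, and I am working with Persian text. I use Tfidf and Naive Bayes to classify my input data. Here is my code:

But when I run the code above, it throws the following exception, whereas I expected the output to be the "ads" category: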

Traceback (most recent call last):
  File "…/multiclass-main.py", line 27, in <module>
    X_train_counts=cv.fit_transform(X_train)
  File "…\sklearn\feature_extraction\text.py", line 1012, in fit_transform
    self.fixed_vocabulary_)
  File "…sklearn\feature_extraction\text.py", line 922, in _count_vocab
    for feature in analyze(doc):
  File "…sklearn\feature_extraction\text.py", line 308, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "…sklearn\feature_extraction\text.py", line 256, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'int' object has no attribute 'lower'

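How can I use Tfidf and CountVectorizer in this project?

Answer:

As you can see, the error is AttributeError: 'int' object has no attribute 'lower', which means that an integer cannot be lowercased. Somewhere in your code, something tries to lowercase an integer object, which is not possible.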

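Why does this happen?

The CountVectorizer constructor has a lowercase parameter, which defaults to True. When you call .fit_transform(), it tries to lowercase every item in your input, and one of those items is an integer object rather than a string. For example, your list contains data like this: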

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']

When you pass the list above to CountVectorizer, it throws this exception.
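To locate the offending rows before vectorizing, a small check along these lines may help. It is only a sketch: `lowercase_step` is a hypothetical stand-in for the `x.lower()` call shown in the traceback, not scikit-learn's actual code, and `find_non_strings` is a helper name invented for this illustration.

```python
corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']


def lowercase_step(doc):
    # Hypothetical stand-in for the vectorizer's lowercasing step;
    # the traceback shows it calling x.lower() on each document.
    return doc.lower()


def find_non_strings(docs):
    """Return (index, type name, value) for every item that is not a str."""
    return [(i, type(d).__name__, d)
            for i, d in enumerate(docs)
            if not isinstance(d, str)]


# Reproduce the failure without scikit-learn.
try:
    [lowercase_step(doc) for doc in corpus]
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# List the rows that would trigger it.
offenders = find_non_strings(corpus)
print(offenders)
```

Running the same check on your own X_train should show which rows hold integers instead of text.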

How do I fix it?

Here are some possible solutions to avoid this problem:

1) Convert every row in your corpus to a string object.

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']
 corpus = [str(item) for item in corpus]

2) Remove the integers from your corpus:

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']
 corpus = [item for item in corpus if not isinstance(item, int)]
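As a rough sketch of how either fix feeds into the vectorizer, the snippet below applies both to the sample corpus and then runs CountVectorizer followed by TfidfTransformer on the string-converted version. The variable names `as_strings` and `without_ints` are made up for this illustration, and it assumes scikit-learn is installed with its default vectorizer settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']

# Fix 1: keep every row and convert non-strings to text.
as_strings = [str(item) for item in corpus]

# Fix 2: drop the integer rows instead.
without_ints = [item for item in corpus if not isinstance(item, int)]

# With only strings in the input, lowercasing no longer fails.
cv = CountVectorizer()
counts = cv.fit_transform(as_strings)

# Turn the raw counts into TF-IDF weights.
tfidf = TfidfTransformer().fit_transform(counts)

print(sorted(cv.vocabulary_))
print(counts.shape, tfidf.shape)
print(without_ints)
```

Note that fix 2 changes the number of rows. If the corpus is a training set with one label per row, the labels for the dropped rows have to be removed as well, otherwise the features and labels no longer line up.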
