Why does a pandas categorical DataFrame give a truth value error?

My data contains a column named 'Married' whose values are the categories "Yes" or "No". I converted it to a numeric type:

    train['Married'] = train['Married'].astype('category')
    train['Married'].cat.categories = [0, 1]

Now I use the following code to fill the missing values:

    train['Married'] = train['Married'].fillna(train['Married'].mode())

This raises the following error:

 ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone explain why?

Answer:

The error indicates that a plain Python logical operator such as not, and, or or was applied to a numpy array or a pandas Series.

For example:

    s = pd.Series([1, 1, 2, 2])
    not pd.isnull(s.mode())

raises the same error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
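
To make the cause concrete, here is a minimal sketch (the variable names are just for illustration) showing that `not` fails on a boolean Series, while reducing it to a single boolean with `.any()` first works:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2])
mask = pd.isnull(s.mode())  # boolean Series with one entry per mode

try:
    not mask  # Python asks the Series for a single truth value
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Reducing the Series to one boolean first removes the ambiguity.
none_missing = not mask.any()
```

Here `not` needs exactly one True/False, and a Series with several elements cannot supply one on its own.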

If you look at the stack trace, the error comes from this line:

    fillna(self, value, method, limit)
       1465         else:
       1466
    -> 1467             if not isnull(value) and value not in self.categories:
       1468                 raise ValueError("fill value must be in categories")
       1469

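So it is checking whether the value you are trying to fill is one of the categories. That line needs value to be a scalar for not and and to work. However, Series.mode() always returns a Series, even when there is only one mode, so the check fails. Extract the value from mode() and fill with that instead:

    train['Married'] = train['Married'].fillna(train['Married'].mode().iloc[0])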
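
A short sketch of why `.iloc[0]` is needed, using made-up data rather than the asker's `train` frame: `mode()` returns a Series even when there is a single most frequent value, and it can hold several values when there is a tie.

```python
import pandas as pd

married = pd.Series([1, 0, 1, 1, None])

modes = married.mode()       # a Series, even though there is only one mode
fill_value = modes.iloc[0]   # a scalar, safe to pass to fillna
filled = married.fillna(fill_value)

# With a tie, mode() returns every tied value in sorted order,
# so .iloc[0] picks the smallest one.
tied = pd.Series([0, 0, 1, 1]).mode()
```

If the smallest tied value is not the fill you want, choose from `tied` explicitly instead of relying on `.iloc[0]`.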

A working example:

    s = pd.Series(["YES", "NO", "YES", "YES", None])
    s1 = s.astype('category')
    s1.cat.categories = [0, 1]
    s1
    #0    1.0
    #1    0.0
    #2    1.0
    #3    1.0
    #4    NaN
    #dtype: category
    #Categories (2, int64): [0, 1]

    s1.fillna(s1.mode().iloc[0])
    #0    1
    #1    0
    #2    1
    #3    1
    #4    1
    #dtype: category
    #Categories (2, int64): [0, 1]
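
If your pandas version rejects assignment to `.cat.categories` (as far as I know, pandas 2.0 and later do), the same example can be sketched with `rename_categories`. This is a variant of the code above, not a different fix:

```python
import pandas as pd

s = pd.Series(["YES", "NO", "YES", "YES", None])
s1 = s.astype("category")

# Categories are sorted, so "NO" becomes 0 and "YES" becomes 1.
s1 = s1.cat.rename_categories([0, 1])

# mode() returns a Series; .iloc[0] extracts the scalar fill value.
filled = s1.fillna(s1.mode().iloc[0])
```

The fill value has to be one of the existing categories, which is always true for a value taken from `mode()`.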
