Error when trying to fit a machine learning model: AttributeError: 'bool' object has no attribute 'transpose'

I am trying to build a machine learning model that predicts who survived on the Titanic. Every time I try to fit my model, I get this error:

    coordinates = np.where(mask.transpose())[::-1]
    AttributeError: 'bool' object has no attribute 'transpose'

The code I am running is:

    from xgboost import XGBClassifier
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.feature_selection import SelectFromModel
    from itertools import combinations
    import pandas as pd
    import numpy as np

    #read in data
    training_data = pd.read_csv('train.csv')
    testing_data = pd.read_csv('test.csv')

    #separate X and Y
    X_train_full = training_data.copy()
    y = X_train_full.Survived
    X_train_full.drop(['Survived'], axis=1, inplace=True)
    y_test = testing_data

    #get all str columns
    cat_columns1 = [cname for cname in X_train_full.columns if
                    X_train_full[cname].dtype == "object"]
    interactions = pd.DataFrame(index= X_train_full)

    #create new features
    for combination in combinations(cat_columns1,2):
        imputer = SimpleImputer(strategy='constant')
        new_col_name = '_'.join(combination)
        col1 = X_train_full[combination[0]]
        col2 = X_train_full[combination[1]]
        col1 = np.array(col1).reshape(-1,1)
        col2 = np.array(col2).reshape(-1,1)
        col1 = imputer.fit_transform(col1)
        col2 = imputer.fit_transform(col2)
        new_vals = col1 + '_' + col2
        OneHot = OneHotEncoder()
        interactions[new_col_name] = OneHot.fit_transform(new_vals)

    interactions = interactions.reset_index(drop = True)

    #create new dataframe with new features included
    new_df = X_train_full.join(interactions)

    #do the same for the test file
    interactions2 = pd.DataFrame(index= y_test)
    for combination in combinations(cat_columns1,2):
        imputer = SimpleImputer(strategy='constant')
        new_col_name = '_'.join(combination)
        col1 = y_test[combination[0]]
        col2 = y_test[combination[1]]
        col1 = np.array(col1).reshape(-1,1)
        col2 = np.array(col2).reshape(-1,1)
        col1 = imputer.fit_transform(col1)
        col2 = imputer.fit_transform(col2)
        new_vals = col1 + '_' + col2
        OneHot = OneHotEncoder()
        interactions2[new_col_name] = OneHot.fit_transform(new_vals)
        interactions2[new_col_name] = new_vals

    interactions2 = interactions2.reset_index(drop = True)
    y_test = y_test.join(interactions2)

    #get names of cat columns (with new features added)
    cat_columns = [cname for cname in new_df.columns if
                   new_df[cname].dtype == "object"]

    # Select numerical columns
    num_columns = [cname for cname in new_df.columns if
                   new_df[cname].dtype in ['int64', 'float64']]

    #set up pipeline
    numerical_transformer = SimpleImputer(strategy = 'constant')
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numerical_transformer, num_columns),
            ('cat', categorical_transformer, cat_columns)
        ])
    model = XGBClassifier()
    my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                  ('model', model)
                                 ])

    #fit model
    my_pipeline.fit(new_df,y)

The CSV files I am reading are available on Kaggle at the following link:

https://www.kaggle.com/c/titanic/data

I cannot work out what exactly is causing this problem. Any help would be greatly appreciated.

Answer:

This is probably because your data contains pd.NA values. pd.NA was introduced in pandas 1.0.0 and is still marked as experimental.

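SimpleImputer ends up running data == np.nan, which normally returns a NumPy array. When data contains pd.NA values, however, the comparison returns a single boolean scalar instead. The imputer then treats that scalar as a mask array and calls mask.transpose() on it, which is the call that fails in the traceback.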

An example:

    import pandas as pd
    import numpy as np

    test_pd_na = pd.DataFrame({"A": [1, 2, 3, pd.NA]})
    test_np_nan = pd.DataFrame({"A": [1, 2, 3, np.nan]})

    test_np_nan.to_numpy() == np.nan
    > array([[False],
           [False],
           [False],
           [False]])

    test_pd_na.to_numpy() == np.nan
    > False
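To check whether your own data actually holds pd.NA (as opposed to ordinary np.nan), something like the following rough sketch may help. The helper name columns_with_pd_na and the sample frame are made up for illustration; the check relies only on pd.NA being a singleton, so an identity test tells it apart from np.nan.

```python
import numpy as np
import pandas as pd


def columns_with_pd_na(df):
    """Return the names of columns holding at least one pd.NA.

    Hypothetical helper for illustration. np.nan entries are not counted,
    because the identity test only matches the pd.NA singleton.
    """
    return [name for name in df.columns if any(v is pd.NA for v in df[name])]


# Made-up sample data: one nullable integer column, one plain float column,
# and one object column that stores pd.NA directly.
frame = pd.DataFrame({
    "with_na": pd.Series([1, 2, pd.NA], dtype="Int64"),
    "with_nan": pd.Series([1.0, 2.0, np.nan], dtype="float64"),
    "obj_na": pd.Series(["a", pd.NA, "c"], dtype=object),
})

flagged = columns_with_pd_na(frame)
print(flagged)
```

If the list comes back empty for your training frame, pd.NA is unlikely to be the cause and the problem lies elsewhere.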

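The fix is to convert all pd.NA values to np.nan before running SimpleImputer. You can do this by calling .replace({pd.NA: np.nan}) on the DataFrame. The obvious downside is that you lose the benefits of pd.NA, such as integer columns that can hold missing data without being converted to float columns.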
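The trade-off can be seen in a small sketch. Note that it uses astype("float64") on a nullable Int64 column rather than the .replace call mentioned above; that is a swapped-in way of getting np.nan for numeric columns, chosen here only because it makes the dtype change easy to see. The column name and values are made up.

```python
import numpy as np
import pandas as pd

# Nullable integer column: the missing entry is pd.NA and the dtype stays integer.
nullable = pd.DataFrame({"A": pd.Series([1, 2, 3, pd.NA], dtype="Int64")})

# Casting to float64 turns pd.NA into np.nan, at the cost of the integer dtype.
converted = nullable.astype("float64")

print(nullable["A"].dtype)   # Int64
print(converted["A"].dtype)  # float64

# The underlying NumPy array is now a plain float array with NaN in it.
values = converted["A"].to_numpy()
print(np.isnan(values))
```

After the cast, the values 1, 2, 3 are stored as floats, which is exactly the loss the answer describes.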
