AttributeError: 'bool' object has no attribute 'transpose' when trying to fit a machine learning model

I am trying to build a machine learning model that predicts who survived on the Titanic. Every time I try to fit my model, I get this error:

    coordinates = np.where(mask.transpose())[::-1]
    AttributeError: 'bool' object has no attribute 'transpose'

The code I am running is as follows:

    from xgboost import XGBClassifier
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.feature_selection import SelectFromModel
    from itertools import combinations
    import pandas as pd
    import numpy as np

    #read in data
    training_data = pd.read_csv('train.csv')
    testing_data = pd.read_csv('test.csv')

    #separate X and Y
    X_train_full = training_data.copy()
    y = X_train_full.Survived
    X_train_full.drop(['Survived'], axis=1, inplace=True)
    y_test = testing_data

    #get all str columns
    cat_columns1 = [cname for cname in X_train_full.columns if
                        X_train_full[cname].dtype == "object"]
    interactions = pd.DataFrame(index= X_train_full)

    #create new features
    for combination in combinations(cat_columns1,2):
        imputer = SimpleImputer(strategy='constant')
        new_col_name = '_'.join(combination)
        col1 = X_train_full[combination[0]]
        col2 = X_train_full[combination[1]]
        col1 = np.array(col1).reshape(-1,1)
        col2 = np.array(col2).reshape(-1,1)
        col1 = imputer.fit_transform(col1)
        col2 = imputer.fit_transform(col2)
        new_vals = col1 + '_' + col2
        OneHot = OneHotEncoder()
        interactions[new_col_name] = OneHot.fit_transform(new_vals)

    interactions = interactions.reset_index(drop = True)

    #create new dataframe with new features included
    new_df = X_train_full.join(interactions)

    #do the same for the test file
    interactions2 = pd.DataFrame(index= y_test)
    for combination in combinations(cat_columns1,2):
        imputer = SimpleImputer(strategy='constant')
        new_col_name = '_'.join(combination)
        col1 = y_test[combination[0]]
        col2 = y_test[combination[1]]
        col1 = np.array(col1).reshape(-1,1)
        col2 = np.array(col2).reshape(-1,1)
        col1 = imputer.fit_transform(col1)
        col2 = imputer.fit_transform(col2)
        new_vals = col1 + '_' + col2
        OneHot = OneHotEncoder()
        interactions2[new_col_name] = OneHot.fit_transform(new_vals)
        interactions2[new_col_name] = new_vals

    interactions2 = interactions2.reset_index(drop = True)
    y_test = y_test.join(interactions2)

    #get names of cat columns (with new features added)
    cat_columns = [cname for cname in new_df.columns if
                        new_df[cname].dtype == "object"]

    # Select numerical columns
    num_columns = [cname for cname in new_df.columns if
                     new_df[cname].dtype in ['int64', 'float64']]

    #set up pipeline
    numerical_transformer = SimpleImputer(strategy = 'constant')
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numerical_transformer, num_columns),
            ('cat', categorical_transformer, cat_columns)
        ])
    model = XGBClassifier()
    my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                  ('model', model)
                                 ])

    #fit model
    my_pipeline.fit(new_df,y)

The CSV files I am reading can be found on Kaggle at the following link:

https://www.kaggle.com/c/titanic/data

I cannot work out what exactly is causing this problem. Any help would be greatly appreciated.


Answer:

This is probably because your data contains pd.NA values. pd.NA was introduced in pandas 1.0.0, but it is still marked as experimental.
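To check whether pd.NA is actually present before changing anything, a small diagnostic along these lines may help. This is only a sketch: `columns_with_pd_na` is a hypothetical helper name, and the toy frame stands in for your own data.

```python
import numpy as np
import pandas as pd


def columns_with_pd_na(df):
    """Return the names of columns that hold at least one pd.NA (hypothetical helper)."""
    # pd.NA is a singleton, so an identity check tells it apart from np.nan and None.
    return [col for col in df.columns if any(value is pd.NA for value in df[col])]


# Toy data standing in for the real frame (assumption for illustration only).
frame = pd.DataFrame(
    {
        "with_na": [1, 2, pd.NA],          # object column containing pd.NA
        "with_nan": [1.0, 2.0, np.nan],    # float column containing np.nan
        "complete": ["a", "b", "c"],       # no missing values
    }
)

suspect_columns = columns_with_pd_na(frame)
print(suspect_columns)  # ['with_na']
```

Only the column that really stores pd.NA is reported; a column with an ordinary np.nan is left out, since np.nan is not what triggers this error.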

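SimpleImputer ends up evaluating data == np.nan, which normally returns a NumPy array. When data contains pd.NA values, however, the comparison returns a single boolean scalar instead, and a plain bool has no transpose method.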

An example:

    import pandas as pd
    import numpy as np

    test_pd_na = pd.DataFrame({"A": [1, 2, 3, pd.NA]})
    test_np_nan = pd.DataFrame({"A": [1, 2, 3, np.nan]})

    test_np_nan.to_numpy() == np.nan
    > array([[False],
           [False],
           [False],
           [False]])

    test_pd_na.to_numpy() == np.nan
    > False

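The solution is to convert all pd.NA values to np.nan before running SimpleImputer. You can do this by calling .replace({pd.NA: np.nan}) on the dataframe. The obvious drawback is that you lose the benefits of pd.NA, such as integer columns that can hold missing data without being cast to float.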
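As a minimal sketch of that replacement, the snippet below runs it on the toy frame from the example and then passes the result to SimpleImputer. The cast to float64 and the choice of strategy="mean" are assumptions made for illustration, not part of the original pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same toy frame as in the example: an object column that holds pd.NA.
test_pd_na = pd.DataFrame({"A": [1, 2, 3, pd.NA]})

# Swap pd.NA for np.nan, then cast to float so the imputer sees a plain numeric array.
# The cast is an assumption that suits this all-numeric column.
cleaned = test_pd_na.replace({pd.NA: np.nan}).astype("float64")

# strategy="mean" is an arbitrary choice for this illustration.
imputer = SimpleImputer(strategy="mean")
imputed = imputer.fit_transform(cleaned)

print(imputed.ravel())
```

Here the missing entry is filled with the mean of the remaining values. For the code in the question, the same .replace call could be applied to training_data and testing_data right after they are read, before any imputer touches them.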
