AttributeError when implementing an imblearn pipeline with FAMD and SMOTENC

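I am trying to implement a pipeline that includes FAMD, SMOTENC, and some other preprocessing steps. However, it raises an error every time. If I remove FAMD from the pipeline, it works fine.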

My code is as follows:

```python
import numpy as np
from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import make_pipeline as imb_pipeline
from prince import FAMD  # installed in the notebook with: !pip install prince

# Separate the dataset into two parts
num_df = X_train_new.select_dtypes(include=[np.number]).columns
cat_df = X_train_new.select_dtypes(exclude=[np.number]).columns

# Create a mask for categorical features
categorical_feature_mask = X_train_new.dtypes == object
print(categorical_feature_mask)

# Create a pipeline to automate the preprocessing steps and SMOTENC together
num_pipe = make_pipeline(SimpleImputer(strategy='median'))
cat_pipe = make_pipeline(SimpleImputer(strategy='most_frequent'),
                         OneHotEncoder(handle_unknown='ignore'))
transformer = make_column_transformer((num_pipe, selector(dtype_include='number')),
                                      (cat_pipe, selector(dtype_include='object')),
                                      n_jobs=2)

# Oversampling with SMOTENC
smote = SMOTENC(categorical_features=categorical_feature_mask, random_state=99)

famd = FAMD(n_components=4, random_state=99)

# Fit the random forest learner
rf = RandomForestClassifier(n_estimators=300, random_state=99)
pipe = imb_pipeline(transformer, smote, famd, rf)
pipe.fit(X_train_new, y_train_new)
print('Training Accuracy:%s' % pipe.score(X_train_new, y_train_new))
```

The error message is:

```
AttributeError                            Traceback (most recent call last)
<ipython-input-24-2b7ea084a318> in <module>()
      3 rf=RandomForestClassifier(n_estimators=300,max_features=3,criterion='entropy',random_state=99)
      4 pipe=imb_pipeline(transformer,smote,famd,rf)
----> 5 pipe.fit(X_train_new,y_train_new)
      6 print('Training Accuracy:%s'%pipe.score(X_train_new,y_train_new))

6 frames
/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in fit(self, X, y, **fit_params)
    235 
    236         """
--> 237         Xt, yt, fit_params = self._fit(X, y, **fit_params)
    238         if self._final_estimator is not None:
    239             self._final_estimator.fit(Xt, yt, **fit_params)

/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in _fit(self, X, y, **fit_params)
    195                     Xt, fitted_transformer = fit_transform_one_cached(
    196                         cloned_transformer, None, Xt, yt,
--> 197                         **fit_params_steps[name])
    198                 elif hasattr(cloned_transformer, "fit_resample"):
    199                     Xt, yt, fitted_transformer = fit_resample_one_cached(

/usr/local/lib/python3.7/dist-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    350 
    351     def __call__(self, *args, **kwargs):
--> 352         return self.func(*args, **kwargs)
    353 
    354     def call_and_shelve(self, *args, **kwargs):

/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in _fit_transform_one(transformer, weight, X, y, **fit_params)
    564 def _fit_transform_one(transformer, weight, X, y, **fit_params):
    565     if hasattr(transformer, 'fit_transform'):
--> 566         res = transformer.fit_transform(X, y, **fit_params)
    567     else:
    568         res = transformer.fit(X, y, **fit_params).transform(X)

/usr/local/lib/python3.7/dist-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    572         else:
    573             # fit method of arity 2 (supervised transformation)
--> 574             return self.fit(X, y, **fit_params).transform(X)
    575 
    576 

/usr/local/lib/python3.7/dist-packages/prince/famd.py in fit(self, X, y)
     27 
     28         # Separate numerical columns from categorical columns
---> 29         num_cols = X.select_dtypes(np.number).columns.tolist()
     30         cat_cols = list(set(X.columns) - set(num_cols))
     31 

/usr/local/lib/python3.7/dist-packages/scipy/sparse/base.py in __getattr__(self, attr)
    689             return self.getnnz()
    690         else:
--> 691             raise AttributeError(attr + " not found")
    692 
    693     def transpose(self, axes=None, copy=False):

AttributeError: select_dtypes not found
```

Answer:

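In short: try adding sparse=False to your OneHotEncoder (in scikit-learn 1.2 and later the parameter is named sparse_output=False). Also consider opening an issue with prince asking it to handle sparse input.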
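A minimal sketch of that fix, applied to the cat_pipe from the question. Because scikit-learn 1.2 renamed the parameter to sparse_output (and 1.4 removed sparse), the sketch picks whichever name the installed version accepts. The small DataFrame X_demo is hypothetical and only stands in for X_train_new.

```python
import inspect

import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# scikit-learn 1.2 renamed OneHotEncoder's `sparse` to `sparse_output`
# (and 1.4 removed `sparse`), so use whichever name this install accepts.
_ohe_params = inspect.signature(OneHotEncoder).parameters
_dense_kw = {"sparse_output": False} if "sparse_output" in _ohe_params else {"sparse": False}

num_pipe = make_pipeline(SimpleImputer(strategy="median"))
cat_pipe = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore", **_dense_kw),  # dense ndarray instead of scipy.sparse
)
transformer = make_column_transformer(
    (num_pipe, selector(dtype_include="number")),
    (cat_pipe, selector(dtype_include="object")),
)

# Hypothetical toy frame standing in for X_train_new
X_demo = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 47.0],
    "color": pd.Series(["red", "blue", "red", np.nan], dtype=object),
})

Xt = transformer.fit_transform(X_demo)
print(type(Xt).__name__, Xt.shape)
```

With every branch producing dense output, the ColumnTransformer returns a plain NumPy array, which is the case prince converts to a DataFrame before calling select_dtypes. Passing sparse_threshold=0 to make_column_transformer is another way to force dense output without touching the encoder.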

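From the traceback, the problem is that FAMD.fit tries to use X.select_dtypes to separate the categorical and numerical data. select_dtypes is a pandas method, so my first guess would be that prince is designed to operate on DataFrames rather than on the NumPy arrays that sklearn uses internally (converting from DataFrames when needed). However, looking at the source, a few lines above that call they do convert a NumPy array into a DataFrame. But the last message in the traceback comes from scipy, which suggests your X is actually a sparse matrix; the conversion does not cover that case, so the sparse matrix reaches select_dtypes unchanged. Indeed, OneHotEncoder (earlier in your pipeline) outputs a scipy sparse matrix by default, and ColumnTransformer decides whether to return a sparse or dense result based on the outputs of its transformers and its sparse_threshold parameter: if any transformer returns a sparse matrix, the combined result stays sparse when its overall density is below sparse_threshold (0.3 by default).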
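To check this diagnosis in isolation, the sketch below uses a small hypothetical DataFrame (X_demo, not from the question). It confirms that OneHotEncoder's default output is a scipy sparse matrix without a select_dtypes attribute, and shows how sparse_threshold controls ColumnTransformer's output type. The helper make_transformer is purely illustrative.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric and one categorical column
X_demo = pd.DataFrame({
    "age": [25.0, 32.0, 41.0, 47.0],
    "color": pd.Series(["red", "blue", "green", "red"], dtype=object),
})

# 1) By default OneHotEncoder returns a scipy.sparse matrix,
#    which has no pandas-style select_dtypes attribute.
onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(X_demo[["color"]])
print("one-hot is sparse:", sp.issparse(onehot))
print("has select_dtypes:", hasattr(onehot, "select_dtypes"))


def make_transformer(threshold):
    """Hypothetical helper: passthrough numeric column plus a one-hot categorical column."""
    return ColumnTransformer(
        [("num", "passthrough", ["age"]),
         ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
        sparse_threshold=threshold,
    )


# 2) The stacked result stays sparse only if its density is below sparse_threshold.
#    Here the density is 8 / 16 = 0.5, so 1.0 keeps it sparse and 0.0 always densifies.
Xt_sparse = make_transformer(1.0).fit_transform(X_demo)
Xt_dense = make_transformer(0.0).fit_transform(X_demo)
print("threshold=1.0 -> sparse:", sp.issparse(Xt_sparse))
print("threshold=0.0 -> ndarray:", isinstance(Xt_dense, np.ndarray))

# 3) After densifying, every column is float, so a dtype-based split finds no categorical columns.
frame = pd.DataFrame(Xt_dense)
print("all columns numeric:", frame.select_dtypes(np.number).shape[1] == frame.shape[1])
```

The last check is worth keeping in mind for the real pipeline: once the data has been one-hot encoded and densified, FAMD's select_dtypes split will treat every column as numerical. Depending on the prince version, FAMD may refuse input that has no categorical columns, in which case it would need to see the original mixed-type data rather than the encoded output.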
