sklearn分类器管道中需要的“有效列规范”是什么?

目标: 使用sklearn基于整数和对象类型特征预测结果。

我使用了来自Kaggle的以下数据集: 足球数据集

这是我的笔记本: Kaggle笔记本

  • scikit-learn == 0.22.1

我创建了一个几乎可以工作的管道:

import pandas as pdimport numpy as npfrom sklearn.model_selection import train_test_splitfrom sklearn.compose import ColumnTransformerfrom sklearn.pipeline import Pipelinefrom sklearn.impute import SimpleImputerfrom sklearn.preprocessing import OneHotEncoder, StandardScalerfrom sklearn.ensemble import RandomForestClassifier# 读取数据df = total_df.copy()# 删除缺少目标的行df.dropna(axis=0, subset=['result'], inplace=True)# 从预测变量中分离目标y = df.result         X = df.drop(['result'], axis=1)# 从训练数据中分离验证集X_train_full, X_test_full, y_train, y_test = train_test_split(X, y,                                                                train_size=0.8,                                                                test_size=0.2,                                                                random_state=0)integer_features = list(X.columns[X.dtypes == 'int64'])#continuous_features = list(X.columns[X.dtypes == 'float64'])categorical_features = list(X.columns[X.dtypes == 'object'])# 只保留选定的列my_cols = categorical_features + integer_featuresX_train = X_train_full[my_cols].copy()X_test = X_test_full[my_cols].copy()integer_transformer = Pipeline(steps = [   ('imputer', SimpleImputer(strategy = 'most_frequent')),   ('scaler', StandardScaler())])categorical_transformer = Pipeline(steps=[    ('imputer', SimpleImputer(strategy='most_frequent')),    ('onehot', OneHotEncoder(handle_unknown='ignore'))])preprocessor = ColumnTransformer(   transformers=[       ('ints', integer_transformer, integer_features),       ('cat', categorical_transformer, categorical_features)])base = Pipeline(steps=[('preprocessor', preprocessor),                     ('classifier', RandomForestClassifier())])# 对训练数据进行预处理,拟合模型 base.fit(X_train, y_train)

我收到了一个错误:

ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

这是完整的回溯信息:

---------------------------------------------------------------------------KeyError                                  Traceback (most recent call last)/opt/conda/lib/python3.7/site-packages/sklearn/utils/__init__.py in _determine_key_type(key, accept_slice)    255         try:--> 256             return dtype_to_str[type(key)]    257         except KeyError:KeyError: <class 'sqlalchemy.sql.elements.quoted_name'>During handling of the above exception, another exception occurred:ValueError                                Traceback (most recent call last)<ipython-input-13-702987dff390> in <module>     47      48 # Preprocessing of training data, fit model---> 49 base.fit(X_train, y_train)     50      51 base.predict(X_test)/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)    348             This estimator    349         """--> 350         Xt, fit_params = self._fit(X, y, **fit_params)    351         with _print_elapsed_time('Pipeline',    352                                  self._log_message(len(self.steps) - 1)):/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)    313                 message_clsname='Pipeline',    314                 message=self._log_message(step_idx),--> 315                 **fit_params_steps[name])    316             # Replace the transformer of the step with the fitted    317             # transformer. This is necessary when loading the transformer/opt/conda/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)    353     354     def __call__(self, *args, **kwargs):--> 355         return self.func(*args, **kwargs)    356     357     def call_and_shelve(self, *args, **kwargs):/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)    726     with _print_elapsed_time(message_clsname, message):    727         if hasattr(transformer, 'fit_transform'):--> 728             res = transformer.fit_transform(X, y, **fit_params)    729         else:    730             res = transformer.fit(X, y, **fit_params).transform(X)/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in fit_transform(self, X, y)    514         self._validate_transformers()    515         self._validate_column_callables(X)--> 516         self._validate_remainder(X)    517     518         result = self._fit_transform(X, y, _fit_transform_one)/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in _validate_remainder(self, X)    316         if (hasattr(X, 'columns') and    317                 any(_determine_key_type(cols) == 'str'--> 318                     for cols in self._columns)):    319             self._df_columns = X.columns    320 /opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in <genexpr>(.0)    316         if (hasattr(X, 'columns') and    317                 any(_determine_key_type(cols) == 'str'--> 318                     for cols in self._columns)):    319             self._df_columns = X.columns    320 /opt/conda/lib/python3.7/site-packages/sklearn/utils/__init__.py in _determine_key_type(key, accept_slice)    275     if isinstance(key, (list, tuple)):    276         unique_key = set(key)--> 277         key_type = {_determine_key_type(elt) for elt in unique_key}    278         if not key_type:    279             return None/opt/conda/lib/python3.7/site-packages/sklearn/utils/__init__.py in <setcomp>(.0)    275     if isinstance(key, (list, tuple)):    276         unique_key = set(key)--> 277         key_type = {_determine_key_type(elt) for elt in unique_key}    278         if not key_type:    279             return None/opt/conda/lib/python3.7/site-packages/sklearn/utils/__init__.py in _determine_key_type(key, accept_slice)    256             return dtype_to_str[type(key)]    257         except KeyError:--> 258             raise ValueError(err_msg)    259     if isinstance(key, slice):    260         if not accept_slice:ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

任何帮助将不胜感激!

编辑: 错误说明“只允许标量、列表或全部整数或全部字符串的切片,或布尔掩码”。 integer_featurescategorical_features 是包含仅列名称字符串的列表。


回答:

您对integer_features和categorical_features使用了列表,而转换器需要索引类型。

categorical_features = X.select_dtypes(include="object").columnsinteger_features = X.select_dtypes(exclude="object").columns

更改这些,将解决您的错误。 🙂

Related Posts

使用LSTM在Python中预测未来值

这段代码可以预测指定股票的当前日期之前的值,但不能预测…

如何在gensim的word2vec模型中查找双词组的相似性

我有一个word2vec模型,假设我使用的是googl…

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

我试图使用 XGBoost 创建模型。 看起来我成功地…

ML Tuning – Cross Validation in Spark

我在https://spark.apache.org/…

如何在React JS中使用fetch从REST API获取预测

我正在开发一个应用程序,其中Flask REST AP…

如何分析ML.NET中多类分类预测得分数组?

我在ML.NET中创建了一个多类分类项目。该项目可以对…

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注