Sklearn Pipeline with multiple estimators

I get an error when I try to chain estimators and inspect the result. I'm new to Python, and this is my first attempt at using Pipeline.

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

estimator=[('dim_reduction',PCA()),
           ('logres_model',LogisticRegression()),
           ('linear_model',LinearRegression())]
pipeline_estimator=Pipeline(estimator)

Error message

TypeError                                 Traceback (most recent call last)
<ipython-input-196-44549764413a> in <module>
----> 1 pipeline_estimator=Pipeline(estimator)

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in __init__(self, steps, memory, verbose)
    112         self.memory = memory
    113         self.verbose = verbose
--> 114         self._validate_steps()
    115 
    116     def get_params(self, deep=True):

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in _validate_steps(self)
    157             if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
    158                     hasattr(t, "transform")):
--> 159                 raise TypeError("All intermediate steps should be "
    160                                 "transformers and implement fit and transform "
    161                                 "or be the string 'passthrough' "

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression()' (type <class 'sklearn.linear_model._logistic.LogisticRegression'>) doesn't

Answer:

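As the error says, every intermediate step in a Pipeline must be a transformer (something that transforms features) and implement fit and transform, but you have chained two models. LogisticRegression sits in the middle of your pipeline and has no transform method, so validation fails. Put only one model in a pipeline, as its final step.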
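A minimal sketch of a pipeline that passes validation, with PCA as the only intermediate step and a single model at the end. The iris data, `n_components=2`, `max_iter=1000`, and `random_state=0` are assumptions chosen for illustration; the question does not say which data or settings are in use.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Example data (an assumption; substitute your own X and y).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    # Intermediate step: a transformer with fit and transform.
    ("dim_reduction", PCA(n_components=2)),
    # Final step: the one and only model in the pipeline.
    ("logres_model", LogisticRegression(max_iter=1000)),
])

# fit() runs PCA.fit_transform on the training data, then fits the classifier.
pipeline.fit(X_train, y_train)

# score() applies PCA.transform to the test data, then reports accuracy.
accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The constructor no longer raises because the only step before the last one, PCA, implements transform.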

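It looks like you may want to run a grid search that compares two estimators, each with its own hyperparameter grid, inside the same pipeline. To do that, use GridSearchCV and pass the Pipeline as its estimator: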

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import load_iris

pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression()),
])

# Each dict is a separate grid: it swaps in one estimator for the 'clf'
# step and tunes only that estimator's hyperparameters.
parameters = [
    {
        'clf': (LogisticRegression(),),
        'clf__C': (0.001, 0.01, 0.1, 1, 10, 100)
    }, {
        'clf': (RandomForestClassifier(),),
        'clf__n_estimators': (10, 30),
    }
]

grid_search = GridSearchCV(pipeline, parameters)

# Some example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

grid_search.fit(X_train, y_train)
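Once such a search has been fitted, the winning estimator and its settings can be read back from the GridSearchCV object. The following is a self-contained sketch with a smaller grid; the grid values, `cv=5`, `max_iter=1000`, and `random_state=0` are assumptions made to keep the run short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("dim_reduction", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Two grids: 3 candidates for LogisticRegression, 2 for RandomForestClassifier.
parameters = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [10, 30]},
]

grid_search = GridSearchCV(pipeline, parameters, cv=5)
grid_search.fit(X, y)

# best_params_["clf"] holds the estimator object that was swapped into the step.
print("best model:", type(grid_search.best_params_["clf"]).__name__)
print("mean CV score:", round(grid_search.best_score_, 3))

# best_estimator_ is the whole pipeline, refit on all of X with the best settings.
best_pipeline = grid_search.best_estimator_

# cv_results_ lists every candidate that was tried, in order.
for params, score in zip(grid_search.cv_results_["params"],
                         grid_search.cv_results_["mean_test_score"]):
    print(round(score, 3), params)
```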

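Also note that you are mixing classifiers and regressors: LogisticRegression is a classifier, while LinearRegression is a regressor. The grid search above compares two classifiers instead. You may want to take some time to work out which kind of problem you are facing and which models suit it.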
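One way to check which kind of model an estimator is: scikit-learn provides `is_classifier` and `is_regressor` in `sklearn.base`. The sketch below applies them to the two models from the question, then builds a separate pipeline for a regression task. The diabetes dataset and `n_components=5` are assumptions for illustration only.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import Pipeline

# LogisticRegression predicts class labels; LinearRegression predicts numbers.
print(is_classifier(LogisticRegression()), is_regressor(LogisticRegression()))  # True False
print(is_classifier(LinearRegression()), is_regressor(LinearRegression()))      # False True

# A regression problem gets its own pipeline, ending in a regressor.
X, y = load_diabetes(return_X_y=True)  # continuous target (example data)
regression_pipeline = Pipeline([
    ("dim_reduction", PCA(n_components=5)),
    ("linear_model", LinearRegression()),
])
regression_pipeline.fit(X, y)

# For regressors, score() returns R^2 rather than accuracy.
r2 = regression_pipeline.score(X, y)
print(f"R^2 on the training data: {r2:.3f}")
```

Keeping classification and regression in separate pipelines means each one is scored with a metric that matches its target.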
