I get an error when trying to chain estimators and inspect the result. I'm new to Python, and this is my first time using Pipeline.
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

estimator = [
    ('dim_reduction', PCA()),
    ('logres_model', LogisticRegression()),
    ('linear_model', LinearRegression()),
]
pipeline_estimator = Pipeline(estimator)
Error message:
TypeError                                 Traceback (most recent call last)
<ipython-input-196-44549764413a> in <module>
----> 1 pipeline_estimator=Pipeline(estimator)

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in __init__(self, steps, memory, verbose)
    112         self.memory = memory
    113         self.verbose = verbose
--> 114         self._validate_steps()
    115
    116     def get_params(self, deep=True):

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in _validate_steps(self)
    157             if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
    158                     hasattr(t, "transform")):
--> 159                 raise TypeError("All intermediate steps should be "
    160                                 "transformers and implement fit and transform "
    161                                 "or be the string 'passthrough' "

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression()' (type <class 'sklearn.linear_model._logistic.LogisticRegression'>) doesn't
Answer:
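As the error says, every intermediate step in a Pipeline must be a transformer (something that transforms the features) and implement fit/transform, but you have chained two models. A pipeline should contain only one model, and it must be the last step.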
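To make the rule concrete, here is a minimal sketch of a valid pipeline: one transformer followed by a single model as the final step. The `n_components=2` and `max_iter=1000` settings and the iris dataset are my own choices for illustration, not something taken from the question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Pipeline validates intermediate steps by looking for a transform method.
pca_has_transform = hasattr(PCA(), 'transform')
logres_has_transform = hasattr(LogisticRegression(), 'transform')
print('PCA has transform:', pca_has_transform)
print('LogisticRegression has transform:', logres_has_transform)

# Transformers come first; the single predictive model goes last.
pipeline = Pipeline([
    ('dim_reduction', PCA(n_components=2)),               # assumed setting
    ('logres_model', LogisticRegression(max_iter=1000)),  # assumed setting
])

# Example data only; substitute your own X and y.
X, y = load_iris(return_X_y=True)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(predictions[:5])
```

Calling `fit` on the pipeline runs `fit_transform` on the PCA step and then fits the logistic regression on the reduced features, so you never have to call the steps one by one.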
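It looks like you may want to run a grid search that compares two estimators, each with its own pipeline step and hyperparameter tuning. To do that, use GridSearchCV with the defined Pipeline as its estimator: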
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import load_iris

# The 'clf' step is a placeholder that the grid search swaps out.
pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression()),
])

# One dict per candidate model, each with its own hyperparameters.
parameters = [
    {
        'clf': (LogisticRegression(),),
        'clf__C': (0.001, 0.01, 0.1, 1, 10, 100),
    },
    {
        'clf': (RandomForestClassifier(),),
        'clf__n_estimators': (10, 30),
    },
]

grid_search = GridSearchCV(pipeline, parameters)

# Some example data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
grid_search.fit(X_train, y_train)
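Once a search like the one above has been fitted, you would normally look at which candidate won and how it does on held-out data. The sketch below repeats the setup in a self-contained form with a smaller grid; the `random_state=0` and `max_iter=1000` values and the reduced list of `C` values are my assumptions, added only to keep the run short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression()),
])

# Smaller grid than in the answer (assumed values, for a quick run).
parameters = [
    {'clf': (LogisticRegression(max_iter=1000),), 'clf__C': (0.1, 1, 10)},
    {'clf': (RandomForestClassifier(random_state=0),), 'clf__n_estimators': (10, 30)},
]

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid_search = GridSearchCV(pipeline, parameters)
grid_search.fit(X_train, y_train)

# Winning model and hyperparameters, plus its mean cross-validated score.
print(grid_search.best_params_)
print(grid_search.best_score_)

# With the default refit=True, score() uses the best pipeline refitted on
# the full training set.
test_score = grid_search.score(X_test, y_test)
print(test_score)

# One entry per candidate that was tried: 3 + 2 = 5 here.
n_candidates = len(grid_search.cv_results_['params'])
```

The `best_params_` dict includes the `clf` entry itself, so it tells you both which model was selected and which hyperparameter value went with it.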
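Also note that you are mixing a classifier with a regressor. The example above shows how to compare two models by using two classifiers instead. You may still want to spend some time working out what kind of problem you are facing and which models suit it.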
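If you are unsure which kind of model or target you have, scikit-learn can tell you. This is a small sketch using `is_classifier` and `is_regressor` from `sklearn.base` and `type_of_target` from `sklearn.utils.multiclass`; the three-value `prices` array is made up purely for illustration.

```python
import numpy as np
from sklearn.base import is_classifier, is_regressor
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Classify each estimator by the kind of problem it solves.
kinds = {}
for model in (LogisticRegression(), LinearRegression(), RandomForestClassifier()):
    if is_classifier(model):
        kinds[type(model).__name__] = 'classifier'
    elif is_regressor(model):
        kinds[type(model).__name__] = 'regressor'
    else:
        kinds[type(model).__name__] = 'other'
print(kinds)

# Inspect the target to see what kind of problem it describes.
_, y = load_iris(return_X_y=True)
iris_target_type = type_of_target(y)

prices = np.array([0.5, 1.2, 3.3])  # hypothetical continuous target
prices_target_type = type_of_target(prices)

print(iris_target_type, prices_target_type)
```

Despite its name, LogisticRegression is a classifier, which is why it pairs with RandomForestClassifier and not with LinearRegression.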