Suppose I'm using GridSearchCV to search for hyperparameters, and I'm also using a Pipeline to preprocess the data (which I believe I need to do):
```python
param_grid = {'svc__gamma': np.linspace(0.2, 1, 5)}
pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)
```
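Is there a way to test my hypothesis that including the scaler step actually helps, other than simply removing it and running the search again?
That is, is there a way to write something like: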
```python
param_grid = {'svc__gamma': np.linspace(0.2, 1, 5), 'scaler': [On, Off]}
```
Or should I approach this problem differently?
Answer:
You can do this by including 'passthrough' as one of the values for the step in param_grid, like this:
```python
param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': ['passthrough', StandardScaler()],
}
```
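To actually answer "does the scaler help?", you can compare the cross-validated scores for the two scaler settings after a single search. The sketch below is a minimal, self-contained version under stated assumptions: `make_classification` stands in for the asker's `train_x`/`train_y`, one feature is deliberately inflated so scaling has something to fix, and the `best_by_scaler` grouping over `cv_results_` is just one illustrative way to summarize the results, not a built-in scikit-learn feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the asker's train_x / train_y
train_x, train_y = make_classification(n_samples=200, n_features=5, random_state=0)
# Inflate one feature's scale so that standardization plausibly matters
train_x[:, 0] *= 1000

pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    # 'passthrough' turns the scaler step off; StandardScaler() turns it on
    'scaler': ['passthrough', StandardScaler()],
}
search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)

# cv_results_ has one entry per parameter combination (5 gammas x 2 scaler options).
# Keep the best mean CV score seen for each scaler setting (illustrative helper logic).
results = search.cv_results_
best_by_scaler = {}
for params, score in zip(results['params'], results['mean_test_score']):
    key = 'off' if isinstance(params['scaler'], str) else 'on'
    best_by_scaler[key] = max(best_by_scaler.get(key, -np.inf), score)

print(best_by_scaler)
print(search.best_params_)
```

Because both settings are scored on the same folds, the two numbers are directly comparable; `search.best_params_['scaler']` also tells you which setting won overall.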
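As the scikit-learn Pipeline documentation puts it:
Individual steps may also be replaced as parameters, and non-final steps may be ignored by setting them to 'passthrough':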
```python
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GridSearchCV
>>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
>>> pipe = Pipeline(estimators)
>>> param_grid = dict(reduce_dim=['passthrough', PCA(5), PCA(10)],
...                   clf=[SVC(), LogisticRegression()],
...                   clf__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
```
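The documentation example stops before fitting. A possible end-to-end run is sketched below, assuming synthetic data from `make_classification` with 20 features (so that both `PCA(5)` and `PCA(10)` are valid) and `max_iter=1000` on `LogisticRegression` to reduce convergence warnings; both are choices made here for illustration, not part of the documented example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: synthetic data with 20 features so every PCA option is valid
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])
param_grid = dict(
    reduce_dim=['passthrough', PCA(5), PCA(10)],     # 'passthrough' skips the step
    clf=[SVC(), LogisticRegression(max_iter=1000)],  # swap the final estimator
    clf__C=[0.1, 10, 100],                           # both estimators accept C
)
grid_search = GridSearchCV(pipe, param_grid=param_grid)
grid_search.fit(X, y)

# The winning combination, including whether dimensionality reduction was kept
print(grid_search.best_params_)
```

The grid has 3 x 2 x 3 = 18 candidates; inspecting `best_params_['reduce_dim']` shows whether skipping the step ('passthrough') scored best.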