I ran into a problem calling predict on a Pipeline in which every step is a custom class.
class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                if self.method == 'pca':
                    selector = PCA(n_components=self.features)
                elif self.method == 'rfe':
                    selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                   n_features_to_select=self.features,
                                   step=1)
                selector.fit(X, Y)
                return selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)

model = Pipeline([
    ("DATA_CLEANER", MyDataCleaner(demo='', mode='strict')),
    ("DATA_ENCODING", MyEncoder(encoder_name='code')),
    ("FEATURE_SELECTION", MyFeatureSelector(features=15, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor())
])
The code above runs fine here:
model.fit(X, _Y)
But I get an error here:
prediction = model.predict(XT)
Error: shapes (672,107) and (15,) not aligned: 107 (dim 1) != 15 (dim 0)
Debugging shows that the problem is at selector.fit(X, Y): during the predict() step a new instance of MyFeatureSelector is created, and at that point Y does not exist.
What am I doing wrong?
Answer:
Here is a working version of the code:
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method
        self.selector = None
        self.init_selector()

    def init_selector(self):
        # Create the selector once so the fitted object is reused at predict time.
        if self.method == 'pca':
            self.selector = PCA(n_components=self.features)
        elif self.method == 'rfe':
            self.selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                n_features_to_select=self.features,
                                step=1)

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                # Y is only passed while fitting; predict() calls transform(X) alone.
                if Y is not None:
                    self.selector.fit(X, Y)
                return self.selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)
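
As a possible alternative to the version above, the fitting could be moved out of transform() and into fit(), which is the usual scikit-learn convention. The sketch below is not the original poster's code: the class name FittedFeatureSelector and the synthetic data are made up for illustration, and the data cleaner and encoder steps are left out because their definitions are not shown in the question. It relies on Pipeline calling fit_transform(X, y) on intermediate steps during fit() and transform(X) during predict().

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.pipeline import Pipeline


class FittedFeatureSelector(BaseEstimator, TransformerMixin):
    """Hypothetical variant: the inner selector is fitted in fit(), not transform()."""

    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, y=None):
        # Build and fit the inner selector once, while y is available.
        self.selector_ = None
        if self.features < X.shape[1]:
            if self.method == 'pca':
                self.selector_ = PCA(n_components=self.features)
            elif self.method == 'rfe':
                self.selector_ = RFE(estimator=LinearRegression(),
                                     n_features_to_select=self.features,
                                     step=1)
            else:
                raise ValueError('unknown method: {}'.format(self.method))
            self.selector_.fit(X, y)
        return self

    def transform(self, X):
        # Reuse the selector fitted above; predict() reaches this without y.
        if self.selector_ is None:
            return X
        return self.selector_.transform(X)


# Synthetic data, invented purely for this illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 30))
y_train = X_train[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 30))

model = Pipeline([
    ("FEATURE_SELECTION", FittedFeatureSelector(features=15, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor()),
])
model.fit(X_train, y_train)
prediction = model.predict(X_test)
print(prediction.shape)
```

Inheriting from TransformerMixin supplies fit_transform(), so it does not need to be written by hand, and because errors are no longer swallowed by a try/except, a failed fit surfaces immediately instead of showing up later as a shape mismatch in predict().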