I built a pipeline that loops over models and scalers and performs recursive feature elimination (RFE), like this:
    from sklearn.feature_selection import RFE
    from sklearn.pipeline import Pipeline


    def train_models(models, scalers, X_train, y_train, X_val, y_val):
        best_results = {'f1_score': 0}
        for model in models:
            for scaler in scalers:
                for n_features in list(range(len(X_train.columns),
                                             int(len(X_train.columns) / 2),
                                             -10)):
                    rfe = RFE(estimator=model,
                              n_features_to_select=n_features,
                              step=10)
                    pipe = Pipeline([
                        ('scaler', scaler),
                        ('selector', rfe),
                        ('model', model)
                    ])
                    pipe.fit(X_train, y_train)
                    y_pred = pipe.predict(X_val)
                    results = evaluate(y_val, y_pred)  # returns a dict of metrics
                    results['pipeline'] = pipe
                    results['y_pred'] = y_pred
                    if results['f1_score'] > best_results['f1_score']:
                        best_results = results
        print("Best F1 score: {}".format(best_results['f1_score']))
        return best_results
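One way to see which n_features values the inner loop actually tries is to evaluate its range expression on its own. This is a minimal sketch: feature_counts is a hypothetical helper, and since the number of columns in X_train isn't stated, it tries 78 and 88, the only two column counts for which the sequence contains both 78 and 48.

```python
def feature_counts(n_columns):
    """Return the n_features values tried by the inner loop of train_models.

    Mirrors range(len(X_train.columns), int(len(X_train.columns)/2), -10)
    without needing the real DataFrame.
    """
    return list(range(n_columns, int(n_columns / 2), -10))


# Hypothetical column counts: the real number of columns is not given.
for n_columns in (78, 88):
    counts = feature_counts(n_columns)
    print(n_columns, counts, "last:", counts[-1])
```

In both cases the last value tried is 48.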
这个管道在函数内部运行良好,能够正确地预测和评分结果。
However, calling pipeline.predict() outside the function fails, for example:
    best_result = train_models(models, scalers, X_train, y_train, X_val, y_val)
    pipeline = best_result['pipeline']
    pipeline.predict(X_val)
Here is what pipeline looks like:
    Pipeline(steps=[('scaler', StandardScaler()),
                    ('selector',
                     RFE(estimator=LogisticRegression(C=1, max_iter=1000,
                                                      penalty='l1',
                                                      solver='liblinear'),
                         n_features_to_select=78, step=10)),
                    ('model',
                     LogisticRegression(C=1, max_iter=1000, penalty='l1',
                                        solver='liblinear'))])
I suspect the model step in the pipeline expects 48 features rather than 78, but I can't see where 48 comes from, since n_features_to_select was set to 78 in the preceding RFE step!
Any help would be greatly appreciated!
Answer:
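I don't have your data, but doing some arithmetic and guesswork based on what you shared, 48 looks like the last n_features value your nested loop tries. That makes me suspect the culprit is a shallow copy: every pipeline you build stores the same model object as its final step, so each later fit overwrites the one inside best_results. I suggest changing the following: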
    pipe = Pipeline([
        ('scaler', scaler),
        ('selector', rfe),
        ('model', model)
    ])
to
    pipe = Pipeline([
        ('scaler', scaler),
        ('selector', rfe),
        ('model', copy.deepcopy(model))
    ])
and try again (after adding import copy at the top, of course).
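Why a single shared estimator object corrupts best_results can be shown without scikit-learn. In this pure-Python sketch, ToyModel and run_loop are hypothetical stand-ins. It assumes, as in scikit-learn, that fit() mutates the estimator in place and that a pipeline keeps the very object it is given rather than a copy (Pipeline does not clone its final step).

```python
import copy


class ToyModel:
    """Hypothetical stand-in for an estimator: fit() records the feature count."""

    def __init__(self):
        self.n_features_in_ = None

    def fit(self, n_features):
        # Fitting mutates the object in place, like scikit-learn estimators.
        self.n_features_in_ = n_features
        return self


def run_loop(make_step):
    """Simulate the nested loop, keeping one fitted final step per n_features.

    make_step(model) decides what object goes into the pipeline's final step.
    """
    model = ToyModel()  # one shared object, as in the question's loop
    pipelines = {}
    for n_features in (78, 68, 58, 48):
        step = make_step(model)
        step.fit(n_features)
        pipelines[n_features] = step
    return pipelines


shared = run_loop(lambda m: m)                 # original: same object every time
copied = run_loop(lambda m: copy.deepcopy(m))  # suggested fix: independent copies

print(shared[78].n_features_in_)  # 48: overwritten by the last iteration
print(copied[78].n_features_in_)  # 78: keeps its own fit
```

With the shared object, the step saved for 78 features ends up fitted on 48, matching the mismatch in the question. Note that copy.deepcopy also carries over any fitted state; sklearn.base.clone(model) is an alternative that returns an unfitted estimator with the same parameters.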