Invalid parameter for a support vector regression (SVR) pipeline

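I have a dataset with 100 columns of continuous features and one continuous label, and I want to run SVR on it: select the relevant features, tune the hyperparameters, and then cross-validate the model that fits my data.

I wrote the following code: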

X_train, X_test, y_train, y_test = train_test_split(scaled_df, target, test_size=0.2)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define the pipeline to evaluate
model = SVR()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])
# define the grid
grid = dict()
# number of features to try
grid['estimator__sel__k'] = [i for i in range(1, X_train.shape[1]+1)]
# define the grid search
#search = GridSearchCV(pipeline, grid, scoring='neg_mean_squared_error', n_jobs=-1, cv=cv)
search = GridSearchCV(
        pipeline,
#        estimator=SVR(kernel='rbf'),
        param_grid={
            'estimator__svr__C': [0.1, 1, 10, 100, 1000],
            'estimator__svr__epsilon': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10],
            'estimator__svr__gamma': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10]
        },
        scoring='neg_mean_squared_error',
        verbose=1,
        n_jobs=-1)
for param in search.get_params().keys():
    print(param)
# perform the search
results = search.fit(X_train, y_train)
# summarize the best result
print('Best MAE: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)
# summarize all results
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))

I get the following error:

ValueError: Invalid parameter estimator for estimator Pipeline(memory=None,
         steps=[('sel',
                 SelectKBest(k=10,
                             score_func=<function mutual_info_regression at 0x7fd2ff649cb0>)),
                ('svr',
                 SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
                     gamma='scale', kernel='rbf', max_iter=-1, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.

When I print estimator.get_params().keys() as the error message suggests, I get:

cv
error_score
estimator__memory
estimator__steps
estimator__verbose
estimator__sel
estimator__svr
estimator__sel__k
estimator__sel__score_func
estimator__svr__C
estimator__svr__cache_size
estimator__svr__coef0
estimator__svr__degree
estimator__svr__epsilon
estimator__svr__gamma
estimator__svr__kernel
estimator__svr__max_iter
estimator__svr__shrinking
estimator__svr__tol
estimator__svr__verbose
estimator
iid
n_jobs
param_grid
pre_dispatch
refit
return_train_score
scoring
verbose
Fitting 5 folds for each of 405 candidates, totalling 2025 fits

But when I change this line:

pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])

to:

pipeline = Pipeline(steps=[('estimator__sel',fs), ('estimator__svr', model)])

I get the following error:

ValueError: Estimator names must not contain __: got ['estimator__sel', 'estimator__svr']

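Can someone explain what I am doing wrong, that is, how to combine the pipeline and feature selection step with GridSearchCV?

As a side note, if I comment out pipeline in GridSearchCV and uncomment estimator=SVR(kernel='rbf'), the cell runs without problems, but in that case I assume I am not including feature selection, since it is not called anywhere. I have seen some similar Stack Overflow questions, for example here, but they do not seem to answer this specific question.

Is there a more concise way to write this?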

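Answer:

The first error message is about the pipeline's parameters, not the search's, and it points to a problem with your param_grid, not with the pipeline step names. GridSearchCV passes each param_grid key to the pipeline's set_params, so the keys must be named relative to the pipeline (svr__C), without the estimator__ prefix that only appears when you list the parameters of the search object. Running pipeline.get_params().keys() should show the correct parameter names. Your grid should be: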

param_grid={
    'svr__C': [0.1, 1, 10, 100, 1000],
    'svr__epsilon': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10],
    'svr__gamma': [0.0001, 0.0005,  0.001, 0.005,  0.01, 0.05, 1, 5, 10]
},
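To make the naming difference concrete, here is a small sketch that compares the parameter names seen by the pipeline with those seen by the search object. It reuses the step names sel and svr from the question; the two-value grid is only a placeholder for illustration.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Same step names as in the question.
pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=mutual_info_regression)),
    ("svr", SVR()),
])

# Placeholder grid (assumption); nothing is fitted here.
search = GridSearchCV(pipeline, param_grid={"svr__C": [1, 10]})

pipeline_keys = set(pipeline.get_params().keys())
search_keys = set(search.get_params().keys())

# Names relative to the pipeline: this is the form param_grid expects.
print("svr__C" in pipeline_keys)             # True
print("estimator__svr__C" in pipeline_keys)  # False

# Names relative to the search object carry the extra "estimator__" prefix.
print("estimator__svr__C" in search_keys)    # True
```

The listing in the question came from search.get_params(), which is why every pipeline parameter appeared with the estimator__ prefix.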

I don't know how replacing the pipeline with a plain SVR manages to run; your parameter grid doesn't specify the right things there either…
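Putting this together, one possible end-to-end version is sketched below. It is not the asker's exact setup: make_regression stands in for scaled_df and target, and the grids are deliberately tiny so the search finishes quickly. The sel__k entry shows that the feature count can be tuned in the same grid as the SVR hyperparameters.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Synthetic stand-in for scaled_df / target (assumption, not the asker's data).
X, y = make_regression(n_samples=120, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=mutual_info_regression)),
    ("svr", SVR(kernel="rbf")),
])

# Keys are named relative to the pipeline: <step name>__<parameter>.
# The value lists are shortened for illustration.
param_grid = {
    "sel__k": [3, 5, 10],
    "svr__C": [1, 10],
    "svr__epsilon": [0.01, 0.1],
    "svr__gamma": [0.01, 0.1],
}

# Pass cv explicitly; otherwise GridSearchCV falls back to its default folds.
cv = RepeatedKFold(n_splits=5, n_repeats=1, random_state=1)
search = GridSearchCV(pipeline, param_grid=param_grid,
                      scoring="neg_mean_squared_error", cv=cv)
search.fit(X_train, y_train)

# best_score_ is the negative MSE, so flip the sign for reporting.
print("Best MSE: %.3f" % -search.best_score_)
print("Best config: %s" % search.best_params_)
print("Held-out negative MSE: %.3f" % search.score(X_test, y_test))
```

Because the scoring is neg_mean_squared_error, the reported value is a mean squared error rather than an MAE, so the 'Best MAE' label in the question's print statement is misleading. Selecting features inside the pipeline also means the selection is refitted on each training fold, which keeps validation data out of the feature selection step.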
