How do I implement randomized search optimization for the number of principal components?

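I am trying to implement randomized search optimization for an MLPClassifier. I have successfully tuned 'hidden_layer_sizes', 'solver', and 'activation', but I am stuck on tuning the number of principal components. That parameter is used to build the PCA that transforms the training set, and it is not an argument of MLPClassifier, so I cannot put it in the 'parameters' dictionary like the other hyperparameters.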

    from sklearn.decomposition import PCA
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.neural_network import MLPClassifier

    # PCA
    nof_prin_components = 200  # PARAMETER for optimisation in experiments
    # n_components sets the number of components to keep
    pca = PCA(n_components=nof_prin_components, whiten=True).fit(X)
    # apply PCA to the training images to compute the principal components
    X_train_pca = pca.transform(X)

    parameters = {
        'hidden_layer_sizes': [150, 200, 250, 300, 400],
        'solver': ['sgd', 'adam', 'lbfgs'],
        'activation': ['relu', 'tanh', 'identity', 'logistic']
    }

    # Performs the actual hyperparameter tuning and returns
    # the best set of parameters and the best score
    def tuning(clf, parameters, iterations, X, y):
        # n_jobs=-1 uses all available CPU cores
        randomSearch = RandomizedSearchCV(clf, param_distributions=parameters,
                                          n_jobs=-1, n_iter=iterations, cv=6)
        randomSearch.fit(X, y)
        params = randomSearch.best_params_
        score = randomSearch.best_score_
        return params, score

    clf = MLPClassifier(batch_size=256, verbose=True, early_stopping=True)
    parameters_after_tuning, score_after_tuning = tuning(clf, parameters, 20, X_train_pca, y)
    print(parameters_after_tuning)
    print(score_after_tuning)

I tried using a pipeline, but I don't know how to set it up.

    # Sklearn pipelines for number of principal components random search
    pca1 = PCA(n_components=100, whiten=True).fit(X)
    pca2 = PCA(n_components=200, whiten=True).fit(X)
    pca3 = PCA(n_components=300, whiten=True).fit(X)
    pca4 = PCA(n_components=400, whiten=True).fit(X)
    estimators = [('pca 1', pca1), ('pca 2', pca2), ('pca 3', pca3), ('pca 4', pca4)]
    pipe = Pipeline(estimators)
    pipe[0]

Any help would be appreciated.

Answer:

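I think the most effective approach is to put both PCA and MLPClassifier into a single Pipeline object and tune that one pipeline. To define the search parameters, for each pipeline step you specify the step name and the parameter name, separated by __ (two underscores). In your case, it would look like this: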

    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline(steps=[
        ('pca', PCA(whiten=True)),
        ('clf', MLPClassifier(batch_size=256, verbose=True, early_stopping=True))
    ])

    # Keys are '<step name>__<parameter name>'
    parameters = {
        'pca__n_components': [100, 200, 300, 400],
        'clf__hidden_layer_sizes': [150, 200, 250, 300, 400],
        'clf__solver': ['sgd', 'adam', 'lbfgs'],
        'clf__activation': ['relu', 'tanh', 'identity', 'logistic']
    }
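If you are unsure which parameter names a pipeline accepts, one way to check is to list them with get_params(). The following is a minimal sketch, assuming the same step names 'pca' and 'clf' as above; the MLPClassifier is left at its defaults only to keep the example short.

```python
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline(steps=[
    ("pca", PCA(whiten=True)),
    ("clf", MLPClassifier()),
])

# get_params() lists every settable name; nested ones use "<step>__<param>".
param_names = sorted(pipeline.get_params().keys())
print([name for name in param_names if name.startswith("pca__")])

# set_params() accepts the same names that go into the search dictionary.
pipeline.set_params(pca__n_components=100)
print(pipeline.named_steps["pca"].n_components)
```

Any key in the search dictionary that does not appear in this list will be rejected when the search tries to set it, so this is a quick way to catch a misspelled step or parameter name.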

When passed to RandomizedSearchCV, the whole pipeline is tuned, so the search returns the best parameters for both PCA and MLPClassifier.

parameters_after_tuning, score_after_tuning = tuning(pipeline, parameters, 20, X, y)

Note that you no longer pass the transformed dataset X_train_pca to RandomizedSearchCV; pass X instead, because the transformation is now part of the pipeline.
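To see the whole thing run end to end, here is a small self-contained sketch. It is not your setup: the data comes from make_classification as a stand-in for your images, and the candidate values, n_iter, cv, and max_iter are scaled down (all of these are assumptions chosen so it finishes quickly). A side benefit of searching over the pipeline is that PCA is refit on the training part of each cross-validation split, rather than once on all of X beforehand.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data (an assumption for this sketch).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=40, n_informative=10, random_state=0
)

pipeline = Pipeline(steps=[
    ("pca", PCA(whiten=True)),
    ("clf", MLPClassifier(max_iter=200, random_state=0)),
])

# Small illustrative grids; n_components must not exceed the feature count.
param_distributions = {
    "pca__n_components": [5, 10, 20],
    "clf__hidden_layer_sizes": [(16,), (32,)],
    "clf__activation": ["relu", "tanh"],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X_demo, y_demo)  # raw features go in; PCA runs inside the pipeline

print(search.best_params_)
print(search.best_score_)
```

After fitting, search.best_estimator_ is the full pipeline refit with the best parameters, so you can call search.predict on untransformed test data directly.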

For more information on how to use Pipeline, see its documentation and the user guide.
