Scikit Learn – Training a New Model with GridSearchCV

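If I find the optimal parameters using GridSearchCV with a pipeline, is there a way to save the trained model so that I can later apply the whole pipeline to new data and generate predictions for it? For example, I have the following pipeline and parameter grid search: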

from pprint import pprint
from time import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SVC(probability=True))),
])

parameters = {
    'vect__ngram_range': ((1, 1), (1, 2), (1, 3)),  # unigrams, bigrams, or trigrams
    'clf__estimator__kernel': ('rbf', 'linear'),
    'clf__estimator__C': tuple([10**i for i in range(-10, 11)]),
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)

print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()

# Conduct the grid search (X holds the raw documents, y the labels)
grid_search.fit(X, y)
print("done in %0.3fs" % (time() - t0))
print()

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")

# Obtain the top performing parameters
best_parameters = grid_search.best_estimator_.get_params()

# Print the results
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

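Now I would like to save all of these steps as a single process that I can apply to a new, unseen dataset, so that it transforms the data with the same parameters, vectorizer, and transformer, runs the model, and reports the results.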

Answer:

You can simply pickle the GridSearchCV object and unpickle it later, when you want to use it to predict on new data.

import pickle

# Fit the model and pickle the fitted search object.
# Pickle files must be opened in binary mode ("wb" / "rb").
grid_search.fit(X, y)
with open('/model/path/model_pickle_file', "wb") as fp:
    pickle.dump(grid_search, fp)

# Load the model from file
with open('/model/path/model_pickle_file', "rb") as fp:
    grid_search_load = pickle.load(fp)

# Predict new data with the model loaded from disk
y_new = grid_search_load.best_estimator_.predict(X_new)
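If only the predictions are needed later, and not the full cross-validation results, one option is to persist just `best_estimator_`, which with the default `refit=True` is the whole pipeline refitted on all the training data. The following is a minimal sketch of that idea, not the asker's actual setup: the toy documents, the labels, the small parameter grid, and the temporary file name `best_pipeline.pkl` are all made up for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data, invented purely for illustration
docs = ["good movie", "great film", "fine acting", "nice plot",
        "bad movie", "awful film", "poor acting", "dull plot"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SVC(kernel="linear")),
])

# A deliberately tiny grid so the sketch runs quickly
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1, 10]}, cv=2)
search.fit(docs, labels)

# With refit=True (the default), best_estimator_ is the complete pipeline
# (vectorizer, tf-idf transformer, classifier) fitted with the best parameters.
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
with open(model_path, "wb") as fp:
    pickle.dump(search.best_estimator_, fp)

# Later, possibly in another process: reload and predict on raw text
with open(model_path, "rb") as fp:
    loaded = pickle.load(fp)

new_docs = ["great plot", "awful acting"]
predictions = loaded.predict(new_docs)
print(predictions)
```

Because the loaded object is the fitted pipeline itself, raw text goes straight into `predict` and the same vectorizer and transformer are applied automatically. Two caveats apply to any pickle-based approach: only unpickle files from a source you trust, and load the model with the same scikit-learn version that was used to save it where possible.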
