I ran Multinomial and Bernoulli Naive Bayes classifiers, plus a linear SVM, over a set of tweets. On a 60/40 split of 1,000 training tweets they score 80%, 80%, and 90% respectively, which is decent.
Each algorithm has parameters that can be tuned, and I'd like to know whether changing them could get better results. My machine-learning knowledge stops at train, test, and predict, so my question is: which parameters could I tune?
Here is the code I used:
    import codecs
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB, BernoulliNB
    from sklearn import svm

    trainfile = 'training_words.txt'
    testfile = 'testing_words.txt'

    word_vectorizer = CountVectorizer(analyzer='word')
    trainset = word_vectorizer.fit_transform(codecs.open(trainfile, 'r', 'utf8'))
    tags = training_labels  # labels loaded elsewhere

    mnb = svm.LinearSVC()  # or any other classifier
    mnb.fit(trainset, tags)

    testset = word_vectorizer.transform(codecs.open(testfile, 'r', 'utf8'))
    results = mnb.predict(testset)
    print results
Answer:
You can tune the model's parameters with grid-search cross-validation, using stratified K-fold splits. Here is example code.
    import codecs
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn import svm
    from sklearn.grid_search import GridSearchCV

    trainfile = 'training_words.txt'
    testfile = 'testing_words.txt'

    word_vectorizer = CountVectorizer(analyzer='word')
    trainset = word_vectorizer.fit_transform(codecs.open(trainfile, 'r', 'utf8'))
    tags = training_labels  # labels loaded elsewhere

    mnb = svm.LinearSVC()  # or any other classifier

    # Check the sklearn docs for the tunable params of your particular
    # estimator; for a linear SVM, C and class_weight are important ones.
    params_space = {'C': np.logspace(-5, 0, 10), 'class_weight': [None, 'auto']}

    # Build a grid-search CV; n_jobs=-1 uses all your processor cores.
    gscv = GridSearchCV(mnb, params_space, cv=10, n_jobs=-1)
    gscv.fit(trainset, tags)

    # Take a look at the best params combination and best score found.
    print gscv.best_params_
    print gscv.best_score_

    testset = word_vectorizer.transform(codecs.open(testfile, 'r', 'utf8'))
    results = gscv.predict(testset)
    print results
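A couple of notes on the snippet above. `sklearn.grid_search` is the old module path; on current scikit-learn the same class lives in `sklearn.model_selection`. Also, because the vectorizer above is fit on the whole training file before cross-validation, each CV fold sees vocabulary from its own held-out tweets; wrapping the vectorizer and classifier in a `Pipeline` avoids that and lets you tune both in one grid. Below is a minimal self-contained sketch of that idea — the toy texts, labels, and parameter choices are made up for illustration, not taken from your data. For classification, `GridSearchCV` uses stratified K-fold splitting by default.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the tweet data (made up, repeated to give CV enough rows).
texts = ["good movie", "great film", "awful movie", "bad film",
         "loved it", "hated it", "fine film", "terrible movie"] * 5
labels = [1, 1, 0, 0, 1, 0, 1, 0] * 5

# Vectorizer and classifier chained, so the vectorizer is re-fit
# inside every CV fold instead of once on all the data.
pipe = Pipeline([
    ("vect", CountVectorizer(analyzer="word")),
    ("clf", LinearSVC()),
])

# Grid keys are "<step>__<param>"; these particular ranges are illustrative.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": np.logspace(-3, 1, 5),
}

# Stratified 5-fold grid search over both steps at once.
gscv = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
gscv.fit(texts, labels)
print(gscv.best_params_)
print(gscv.best_score_)
```

To score on a separate test set afterwards, call `gscv.predict(test_texts)` — the best pipeline found is refit on all the training data automatically.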