Why does the classifier recommended by TPOT score lower than LinearSVC?

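I found that LinearSVC is among the classifiers TPOT can choose from. I have been using it to build my model and getting a fairly good score (0.95 from sklearn's score).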

    import numpy as np
    from sklearn import model_selection
    from tpot import TPOTClassifier

    # format_data() and create_labels() are defined elsewhere

    def process(stock):
        df = format_data(stock)
        df[['HSI Volume', 'HSI', stock]] = df[['HSI Volume', 'HSI', stock]].pct_change()

        # shift the future value to the current date
        df[stock+'_future'] = df[stock].shift(-1)
        df.replace([-np.inf, np.inf], np.nan, inplace=True)
        df.dropna(inplace=True)
        df['class'] = list(map(create_labels, df[stock], df[stock+'_future']))

        X = np.array(df.drop(['class', stock+'_future'], axis=1))  # axis=1: drop columns
        # X = preprocessing.scale(X)
        y = np.array(df['class'])

        X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

        tpot = TPOTClassifier(generations=10, verbosity=2)
        fitting = tpot.fit(X_train, y_train)
        prediction = tpot.score(X_test, y_test)
        tpot.export('pipeline.py')
        return fitting, prediction

After ten generations, TPOT recommended GaussianNB, which scores about 0.77 with sklearn's score.

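    Generation 1 - Current best internal CV score: 0.5322255571
    Generation 2 - Current best internal CV score: 0.55453535828
    Generation 3 - Current best internal CV score: 0.55453535828
    Generation 4 - Current best internal CV score: 0.55453535828
    Generation 5 - Current best internal CV score: 0.587469903893
    Generation 6 - Current best internal CV score: 0.587469903893
    Generation 7 - Current best internal CV score: 0.597194474469
    Generation 8 - Current best internal CV score: 0.597194474469
    Generation 9 - Current best internal CV score: 0.597194474469
    Generation 10 - Current best internal CV score: 0.597194474469

    Best pipeline: GaussianNB(RBFSampler(input_matrix, 0.22))
    (None, 0.54637855142056824)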

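I am just curious why LinearSVC scores higher and yet TPOT does not recommend it. Is it because the scoring mechanism is different, which leads to a different optimal classifier?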

Thanks a lot!

Answer:

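My personal guess is that TPOT got stuck in a local maximum. Changing the test-set size, running more generations, or scaling the data might help. Also, can you re-run TPOT and see whether you get the same result? (My guess is no, since genetic optimization is non-deterministic because of mutation.)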
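To make the local-maximum and non-determinism points concrete, here is a toy sketch. It is not TPOT's actual algorithm: the `fitness` landscape, the `evolve` function and every parameter value are made up for illustration. It only shows how a small mutation-and-selection search can settle on a lower peak depending on where the random seed happens to start it.

```python
import random


def fitness(x):
    # Hypothetical landscape on the integers 0..99: a broad local peak
    # (0.60 at x=20) and a narrow global peak (0.95 at x=80).
    local_peak = 0.60 - 0.02 * abs(x - 20)
    global_peak = 0.95 - 0.10 * abs(x - 80)
    return max(local_peak, global_peak, 0.0)


def evolve(seed, generations=10, pop_size=4, step=3):
    """Mutation-only evolutionary search; returns (best_x, best_fitness)."""
    rng = random.Random(seed)
    population = [rng.randint(0, 99) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation: every individual yields one child nudged by a small random step.
        children = [min(99, max(0, x + rng.randint(-step, step))) for x in population]
        # Selection: keep the fittest pop_size individuals among parents and children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    for seed in range(5):
        best, score = evolve(seed)
        print(f"seed={seed} best_x={best} score={score:.2f}")
```

Runs whose initial population lands near x=80 can climb the global peak, while others may stay on the broad local one. A different seed or more generations can change the outcome, which is the toy analogue of the suggestions above. For the real thing, as far as I know `TPOTClassifier` accepts a `random_state` argument, so fixing it should make a run repeatable.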
