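How do I parallelize stochastic gradient descent?

I have a fairly large training matrix (more than 1 billion rows, two features per row). There are two classes (0 and 1). This is too large for a single machine, but fortunately I have about 200 MPI hosts at my disposal. Each is a modest dual-core workstation.

Feature generation is already successfully distributed.

The answers to "Multiprocessing scikit-learn" suggest that the work of an SGDClassifier can be distributed:

You can distribute the data set across cores, do partial_fit, get the weight vectors, average them, distribute them to the estimators, and do partial_fit again.

Once I have run partial_fit a second time on each estimator, where do I go from there to get a final aggregate estimator?

My best guess was to average the coefs and intercepts again and build an estimator from those values. The resulting estimator gives different results from an estimator constructed with fit() on the entire data.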
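The averaging scheme described above can be tried out in a single process before involving MPI. The following is only a sketch under stated assumptions: the data is synthetic, `n_hosts` and `average_weights` are hypothetical names standing in for the MPI ranks and the root's averaging step, and two rounds are used to mirror the question. It also shows why the averaged weights need not match those from fit() on the full data: SGD is stochastic, and averaging a few partial passes is a different optimization path from many epochs over all rows.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: two features, two classes.
X = rng.normal(size=(20_000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

n_hosts = 4  # hypothetical; each shard plays the role of one MPI host
shards = list(zip(np.array_split(X, n_hosts), np.array_split(y, n_hosts)))
estimators = [SGDClassifier(random_state=i) for i in range(n_hosts)]


def average_weights(models):
    """Element-wise mean of coef_ and intercept_ across the local models."""
    coef = np.mean([m.coef_ for m in models], axis=0)
    intercept = np.mean([m.intercept_ for m in models], axis=0)
    return coef, intercept


for round_idx in range(2):
    # Each "host" takes one SGD pass over its own shard.
    for est, (X_local, y_local) in zip(estimators, shards):
        est.partial_fit(X_local, y_local, classes=[0, 1])
    # The "root" averages the weights and hands a private copy to each host.
    coef, intercept = average_weights(estimators)
    for est in estimators:
        est.coef_ = coef.copy()
        est.intercept_ = intercept.copy()

# After the last averaging step every estimator holds the same weights,
# so any one of them can serve as the aggregate model.
final = estimators[0]
reference = SGDClassifier(random_state=0).fit(X, y)

agreement = np.mean(final.predict(X) == reference.predict(X))
print("weights identical:", np.allclose(final.coef_, reference.coef_))
print("prediction agreement:", agreement)
```

Comparing predictions rather than raw weights is usually the more meaningful check, since two linear models with different coefficients can still describe nearly the same decision boundary.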

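Details

Each host generates a local matrix and a local vector. These are n rows of the test set and the corresponding n target values.

Each host uses its local matrix and local vector to create an SGDClassifier and run partial_fit. Each host then sends the coef vector and the intercept to root. Root averages them and sends the averages back to the hosts. The hosts run partial_fit again and send the coef vector and the intercept to root.

Root constructs a new estimator from these values.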

    import numpy as np
    from mpi4py import MPI  # assumed: the comm.send/recv/bcast calls match mpi4py
    from sklearn import linear_model

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # get_local_matrix() and get_local_vector() are the asker's own loaders (not shown)
    local_matrix = get_local_matrix()
    local_vector = get_local_vector()

    estimator = linear_model.SGDClassifier()
    estimator.partial_fit(local_matrix, local_vector, [0, 1])

    average_coefs = None
    avg_intercept = None

    # Round 1: root collects every host's weights and averages them
    if rank > 0:
        comm.send((estimator.coef_, estimator.intercept_), dest=0, tag=rank)
    else:
        pairs = [comm.recv(source=r, tag=r) for r in range(1, size)]
        pairs.append((estimator.coef_, estimator.intercept_))
        average_coefs = np.average([a[0] for a in pairs], axis=0)
        avg_intercept = np.average([a[1][0] for a in pairs])

    # Every host continues from the averaged weights
    estimator.coef_ = comm.bcast(average_coefs, root=0)
    estimator.intercept_ = np.array([comm.bcast(avg_intercept, root=0)])

    # Round 2: another local pass, then root averages again
    estimator.partial_fit(local_matrix, local_vector, [0, 1])

    if rank > 0:
        comm.send((estimator.coef_, estimator.intercept_), dest=0, tag=rank)
    else:
        pairs = [comm.recv(source=r, tag=r) for r in range(1, size)]
        pairs.append((estimator.coef_, estimator.intercept_))
        average_coefs = np.average([a[0] for a in pairs], axis=0)
        avg_intercept = np.average([a[1][0] for a in pairs])
        estimator.coef_ = average_coefs
        estimator.intercept_ = np.array([avg_intercept])
        print("The estimator at rank 0 should now be working")

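Answer:

Training a linear model on a dataset with 1 billion samples and 2 features is very likely either to underfit or, if the data is actually linearly separable, to waste CPU and I/O time. Don't spend time on parallelizing such a problem with a linear model:

Either switch to a more complex class of models (for example, train random forests on smaller partitions of the data that fit in memory, then aggregate them),
or select random subsamples of your dataset of increasing size and train linear models on them. Measure the predictive accuracy on a held-out test set and stop when you see diminishing returns (probably after a few tens of thousands of samples of the minority class).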
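The second suggestion, growing random subsamples until accuracy stops improving, could look roughly like the sketch below. It is an illustration only: the data is synthetic, and the starting size of 1,000, the doubling schedule, and the `min_gain` stopping threshold are all hypothetical choices, not values from the answer.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: two features, two classes, noisy linear boundary.
X = rng.normal(size=(200_000, 2))
noise = rng.normal(scale=0.5, size=len(X))
y = (X[:, 0] - X[:, 1] + noise > 0).astype(int)

# Hold out a test set that is never used for training.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

min_gain = 0.002  # hypothetical: stop when accuracy improves by less than this
history = []      # (subsample size, held-out accuracy) pairs
n = 1_000
while n <= len(X_pool):
    # Draw a fresh random subsample of the current size.
    idx = rng.choice(len(X_pool), size=n, replace=False)
    model = SGDClassifier(random_state=0).fit(X_pool[idx], y_pool[idx])
    acc = model.score(X_test, y_test)
    history.append((n, acc))
    # Diminishing returns: the last doubling bought almost nothing.
    if len(history) > 1 and acc - history[-2][1] < min_gain:
        break
    n *= 2

for n_samples, acc in history:
    print(f"{n_samples:>8d} samples -> held-out accuracy {acc:.4f}")
```

A single noisy step can trigger the stop early, so in practice it may be worth averaging over a few subsamples per size before comparing.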
