Convergence of the delta rule for a single neuron with a sigmoid function

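I am experimenting with delta rule learning on the logical AND example, and I noticed that learning converges faster and gives better results when I do not use the derivative of the sigmoid activation function in the weight correction.

I am using a bias neuron.

If I understand correctly, the delta rule should take the derivative of the activation function into account when adjusting the weights: Δw_k(n) = η · e(n) · g′(h) · x(n),

where e(n) = desired output − neuron output.

This is the sigmoid function I use to compute the output: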

public double calc(double sum) {
    // logistic sigmoid: 1 / (1 + e^(-sum))
    return 1 / (1 + Math.exp(-sum));
}
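For reference, a short worked step (standard calculus, not taken from the slides) showing why the derivative g′(h) of this sigmoid can be computed from the neuron output alone. This is where the estimated * (1 - estimated) factor in the update code comes from, with estimated playing the role of g(h):

```latex
g(h) = \frac{1}{1 + e^{-h}}
\qquad\Longrightarrow\qquad
g'(h) = \frac{e^{-h}}{\left(1 + e^{-h}\right)^{2}}
      = \frac{1}{1 + e^{-h}} \cdot \frac{e^{-h}}{1 + e^{-h}}
      = g(h)\,\bigl(1 - g(h)\bigr)
```

Since g(h) lies strictly between 0 and 1, this factor never exceeds 1/4 and approaches 0 as the neuron saturates.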

According to page 33, step 4 of the delta rule slides, the weight update should be:

double delta = learningRate * error * estimated * (1 - estimated) * input;

It works better without this part:

estimated * (1 - estimated)

This is the code that trains with the delta rule:

@Override
public void train(List<LearningSample> samples, double[] weights, Function<double[], Double> neuronOutput) {
    double[] weightDelta = new double[weights.length];
    for (int i = 0; i < 10000; i++) {
        // Collections.shuffle(samples);
        for (LearningSample sample : samples) {
            // sigmoid of dot product of weights and input vector, including bias
            double estimated = neuronOutput.apply(sample.getInput());
            double error = sample.getDesiredOutput() - estimated;
            // this commented out version actually works better than the one below
            // double delta = learningRate * error;
            double delta = learningRate * error * estimated * (1 - estimated);
            // aggregate delta per weight for each sample in epoch
            deltaUpdate(delta, weightDelta, sample.getInput());
        }
        // batch update weights at the end of training epoch
        for (int weight = 0; weight < weights.length; weight++) {
            weights[weight] += weightDelta[weight];
        }
        weightDelta = new double[weights.length];
    }
}

private void deltaUpdate(double delta, double[] weightsDelta, double[] input) {
    for (int feature = 0; feature < input.length; feature++) {
        weightsDelta[feature] = weightsDelta[feature] + delta * input[feature];
    }
}

The training samples for the AND logic look like this:

List<LearningSample> samples = new ArrayList<>();
LearningSample sample1 = new LearningSample(new double[] { 0, 0 }, 0);
LearningSample sample2 = new LearningSample(new double[] { 0, 1 }, 0);
LearningSample sample3 = new LearningSample(new double[] { 1, 0 }, 0);
LearningSample sample4 = new LearningSample(new double[] { 1, 1 }, 1);

The bias input 1 is injected as the 0th component in the constructor.

After learning, the outputs are tested in this order:

System.out.println(neuron.output(new double[] { 1, 1, 1 }));
System.out.println(neuron.output(new double[] { 1, 0, 0 }));
System.out.println(neuron.output(new double[] { 1, 0, 1 }));
System.out.println(neuron.output(new double[] { 1, 1, 0 }));

When I omit the sigmoid derivative from the delta calculation, the results are:

10000 iterations

0.9666565909058419
2.05087653022386E-5
0.023803593411627456
0.023803593411627456

35000 iterations

0.9903810162649429
4.6475933225663785E-7
0.006870001301253153
0.006870001301253153

With the derivative applied, the results are:

10000 iterations

0.8446651307271656
0.004030424878725242
0.129178264332045
0.129178264332045

35000 iterations

0.9218773156128204
4.169603485934177E-4
0.06555977437019253
0.06555977437019253

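The learning rate is 0.021 and the initial weight of the bias is -2.

In the first example, without the derivative, the error is smaller and the function is approximated better. Why is that?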

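Update

Following @Umberto's answer, there are a few things I would like to verify:

Does the accidental experiment with delta = learningRate * error * input actually work because it minimizes the cross-entropy cost function? Yes.
Cross-entropy apparently performs better on classification problems, so when should mean squared error (MSE) be used as the cost function? For regression.

As a side note, I pass the output through a threshold function, which is not shown here, so this is binary classification.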

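Answer:

The reason is simple: you are minimizing different cost functions. In your case (as in the slides) you are minimizing the squared error. If you use a cost function of the form I describe here (cross-entropy, GitHub link), you get a faster weight update. In classification problems, where you typically use a sigmoid neuron for binary classification, the squared error is usually not a good cost function.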
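As a rough sketch of the math behind this (the notation is my own assumption here: d is the desired output, y = g(h) the neuron output, h = Σ_k w_k x_k, and e = d − y as in the question), the two cost functions lead to these gradient-descent updates for a single sample:

```latex
% Squared error: the sigmoid derivative g'(h) = y(1-y) survives in the update
E = \tfrac{1}{2}(d - y)^{2}
\;\Longrightarrow\;
\frac{\partial E}{\partial w_k} = -(d - y)\, y(1 - y)\, x_k
\;\Longrightarrow\;
\Delta w_k = \eta\, e\, y(1 - y)\, x_k

% Cross-entropy: the factor y(1-y) cancels
C = -\bigl[d \ln y + (1 - d)\ln(1 - y)\bigr]
\;\Longrightarrow\;
\frac{\partial C}{\partial w_k}
  = \frac{y - d}{y(1 - y)} \cdot y(1 - y)\, x_k
  = (y - d)\, x_k
\;\Longrightarrow\;
\Delta w_k = \eta\, e\, x_k
```

Because y(1 − y) is at most 1/4 and shrinks toward 0 when the output is close to 0 or 1, the squared-error step is always the smaller of the two for the same error, which is consistent with the slower convergence reported in the question.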

If you use cross-entropy, you will need to use learningRate * error * input; (choosing the right sign depending on how you define the error).
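To try both updates side by side, here is a minimal self-contained sketch. It is not the asker's actual classes: DeltaRuleComparison, train and sumSquaredError are hypothetical names, and I am assuming the two input weights start at 0, since the question only states the bias weight (-2) and the learning rate (0.021). It uses the same batch update per epoch as the question's code.

```java
public class DeltaRuleComparison {

    // AND samples with the bias input 1 as component 0 (as in the question)
    static final double[][] INPUTS = {
        { 1, 0, 0 }, { 1, 0, 1 }, { 1, 1, 0 }, { 1, 1, 1 }
    };
    static final double[] TARGETS = { 0, 0, 0, 1 };

    static double sigmoid(double h) {
        return 1.0 / (1.0 + Math.exp(-h));
    }

    // neuron output: sigmoid of the dot product of weights and input
    static double output(double[] weights, double[] input) {
        double h = 0.0;
        for (int i = 0; i < weights.length; i++) {
            h += weights[i] * input[i];
        }
        return sigmoid(h);
    }

    // useDerivative = true  -> squared-error update (with y * (1 - y))
    // useDerivative = false -> cross-entropy update (without it)
    static double[] train(boolean useDerivative, int epochs, double learningRate) {
        // assumption: bias weight -2 as in the question, input weights start at 0
        double[] weights = { -2.0, 0.0, 0.0 };
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] batch = new double[weights.length];
            for (int s = 0; s < INPUTS.length; s++) {
                double estimated = output(weights, INPUTS[s]);
                double error = TARGETS[s] - estimated;
                double delta = learningRate * error;
                if (useDerivative) {
                    delta *= estimated * (1 - estimated);
                }
                // accumulate the per-weight correction for this sample
                for (int i = 0; i < weights.length; i++) {
                    batch[i] += delta * INPUTS[s][i];
                }
            }
            // batch update at the end of the epoch
            for (int i = 0; i < weights.length; i++) {
                weights[i] += batch[i];
            }
        }
        return weights;
    }

    // sum of squared errors over the four AND samples
    static double sumSquaredError(double[] weights) {
        double sum = 0.0;
        for (int s = 0; s < INPUTS.length; s++) {
            double e = TARGETS[s] - output(weights, INPUTS[s]);
            sum += e * e;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (boolean useDerivative : new boolean[] { false, true }) {
            double[] w = train(useDerivative, 10000, 0.021);
            System.out.printf("useDerivative=%b sumSquaredError=%.6f%n",
                    useDerivative, sumSquaredError(w));
        }
    }
}
```

With the numbers reported in the question as a guide, I would expect the run without the derivative to end with the smaller error after the same number of epochs, although the exact values depend on the initial weights.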

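As a side note, what you are actually doing is logistic regression...

Hope this helps. If you need more information, let me know. Check my link, where I give a complete derivation of the math behind it.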
