ReLU inconsistency / random behavior

I wrote a simple neural network (it is supposed to add two numbers) and tried different activation functions. Here is my code:

import numpy as np


class Layer:
    def __init__(self):
        self.inputs = None

    def forward(self, inputs):
        pass

    def backward(self, error_gradient, lr):
        pass


class Dense(Layer):
    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        self.weights = np.random.randn(n_neurons, n_inputs)
        self.biases = np.random.randn(n_neurons, 1)

    def forward(self, inputs):
        self.inputs = inputs
        return np.dot(self.weights, self.inputs) + self.biases

    def backward(self, error_gradient, lr):
        weight_deriv = np.dot(error_gradient, self.inputs.T)
        # The gradient w.r.t. the inputs must use the weights before the update
        input_gradient = np.dot(self.weights.T, error_gradient)
        self.weights -= lr * weight_deriv
        # The bias gradient is error_gradient itself
        # (the original `lr * self.biases` only shrank the biases toward zero)
        self.biases -= lr * error_gradient
        return input_gradient


class Activation(Layer):
    def __init__(self, activation, activation_prime):
        super().__init__()
        self.activation = activation
        self.activation_prime = activation_prime

    def forward(self, inputs):
        self.inputs = inputs
        return self.activation(self.inputs)

    def backward(self, error_gradient, lr):
        # Element-wise chain rule; activations have no trainable parameters
        return np.multiply(error_gradient, self.activation_prime(self.inputs))


class Tanh(Activation):
    def __init__(self):
        super().__init__(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)


class ReLU(Activation):
    def __init__(self):
        super().__init__(lambda x: np.maximum(0, x), lambda x: np.where(x > 0, 1.0, 0.0))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class Sigmoid(Activation):
    def __init__(self):
        super().__init__(sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x)))


def mse(y_pred, y_true):
    # Mean squared error as a scalar, so it can be printed with %f
    return np.mean(np.power(y_true - y_pred, 2))


def mse_prime(y_pred, y_true):
    return 2 * (y_pred - y_true) / np.size(y_true)


def run(nn, inputs):
    out = inputs
    for layer in nn:
        out = layer.forward(out)
    return out
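The gradient formulas used in `Dense.backward` can be checked numerically. The sketch below uses hypothetical helpers (`dense_forward`, `loss`, `numerical_grad` are illustration names, not part of the code above) to compare the analytic weight and bias gradients of a single Dense(2, 1) layer under squared error against central finite differences. It also shows that the bias gradient equals `error_gradient`, so the bias update should be `self.biases -= lr * error_gradient` rather than a multiple of the biases themselves.

```python
import numpy as np


def dense_forward(weights, biases, inputs):
    # Same computation as Dense.forward
    return weights @ inputs + biases


def loss(weights, biases, inputs, target):
    # Squared error for one sample with one output value
    return float(np.sum((dense_forward(weights, biases, inputs) - target) ** 2))


def numerical_grad(f, param, eps=1e-6):
    """Central-difference estimate of df/dparam, perturbing one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        f_plus = f()
        param[idx] = original - eps
        f_minus = f()
        param[idx] = original  # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
weights = rng.standard_normal((1, 2))
biases = rng.standard_normal((1, 1))
inputs = np.array([[0.1], [0.5]])
target = np.array([[0.6]])

# Analytic gradients, following Dense.backward
error_gradient = 2 * (dense_forward(weights, biases, inputs) - target)
weight_deriv = error_gradient @ inputs.T
bias_deriv = error_gradient

# Numerical gradients (the lambdas see the in-place perturbations)
num_w = numerical_grad(lambda: loss(weights, biases, inputs, target), weights)
num_b = numerical_grad(lambda: loss(weights, biases, inputs, target), biases)

print(np.allclose(weight_deriv, num_w, atol=1e-5))  # True
print(np.allclose(bias_deriv, num_b, atol=1e-5))    # True
```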

Here is the main program:

if __name__ == '__main__':
    X = np.reshape([[0.1, 0.2], [0.5, 0.3], [0.2, 0.4], [0.3, 0.7], [0.5, 0.5], [0.4, 0.3]], (6, 2, 1))
    Y = np.reshape([[0.3], [0.8], [0.6], [1.0], [1.0], [0.7]], (6, 1, 1))
    epochs, learning_rate = 5000, 0.01

    network = [
        Dense(2, 4),
        ReLU(),
        Dense(4, 4),
        ReLU(),
        Dense(4, 1),
        ReLU()
    ]

    for epoch in range(epochs):
        epoch_error = 0
        for x, y in zip(X, Y):
            # Forward pass
            output = run(network, x)
            epoch_error += mse(output, y)
            # Backward pass: propagate the gradient through the layers in reverse
            output_gradient = mse_prime(output, y)
            for layer in reversed(network):
                output_gradient = layer.backward(output_gradient, learning_rate)
        epoch_error /= len(X)
        print("%d/%d, error = %f" % (epoch + 1, epochs, epoch_error))

    test = np.reshape([0.1, 0.5], (2, 1))
    pred = run(network, test)
    print("Prediction = %f" % pred[0][0])
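Since weight initialization is suspected below: `Dense` draws its weights and biases with `np.random.randn`, i.e. standard deviation 1 regardless of layer size. A commonly used alternative for layers followed by ReLU is He initialization, which draws weights from a normal distribution with variance 2 / n_inputs. The sketch below is a hypothetical helper `he_init` (not part of the original code) and assumes zero biases, a common but not mandatory choice. It may make dead ReLU units less likely, but it does not rule them out, and it has not been tried on this exact network.

```python
import numpy as np


def he_init(n_neurons, n_inputs, rng):
    """Hypothetical helper: He-initialised weights and zero biases for one layer."""
    std = np.sqrt(2.0 / n_inputs)
    weights = rng.standard_normal((n_neurons, n_inputs)) * std
    biases = np.zeros((n_neurons, 1))  # assumption: start biases at zero
    return weights, biases


rng = np.random.default_rng(42)
w, b = he_init(4, 2, rng)
print(w.shape, b.shape)  # (4, 2) (4, 1)

# On a large layer the empirical std approaches sqrt(2 / n_inputs)
w_big, _ = he_init(1000, 500, rng)
print(float(w_big.std()), float(np.sqrt(2 / 500)))
```

To use it, `Dense.__init__` would assign `self.weights, self.biases = he_init(n_neurons, n_inputs, rng)` instead of calling `np.random.randn`.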

I have two questions:

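When I use an activation function other than ReLU with a learning rate of 0.1, it takes more than 100,000 epochs for the error to get close to zero, and it still never reaches zero. It is consistent, though: the error always goes down. So my first question is: why does it take so many epochs with Sigmoid or Tanh to solve a task as simple as adding two numbers?
When I use ReLU, the error can drop to zero very quickly, in about 5000 epochs, but the problem is that this is not consistent: sometimes the error never goes down at all. Why does this happen (I suspect weight initialization, but I'm not sure)? And why, when it does work, does the error drop so much faster than with the other activation functions?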
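One commonly cited cause of "sometimes it never trains" with ReLU is a dead unit: the network ends in a ReLU, and if the final unit's pre-activation is negative for every training input, its output is always 0 and its derivative is 0, so no gradient flows back and the weights never change. Whether this happens depends on the random initial weights, which would fit the inconsistency described. Conversely, when a unit is active its derivative is exactly 1, so it does not scale the gradient down the way Sigmoid and Tanh do. The sketch below forces the dead case with hand-picked (hypothetical) weights for a single Dense(2, 1) + ReLU output unit, and compares it with Leaky ReLU, a swapped-in alternative whose slope for negative inputs is small but non-zero (alpha = 0.01 is an assumed value).

```python
import numpy as np

# Training inputs from the question, one sample per row
X = np.array([[0.1, 0.2], [0.5, 0.3], [0.2, 0.4], [0.3, 0.7], [0.5, 0.5], [0.4, 0.3]])

# Hypothetical weights/bias chosen so every pre-activation is negative
weights = np.array([[0.5, -0.2]])
bias = -1.0


def relu(z):
    return np.maximum(0.0, z)


def relu_prime(z):
    return np.where(z > 0, 1.0, 0.0)


def leaky_relu_prime(z, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for z <= 0
    return np.where(z > 0, 1.0, alpha)


z = X @ weights.T + bias          # pre-activations, shape (6, 1)
print(relu(z).ravel())            # [0. 0. 0. 0. 0. 0.]

# Backprop multiplies the upstream gradient by the activation derivative,
# then by the inputs to get the weight gradient (summed over the samples)
upstream = np.ones_like(z)        # stand-in for mse_prime(...)
grad_w_relu = (upstream * relu_prime(z)).T @ X
grad_w_leaky = (upstream * leaky_relu_prime(z)).T @ X
print(grad_w_relu)                # [[0. 0.]]  -> weights never move
print(grad_w_leaky)               # small but non-zero
```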

Answer:

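The loss never reaches exactly zero because of the vanishing gradient problem. Sigmoid and Tanh saturate: their derivatives are at most 0.25 and 1 respectively and approach zero as |x| grows, and backpropagation multiplies these factors together layer by layer, so the weight updates become very small and training is slow.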
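The vanishing-gradient explanation can be seen numerically. The sketch below evaluates the Sigmoid and Tanh derivatives used in the code and shows how quickly they shrink away from zero. It also illustrates a related reason the error cannot reach exactly zero in this particular network: the last layer is the activation itself, and Sigmoid outputs lie strictly in (0, 1) and Tanh outputs strictly in (-1, 1), so the targets of 1.0 in `Y` can only be approached, never hit.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_prime(x):
    return 1.0 - np.tanh(x) ** 2


xs = np.array([0.0, 2.0, 5.0])
print(sigmoid_prime(xs))  # 0.25 at x = 0, then shrinking quickly
print(tanh_prime(xs))     # 1.0 at x = 0, then shrinking quickly

# Upper bound on the product of sigmoid derivatives across three
# Sigmoid layers, ignoring the weights: 0.25 ** 3
print(0.25 ** 3)  # 0.015625

# A final Sigmoid output never equals the target 1.0,
# so the squared error on those samples stays above zero
print(1.0 - sigmoid(np.array([5.0, 10.0])))  # small but positive
```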
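Sometimes the error never goes down because the weights have got stuck in a local minimum, a problem gradient descent frequently runs into. Try SGD (stochastic gradient descent) with momentum to avoid getting stuck in local minima and fix this.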
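The answer suggests SGD with momentum. The sketch below is a hypothetical helper `momentum_step` implementing the classic heavy-ball update (velocity = beta * velocity - lr * grad, then param += velocity), with assumed values lr = 0.01 and beta = 0.9, tried on the toy function f(w) = (w - 3)^2. Note one limitation that is relevant to ReLU: momentum only accumulates gradients that are actually non-zero, so if a unit's gradient is exactly zero from the first step, the velocity stays zero and the parameter never moves.

```python
import numpy as np


def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    """Hypothetical helper: one SGD-with-momentum (heavy-ball) update."""
    # velocity is an exponentially decaying sum of past (scaled) gradients
    velocity = beta * velocity - lr * grad
    return param + velocity, velocity


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, v = np.array(0.0), np.array(0.0)
for _ in range(500):
    w, v = momentum_step(w, 2 * (w - 3), v)
print(float(w))  # very close to 3.0

# With a zero gradient and zero initial velocity, nothing ever moves
p, vel = momentum_step(np.array(1.0), np.array(0.0), np.array(0.0))
print(float(p), float(vel))  # 1.0 0.0
```

In the `Dense` layer from the question, this would mean keeping one velocity array for `self.weights` and one for `self.biases`, each the same shape as the parameter, and updating them inside `backward`.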
