My neural network approximates X^2 with a straight line

I'm currently trying to implement my own neural network from scratch to test my understanding of the method. I thought everything was going well, since my network managed to approximate the AND and XOR functions with ease, but it turns out it struggles to learn to approximate a simple square function.

I've tried a variety of network configurations, ranging from 1 to 3 layers and 1 to 64 nodes per layer. I varied the learning rate from 0.1 down to 0.00000001 and implemented weight decay, since I figured some regularization might offer insight into where the problem lies. I also implemented gradient checking, which gave me conflicting answers: the result varies wildly from attempt to attempt, from a terrible difference of 0.6 to an excellent 1e-10. I'm using leaky ReLU activations and MSE as my cost function.
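(For context, the "difference" quoted here is the relative error between the analytic and numerical gradients, exactly as computed at the end of `grad_check` in the code below; a self-contained version, with `relative_error` being just an illustrative name:)

```python
import numpy as np

# Relative error between the flattened analytic gradient g and the
# centered-difference approximation g_approx, as used in grad_check below:
# ||g - g_approx|| / (||g|| + ||g_approx||)
def relative_error(g, g_approx):
    return np.linalg.norm(g - g_approx) / (np.linalg.norm(g) + np.linalg.norm(g_approx))
```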

Can anyone help me spot what I'm missing, or does it all just come down to tuning the hyperparameters?

My code is as follows:

```python
import matplotlib.pyplot as plt
import numpy as np
import Sub_Script as ss

# Create the sample dataset with X**2
X = np.expand_dims(np.linspace(0, 1, 201), axis=0)
y = X**2
plt.plot(X.T, y.T)

# Hyperparameters
layer_dims = [1, 64, 1]
learning_rate = 0.000001
iterations = 50000
decay = 0.00000001
num_ex = y.shape[1]

# Initialization
num_layers = len(layer_dims)
weights = [None] + [np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1])
                    for l in range(1, num_layers)]
biases = [None] + [np.zeros((layer_dims[l], 1)) for l in range(1, num_layers)]
dweights, dbiases, dw_approx, db_approx = ss.grad_check(weights, biases, num_layers, X, y, decay, num_ex)

# Main function: iteration loop
for iter in range(iterations):
    # Main function: forward propagation
    z_values, acts = ss.forward_propagation(weights, biases, num_layers, X)
    dweights, dbiases = ss.backward_propagation(weights, biases, num_layers, z_values, acts, y)
    weights, biases = ss.update_paras(weights, biases, dweights, dbiases, learning_rate, decay, num_ex)
    if iter % (1000+1) == 0:
        print('Cost: ', ss.mse(acts[-1], y, weights, decay, num_ex))

# Gradient checking
dweights, dbiases, dw_approx, db_approx = ss.grad_check(weights, biases, num_layers, X, y, decay, num_ex)

# Visualization
plt.plot(X.T, acts[-1].T)
```

Sub_Script.py, which contains the neural network functions, is as follows:

```python
import numpy as np
import copy as cp

# Sub-functions: forward propagation, backward propagation,
# plus the cost and activation functions

# Leaky ReLU activation function
def relu(x):
    return (x > 0) * x + (x < 0) * 0.01*x

# Leaky ReLU activation function gradient
def relu_grad(x):
    return (x > 0) + (x < 0) * 0.01

# MSE cost function
def mse(prediction, actual, weights, decay, num_ex):
    return np.sum((actual - prediction) ** 2)/(actual.shape[1]) + (decay/(2*num_ex))*np.sum([np.sum(w) for w in weights[1:]])

# MSE cost function gradient
def mse_grad(prediction, actual):
    return -2 * (actual - prediction)/(actual.shape[1])

# Forward propagation
def forward_propagation(weights, biases, num_layers, act):
    acts = [[None] for i in range(num_layers)]
    z_values = [[None] for i in range(num_layers)]
    acts[0] = act
    for layer in range(1, num_layers):
        z_values[layer] = np.dot(weights[layer], acts[layer-1]) + biases[layer]
        acts[layer] = relu(z_values[layer])
    return z_values, acts

# Backward propagation
def backward_propagation(weights, biases, num_layers, z_values, acts, y):
    dweights = [[None] for i in range(num_layers)]
    dbiases = [[None] for i in range(num_layers)]
    zgrad = mse_grad(acts[-1], y) * relu_grad(z_values[-1])
    dweights[-1] = np.dot(zgrad, acts[-2].T)
    dbiases[-1] = np.sum(zgrad, axis=1, keepdims=True)
    for layer in range(num_layers-2, 0, -1):
        zgrad = np.dot(weights[layer+1].T, zgrad) * relu_grad(z_values[layer])
        dweights[layer] = np.dot(zgrad, acts[layer-1].T)
        dbiases[layer] = np.sum(zgrad, axis=1, keepdims=True)
    return dweights, dbiases

# Update parameters, with regularization
def update_paras(weights, biases, dweights, dbiases, learning_rate, decay, num_ex):
    weights = [None] + [w - learning_rate*(dw + (decay/num_ex)*w) for w, dw in zip(weights[1:], dweights[1:])]
    biases = [None] + [b - learning_rate*db for b, db in zip(biases[1:], dbiases[1:])]
    return weights, biases

# Gradient checking
def grad_check(weights, biases, num_layers, X, y, decay, num_ex):
    z_values, acts = forward_propagation(weights, biases, num_layers, X)
    dweights, dbiases = backward_propagation(weights, biases, num_layers, z_values, acts, y)
    epsilon = 1e-7
    dw_approx = cp.deepcopy(weights)
    db_approx = cp.deepcopy(biases)
    for layer in range(1, num_layers):
        height = weights[layer].shape[0]
        width = weights[layer].shape[1]
        for i in range(height):
            for j in range(width):
                w_plus = cp.deepcopy(weights)
                w_plus[layer][i, j] += epsilon
                w_minus = cp.deepcopy(weights)
                w_minus[layer][i, j] -= epsilon
                _, temp_plus = forward_propagation(w_plus, biases, num_layers, X)
                cost_plus = mse(temp_plus[-1], y, w_plus, decay, num_ex)
                _, temp_minus = forward_propagation(w_minus, biases, num_layers, X)
                cost_minus = mse(temp_minus[-1], y, w_minus, decay, num_ex)
                dw_approx[layer][i, j] = (cost_plus - cost_minus)/(2*epsilon)
            b_plus = cp.deepcopy(biases)
            b_plus[layer][i, 0] += epsilon
            b_minus = cp.deepcopy(biases)
            b_minus[layer][i, 0] -= epsilon
            _, temp_plus = forward_propagation(weights, b_plus, num_layers, X)
            cost_plus = mse(temp_plus[-1], y, weights, decay, num_ex)
            _, temp_minus = forward_propagation(weights, b_minus, num_layers, X)
            cost_minus = mse(temp_minus[-1], y, weights, decay, num_ex)
            db_approx[layer][i, 0] = (cost_plus - cost_minus)/(2*epsilon)
    dweights_flat = [dw.flatten() for dw in dweights[1:]]
    dweights_flat = np.concatenate(dweights_flat, axis=None)
    dw_approx_flat = [dw.flatten() for dw in dw_approx[1:]]
    dw_approx_flat = np.concatenate(dw_approx_flat, axis=None)
    dbiases_flat = [db.flatten() for db in dbiases[1:]]
    dbiases_flat = np.concatenate(dbiases_flat, axis=None)
    db_approx_flat = [db.flatten() for db in db_approx[1:]]
    db_approx_flat = np.concatenate(db_approx_flat, axis=None)
    d_paras = np.concatenate([dweights_flat, dbiases_flat], axis=None)
    d_approx_paras = np.concatenate([dw_approx_flat, db_approx_flat], axis=None)
    difference = np.linalg.norm(d_paras - d_approx_paras)/(np.linalg.norm(d_paras) + np.linalg.norm(d_approx_paras))
    if difference > 2e-7:
        print(
            "\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print(
            "\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
    return dweights, dbiases, dw_approx, db_approx
```

Edit: Corrected some stale comments in the code to avoid confusion.

Edit 2: Thanks to @*** for helping me find the main problem with my code! I'd also like to mention in this edit that I discovered I had made some mistakes in how I implemented weight decay. After making the suggested change and removing the weight decay element entirely for now, the neural network appears to work correctly!
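(For anyone hitting the same weight-decay issue: the `mse` above sums the raw weights rather than their squares, so the penalty is not a true L2 term and can even go negative. A minimal sketch of a standard L2 weight-decay cost, assuming the same `decay/(2*num_ex)` scaling as above; `mse_l2` is just an illustrative name:)

```python
import numpy as np

# Minimal sketch of an MSE cost with a standard L2 penalty: the decay term
# sums *squared* weights, so its gradient w.r.t. each weight matrix is
# (decay/num_ex) * w -- matching the update already used in update_paras.
def mse_l2(prediction, actual, weights, decay, num_ex):
    data_cost = np.sum((actual - prediction) ** 2) / actual.shape[1]
    l2_cost = (decay / (2 * num_ex)) * sum(np.sum(w ** 2) for w in weights[1:])
    return data_cost + l2_cost
```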


Answer:

I ran your code, and here is the output it produces:

[Plot: the network's final-layer activations plotted against the target curve]

The problem is that you are also using ReLU on the last layer, so the network cannot produce a good fit. Use no activation function on the last layer; that should give much better results.

The final layer's activation is almost always different from the one used for the hidden layers, and it depends on the kind of output you want: for continuous outputs use a linear activation (essentially no activation function at all), and for classification use sigmoid/softmax.
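Applied to your code, this is a small change in `forward_propagation`: keep leaky ReLU for the hidden layers, but let the last layer pass `z` through unchanged. A sketch under that assumption:

```python
# Sketch: same forward pass as above, but with a linear (identity) output
# layer. Hidden layers keep leaky ReLU; the last layer returns z directly.
def forward_propagation(weights, biases, num_layers, act):
    acts = [[None] for i in range(num_layers)]
    z_values = [[None] for i in range(num_layers)]
    acts[0] = act
    for layer in range(1, num_layers):
        z_values[layer] = np.dot(weights[layer], acts[layer-1]) + biases[layer]
        if layer == num_layers - 1:
            acts[layer] = z_values[layer]        # linear output for regression
        else:
            acts[layer] = relu(z_values[layer])  # leaky ReLU in hidden layers
    return z_values, acts
```

Correspondingly, the output-layer delta in `backward_propagation` becomes `zgrad = mse_grad(acts[-1], y)` with no `relu_grad(z_values[-1])` factor, since the derivative of the identity is 1.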
