### SciPy optimization warning in a neural network

I am getting the following warning when using SciPy's fmin_bfgs() optimization function to train a neural network. Everything should be clear and simple, following the backpropagation algorithm:

1. Feed-forward the training examples.
2. Compute the error term for each unit.
3. Accumulate the gradients (for the first example I skipped the regularization term); the corresponding update equations are sketched after this list.
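For reference, these three steps correspond to the standard per-example backpropagation equations below; this is my own summary in the usual two-layer notation (sigmoid activation g, m examples), not part of the original post:

```latex
\begin{align*}
\delta^{(3)} &= a^{(3)} - y && \text{output-layer error}\\
\delta^{(2)} &= \bigl(\Theta^{(2)}\bigr)^{\top}\delta^{(3)} \odot g'\bigl(z^{(2)}\bigr) && \text{hidden-layer error (bias row dropped)}\\
\Delta^{(l)} &\mathrel{{+}{=}} \delta^{(l+1)}\bigl(a^{(l)}\bigr)^{\top} && \text{accumulate over the $m$ examples}\\
\frac{\partial J}{\partial \Theta^{(l)}} &= \frac{1}{m}\,\Delta^{(l)} + \frac{\lambda}{m}\,\Theta^{(l)} && \text{regularization on non-bias entries only}
\end{align*}
```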

```
Starting Loss: 7.26524579601
Check gradient: 2.02493576268
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 5.741300
         Iterations: 3
         Function evaluations: 104
         Gradient evaluations: 92
Trained Loss: 5.74130012926
```

I did the same task in MATLAB, using the fmin function for the optimization, and it ran successfully, but I can't see what is missing in my Python implementation. As you can see, even the value returned by scipy.optimize.check_grad is too large.
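For context, scipy.optimize.check_grad compares the analytic gradient against a finite-difference estimate and returns the 2-norm of their difference, so a correct gradient should give a value close to zero rather than ~2 as above. A minimal sketch of how it is typically used (the quadratic function here is just an illustration, not from the post):

```python
import numpy as np
from scipy.optimize import check_grad


def f(w):
    # simple quadratic: f(w) = sum(w**2)
    return np.sum(w ** 2)


def grad_f(w):
    # analytic gradient of the quadratic above
    return 2 * w


w0 = np.array([1.0, -3.0, 2.5])
# norm of (analytic gradient - finite-difference gradient);
# a tiny value (on the order of 1e-7 here) means the two are consistent
print(check_grad(f, grad_f, w0))
```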

```python
def feed_forward(x, theta1, theta2):
    hidden_dot = np.dot(add_bias(x), np.transpose(theta1))
    hidden_p = sigmoid(hidden_dot)
    p = sigmoid(np.dot(add_bias(hidden_p), np.transpose(theta2)))
    return hidden_dot, hidden_p, p


def cost(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    _, _, p = feed_forward(x, theta1, theta2)

    # regularization = (lam / (len(x) * 2)) * (
    #     np.sum(np.square(np.delete(theta1, 0, 1)))
    #     + np.sum(np.square(np.delete(theta2, 0, 1))))

    complete = -1 * np.dot(np.transpose(y), np.log(p)) \
               - np.dot(np.transpose(1 - y), np.log(1 - p))

    return np.sum(complete) / len(x)  # + regularization


def vector(z):
    # noinspection PyUnresolvedReferences
    return np.reshape(z, (np.shape(z)[0], 1))


def gradient(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    hidden_dot, hidden_p, p = feed_forward(x, theta1, theta2)
    error_o = p - y
    error_h = np.multiply(np.dot(
        error_o, np.delete(theta2, 0, 1)), sigmoid_gradient(hidden_dot))

    x = add_bias(x)
    hidden_p = add_bias(hidden_p)

    theta1_grad, theta2_grad = \
        np.zeros(theta1.shape[::-1]), np.zeros(theta2.shape[::-1])
    records = y.shape[0]

    for i in range(records):
        theta1_grad = theta1_grad + np.dot(
            vector(x[i]), np.transpose(vector(error_h[i])))
        theta2_grad = theta2_grad + np.dot(
            vector(hidden_p[i]), np.transpose(vector(error_o[i])))

    theta1_grad = np.transpose(
        theta1_grad / records)  # + (lam / records * theta1)
    theta2_grad = np.transpose(
        theta2_grad / records)  # + (lam / records * theta2)

    return np.append(theta1_grad, theta2_grad)


def get_theta_shapes(x, y, hidden):
    return (hidden, x.shape[1] + 1), \
           (y.shape[1], hidden + 1)


def get_theta_from(thetas, x, y, hidden):
    t1_s, t2_s = get_theta_shapes(x, y, hidden)
    split = t1_s[0] * t1_s[1]
    theta1 = np.reshape(thetas[:split], t1_s)
    theta2 = np.reshape(thetas[split:], t2_s)
    return theta1, theta2


def train(x, y, hidden_size, lam):
    y = get_binary_y(y)

    t1_s, t2_s = get_theta_shapes(x, y, hidden_size)
    thetas = np.append(
        rand_init(t1_s[0], t1_s[1]),
        rand_init(t2_s[0], t2_s[1]))

    initial_cost = cost(thetas, x, y, hidden_size, lam)
    print("Starting Loss: " + str(initial_cost))

    check_grad1 = scipy.optimize.check_grad(
        cost, gradient, thetas, x, y, hidden_size, lam)
    print("Check gradient: " + str(check_grad1))

    trained_theta = scipy.optimize.fmin_bfgs(
        cost, thetas, fprime=gradient, args=(x, y, hidden_size, lam))

    print("Trained Loss: " +
          str(cost(trained_theta, x, y, hidden_size, lam)))
```
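The snippet relies on a few helpers (sigmoid, sigmoid_gradient, add_bias, rand_init, get_binary_y) that are not shown in the post. For completeness, here are plausible implementations consistent with how they are called above; these are my assumptions, not the author's code:

```python
import numpy as np
import scipy.optimize


def sigmoid(z):
    # logistic activation
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_gradient(z):
    # derivative of the sigmoid, evaluated at the pre-activation z
    s = sigmoid(z)
    return s * (1 - s)


def add_bias(a):
    # prepend a column of ones (the bias unit) to an (m, n) matrix
    return np.hstack((np.ones((a.shape[0], 1)), a))


def rand_init(rows, cols, epsilon=0.12):
    # small random weights in [-epsilon, epsilon] to break symmetry
    return np.random.rand(rows, cols) * 2 * epsilon - epsilon


def get_binary_y(y):
    # one-hot encode integer class labels into an (m, num_classes) matrix
    classes = np.unique(y)
    return (y.reshape(-1, 1) == classes.reshape(1, -1)).astype(float)
```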

Answer:

Once again, there were several issues in the computation. Fixing all of them resolves the warnings and makes the SciPy optimization run successfully, in line with MATLAB's fminc optimization function. (A working Python example can be found on GitHub.)

1. Update the cost calculation to the correct version: the multiplication inside the cost function must be element-wise. The correct cost (including the regularization term) is:

```python
def cost(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    _, _, p = feed_forward(x, theta1, theta2)

    regularization = (lam / (len(x) * 2)) * (
        np.sum(np.square(np.delete(theta1, 0, 1)))
        + np.sum(np.square(np.delete(theta2, 0, 1))))

    complete = np.nan_to_num(np.multiply((-y), np.log(p))
                             - np.multiply((1 - y), np.log(1 - p)))

    avg = np.sum(complete) / len(x)

    return avg + regularization
```
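To see why the element-wise product matters: with one-hot labels y of shape (m, k) and predictions p of the same shape, np.multiply(-y, np.log(p)) keeps one term per example and class, whereas np.dot(y.T, np.log(p)) yields a (k, k) matrix whose off-diagonal entries mix labels and predictions of different classes, so summing it overcounts. A tiny worked check (the numbers are illustrative only, not from the post):

```python
import numpy as np

y = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # one-hot labels for 2 examples, 2 classes
p = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # predicted probabilities

elementwise = np.sum(np.multiply(-y, np.log(p)) - np.multiply(1 - y, np.log(1 - p)))
matrix_prod = np.sum(-np.dot(y.T, np.log(p)) - np.dot((1 - y).T, np.log(1 - p)))

print(elementwise / len(y))  # correct average cross-entropy, roughly 0.33
print(matrix_prod / len(y))  # roughly 4.2 here, inflated by the off-diagonal terms
```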

2. After doing this, we will still receive nan values in the theta terms during the SciPy optimization. That is why np.nan_to_num is applied above. Note: MATLAB's fminc handles such unexpected numbers correctly.
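Concretely, the nan values come from terms like 0 * log(0): when a prediction saturates at exactly 0 or 1, np.log returns -inf and multiplying it by a zero label gives nan, which then poisons the optimizer. np.nan_to_num replaces nan with 0 and ±inf with large finite numbers. A minimal illustration (NumPy will print RuntimeWarnings, but the values show the point):

```python
import numpy as np

y = np.array([1.0, 0.0])
p = np.array([1.0, 0.5])   # the first prediction has saturated at exactly 1.0

raw = -y * np.log(p) - (1 - y) * np.log(1 - p)
print(raw)                 # [nan, 0.693...]  -- log(1 - 1.0) = -inf, times 0 -> nan
print(np.nan_to_num(raw))  # [0.0, 0.693...]  -- nan replaced by 0, cost stays finite
```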

3. Apply the regularization correctly, and don't forget to remove the regularization of the bias values. The correct gradient function should look like this:

```python
def gradient(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    hidden_dot, hidden_p, p = feed_forward(x, theta1, theta2)
    error_o = p - y
    error_h = np.multiply(np.dot(
        error_o, theta2),
        sigmoid_gradient(add_bias(hidden_dot)))

    x = add_bias(x)
    # add the bias unit to the hidden activations so theta2_grad shapes match
    hidden_p = add_bias(hidden_p)
    error_h = np.delete(error_h, 0, 1)

    theta1_grad, theta2_grad = \
        np.zeros(theta1.shape[::-1]), np.zeros(theta2.shape[::-1])
    records = y.shape[0]

    for i in range(records):
        theta1_grad = theta1_grad + np.dot(
            vector(x[i]), np.transpose(vector(error_h[i])))
        theta2_grad = theta2_grad + np.dot(
            vector(hidden_p[i]), np.transpose(vector(error_o[i])))

    reg_theta1 = theta1.copy()
    reg_theta1[:, 0] = 0
    theta1_grad = np.transpose(
        theta1_grad / records) + ((lam / records) * reg_theta1)

    reg_theta2 = theta2.copy()
    reg_theta2[:, 0] = 0
    theta2_grad = np.transpose(
        theta2_grad / records) + ((lam / records) * reg_theta2)

    return np.append(
        theta1_grad, theta2_grad)
```
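After these fixes it is worth re-running scipy.optimize.check_grad with the corrected cost and gradient before calling fmin_bfgs; the returned norm should drop from ~2 to near zero. The sketch below is my own usage example reusing the functions defined above; the synthetic data and its shapes are placeholders, not values from the post:

```python
import numpy as np
import scipy.optimize

# Synthetic placeholder data -- shapes only, not from the original post.
rng = np.random.RandomState(0)
x_train = rng.rand(50, 4)                  # 50 examples, 4 features
y_raw = rng.randint(0, 3, size=(50, 1))    # 3 classes
hidden_size, lam = 5, 1.0

y_bin = get_binary_y(y_raw)
t1_s, t2_s = get_theta_shapes(x_train, y_bin, hidden_size)
thetas = np.append(rand_init(*t1_s), rand_init(*t2_s))

# with the corrected cost/gradient this should be close to 0 (e.g. < 1e-4), not ~2
diff = scipy.optimize.check_grad(
    cost, gradient, thetas, x_train, y_bin, hidden_size, lam)
print("Gradient check:", diff)

trained_theta = scipy.optimize.fmin_bfgs(
    cost, thetas, fprime=gradient, args=(x_train, y_bin, hidden_size, lam))
print("Trained loss:", cost(trained_theta, x_train, y_bin, hidden_size, lam))
```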
