Linear regression with gradient descent using Python and NumPy

I am trying to implement the first exercise of Andrew Ng's Machine Learning course on Coursera in Python. In the course the exercise is done in Matlab/Octave, but I want to implement the same thing in Python.

The problem is that the line that updates the theta values does not seem to work correctly: it returns [[0.72088159] [0.72088159]], but it should return [[-3.630291] [1.166362]].

I am using a learning rate of 0.01 and 1500 gradient descent iterations (the same values as in the original Octave exercise).

Obviously, since the theta values are wrong, the predictions are also wrong, as shown in the final plot.

In the lines that test the cost function, the results are correct when theta is defined as [0; 0] and as [-1; 2] (they match the Octave exercise), so the error can only be in the gradient function, but I don't know what went wrong there.

I hope someone can help me figure out what I am doing wrong. Any help would be much appreciated.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def load_data():
    X = np.genfromtxt('data.txt', usecols=(0), delimiter=',', dtype=None)
    y = np.genfromtxt('data.txt', usecols=(1), delimiter=',', dtype=None)
    X = X.reshape(1, X.shape[0])
    y = y.reshape(1, y.shape[0])
    ones = np.ones(X.shape)
    X = np.append(ones, X, axis=0)
    theta = np.zeros((2, 1))
    return (X, y, theta)

alpha = 0.01
iter_num = 1500
debug_at_loop = 10

def plot(x, y, y_hat=None):
    x = x.reshape(x.shape[0], 1)
    plt.xlabel('x')
    plt.ylabel('hΘ(x)')
    plt.ylim(ymax=25, ymin=-5)
    plt.xlim(xmax=25, xmin=5)
    plt.scatter(x, y)
    if type(y_hat) is np.ndarray:
        plt.plot(x, y_hat, '-')
    plt.show()

plot(X[1], y)

def hip(X, theta):
    return np.dot(theta.T, X)

def cost(X, y, theta):
    m = y.shape[1]
    return np.sum(np.square(hip(X, theta) - y)) / (2 * m)

print('With theta = [0 ; 0]')
print('Cost computed =', cost(X, y, np.array([0, 0])))
print()
print('With theta = [-1 ; 2]')
print('Cost computed =', cost(X, y, np.array([-1, 2])))

def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):
    J = []
    m = y.shape[1]
    for i in range(iter_num):
        theta -= ((alpha * 1) / m) * np.sum(np.dot(hip(X, theta) - y, X.T))
        if i % debug_cost_at_each == 0:
            J.append(round(cost(X, y, theta), 6))
    return J, theta

X, y, theta = load_data()
J, fit_theta = grad(X, y, alpha, theta)
print('Theta found by Gradient Descent:', fit_theta)

# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([[1], [3.5]]).T, fit_theta)
print('For population = 35,000, we predict a profit of \n', predict1 * 10000)
predict2 = np.dot(np.array([[1], [7]]).T, fit_theta)
print('For population = 70,000, we predict a profit of \n', predict2 * 10000)

pred_y = hip(X, fit_theta)
plot(X[1], y, pred_y.T)

The data I am using is the following text file:

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705

Answer:

Well, after tearing out more than a few hairs I finally solved this (programming may yet make me bald).

The problem was in the gradient update line, and the fix is this:

theta -= ((alpha * 1) / m) * np.dot(X, (hip(X, theta) - y).T)

I moved X to the front and transposed the error vector. The original line collapsed the entire gradient into a single scalar with np.sum, so both components of theta received the same update (hence the repeated 0.72088159); np.dot(X, (hip(X, theta) - y).T) instead yields a (2, 1) vector with a separate gradient component for each parameter.
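As a sanity check, here is a minimal sketch of the corrected update on synthetic data (a hypothetical example, not the course dataset): with noise-free points from y = 2x + 1 and the same shape conventions as the question (X is (2, m) with a row of ones, y is (1, m), theta is (2, 1)), gradient descent should drive theta toward [1, 2].

```python
import numpy as np

def hip(X, theta):
    # hypothesis h_theta(x) = theta^T X, shape (1, m)
    return np.dot(theta.T, X)

def grad(X, y, alpha, theta, iter_num):
    m = y.shape[1]
    for _ in range(iter_num):
        # np.dot(X, error.T) has shape (2, 1), matching theta,
        # so each parameter gets its own gradient component
        theta -= (alpha / m) * np.dot(X, (hip(X, theta) - y).T)
    return theta

x = np.linspace(0, 10, 50)
X = np.vstack([np.ones_like(x), x])   # (2, 50): bias row + feature row
y = (2 * x + 1).reshape(1, -1)        # (1, 50): targets from y = 2x + 1
theta = grad(X, y, alpha=0.01, theta=np.zeros((2, 1)), iter_num=20000)
print(theta.ravel())  # approaches [1, 2]
```

With the buggy scalar update, both entries of theta would stay identical no matter how long you iterate, so a quick check like this immediately exposes the problem.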

