When I use this code for single-variable linear regression, theta is computed correctly, but in the multivariable case the theta output is very strange.
I am trying to port the Octave code I wrote while following Andrew Ng's course.
This is the main driver file:
m = data.shape[0]
a = np.array(data[0])
a.shape = (m, 1)
b = np.array(data[1])
b.shape = (m, 1)
x = np.append(a, b, axis=1)
y = np.array(data[2])
lr = LR.LinearRegression()
[X, mu, sigma] = lr.featureNormalize(x)
z = np.ones((m, 1), dtype=float)
X = np.append(z, X, axis=1)
alpha = 0.01
num_iters = 400
theta = np.zeros(shape=(3, 1))
[theta, J_history] = lr.gradientDescent(X, y, theta, alpha, num_iters)
print(theta)
And here is the class:
import warnings
import numpy as np

class LinearRegression:

    def featureNormalize(self, data):
        # normalize each feature column to zero mean and unit standard deviation
        data = np.array(data)
        x_norm = data
        mu = np.zeros(shape=(1, data.shape[1]))     # row vector of per-column means
        sigma = np.zeros(shape=(1, data.shape[1]))  # row vector of per-column stds
        for i in range(0, data.shape[1]):
            mu[0, i] = np.mean(data[:, i])
            sigma[0, i] = np.std(data[:, i])
        for i in range(0, data.shape[1]):
            x_norm[:, i] = np.subtract(x_norm[:, i], mu[0, i])
            x_norm[:, i] = np.divide(x_norm[:, i], sigma[0, i])
        return [x_norm, mu, sigma]

    def gradientDescent(self, X, y, theta, alpha, num_iters):
        m = y.shape[0]
        J_history = np.zeros(shape=(num_iters, 1))
        for i in range(0, num_iters):
            predictions = X.dot(theta)  # X is 47x3, theta is 3x1, predictions is 47x1
            theta = np.subtract(theta, (alpha / m) * np.transpose(
                np.transpose(np.subtract(predictions, y)).dot(X)))  # (1x47) dot (47x3)
            J_history[i] = self.computeCost(X, y, theta)
        return [theta, J_history]

    def computeCost(self, X, y, theta):
        warnings.filterwarnings('ignore')
        m = X.shape[0]
        J = 0
        predictions = X.dot(theta)
        sqrErrors = np.power(predictions - y, 2)
        J = 1 / (2 * m) * np.sum(sqrErrors)
        return J
I expect a 3x1 theta matrix. Following Andrew's course, my Octave implementation produces this theta:
334302.063993
100087.116006
3673.548451
But the Python implementation gives me very strange output:
[[384596.12996714 317274.97693463 354878.64955708 223121.53576488 519238.43603216 288423.05420641 302849.01557052 191383.45903309 203886.92061274 233219.70871976 230814.42009498 333720.57288972 317370.18827964 673115.35724932 249953.82390212 432682.6678475 288423.05420641 192249.97844569 480863.45534211 576076.72380674 243221.70859887 245241.34318985 233604.4010228 249953.82390212 551937.2817908 240336.51632605 446723.93690857 451051.7253178 456822.10986344 288423.05420641 336509.59208678 163398.05571747 302849.01557052 557707.6... (the output continues like this for a long time)
The same code works perfectly on the single-variable dataset, and it also works in Octave, but I have been stuck on the multivariable case for over two hours now. Any help is appreciated.
Answer:
Try the following for the second line inside the for loop of the gradientDescent function:
theta = theta - (alpha / m) * X.T.dot(X.dot(theta) - y)
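This replaces the nested transposes with the standard vectorized update. Beyond that, the huge theta you printed is the classic symptom of y having shape (m,) rather than (m, 1): NumPy broadcasting then turns predictions - y, an (m, 1) minus (m,) operation, into an (m, m) matrix, and theta grows into a matrix along with it. Here is a minimal sketch on synthetic data (the shapes, learning rate, and sample values are assumptions for illustration, not your dataset) showing the update converging to a 3x1 theta once y is forced into a column vector:

import numpy as np

m = 47
rng = np.random.default_rng(0)
x = rng.normal(size=(m, 2))                  # two synthetic features
y = x @ np.array([[2.0], [3.0]]) + 5.0       # known relationship, shape (m, 1)

X = np.c_[np.ones((m, 1)), x]                # prepend the intercept column
theta = np.zeros((3, 1))
alpha, num_iters = 0.01, 400

# The trap: if y had shape (m,), X.dot(theta) - y would broadcast to (m, m).
y = y.reshape(m, 1)                          # guard: force a column vector

for _ in range(num_iters):
    theta = theta - (alpha / m) * X.T.dot(X.dot(theta) - y)

print(theta.shape)    # (3, 1)
print(theta.ravel())  # heads toward [5, 2, 3] as iterations grow

With y reshaped this way in the driver file (y = np.array(data[2]).reshape(m, 1)), your original double-transpose update should also produce the expected 3x1 theta.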
Also, if you want to add a column of all ones, this is a simpler way to do it:
np.c_[np.ones((m, 1)), data]
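As a quick illustration (the sample values below are made up), np.c_ column-stacks its arguments, so the intercept column and the feature matrix come out as a single array in one step:

import numpy as np

data = np.array([[2104.0, 3.0],
                 [1600.0, 3.0],
                 [2400.0, 3.0]])    # illustrative m x 2 feature matrix
m = data.shape[0]
X = np.c_[np.ones((m, 1)), data]   # column-stack a ones column with the features
print(X.shape)                     # (3, 3): intercept column plus two features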