I tried to implement gradient descent. It runs fine on a small test dataset, but breaks when I apply it to the Boston housing dataset.
Could you check what is wrong with the code? Why am I not getting the correct theta vector?
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X = load_boston().data
y = load_boston().target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

X_train1 = np.c_[np.ones((len(X_train), 1)), X_train]
X_test1 = np.c_[np.ones((len(X_test), 1)), X_test]

eta = 0.0001
n_iterations = 100
m = len(X_train1)
tol = 0.00001

theta = np.random.randn(14, 1)

for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train)
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)
The weight vector I get has shape (14, 354). What am I doing wrong here?
Answer:
Consider the following code (I have expanded a few statements for better readability):
for i in range(n_iterations):
    y_hat = X_train1.dot(theta)
    error = y_hat - y_train[:, None]
    gradients = 2/m * X_train1.T.dot(error)
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)
Since y_hat has shape (n_samples, 1) while y_train has shape (n_samples,) – in your case n_samples is 354 – you need the dummy-axis trick y_train[:, None] to bring y_train to the same shape. Without it, broadcasting turns y_hat - y_train into an (n_samples, n_samples) array, so the gradients, and eventually theta, end up with shape (14, 354), which is exactly what you observed.
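To see the broadcasting issue in isolation, here is a minimal sketch with small toy arrays (the names X, y, theta and the sizes below are chosen only for illustration, not your original data):

import numpy as np

n_samples, n_features = 5, 3                  # toy sizes, for illustration only
X = np.c_[np.ones((n_samples, 1)), np.random.randn(n_samples, n_features)]  # (5, 4) design matrix with bias column
theta = np.random.randn(n_features + 1, 1)    # (4, 1)
y = np.random.randn(n_samples)                # 1-D target, shape (5,)

y_hat = X.dot(theta)                          # (5, 1)

bad_error = y_hat - y                         # broadcasts to (5, 5)
good_error = y_hat - y[:, None]               # stays (5, 1)

print(bad_error.shape)                        # (5, 5)
print(good_error.shape)                       # (5, 1)
print(X.T.dot(bad_error).shape)               # (4, 5)  -> the analogue of your (14, 354) theta
print(X.T.dot(good_error).shape)              # (4, 1)  -> matches theta, as intended

Equivalently, y_train.reshape(-1, 1) or y_train[:, np.newaxis] produce the same (n_samples, 1) column vector as y_train[:, None].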