I'm taking Andrew Ng's course on Coursera and want to implement the same logic in Python. I'm trying to use scipy.optimize.fmin_ncg to compute the cost and theta. Here is the code:
import numpy as np
from scipy.optimize import fmin_ncg


def sigmoid(z):
    return (1 / (1 + np.exp(-z))).reshape(-1, 1)


def compute_cost(theta, X, y):
    m = len(y)
    hypothesis = sigmoid(np.dot(X, theta))
    cost = (1 / m) * np.sum(np.dot(-y.T, (np.log(hypothesis))) - np.dot((1 - y.T), np.log(1 - hypothesis)))
    return cost


def compute_gradient(theta, X, y):
    m = len(y)
    hypothesis = sigmoid(np.dot(X, theta))
    gradient = (1 / m) * np.dot(X.T, (hypothesis - y))
    return gradient


def main():
    data = np.loadtxt("data/data1.txt", delimiter=",")  # 100, 3
    X = data[:, 0:2]
    y = data[:, 2:]
    m, n = X.shape
    initial_theta = np.zeros((n + 1, 1))
    X = np.column_stack((np.ones(m), X))
    mr = fmin_ncg(compute_cost, initial_theta, compute_gradient, args=(X, y), full_output=True)
    print(mr)


if __name__ == "__main__":
    main()
When I try to run this code, I get the following error/exception:
Traceback (most recent call last):
  File "/file/path/without_regression.py", line 78, in <module>
    main()
  File "/file/path/without_regression.py", line 66, in main
    mr = fmin_ncg(compute_cost, initial_theta, compute_gradient, args=(X, y), full_output=True)
  File "/usr/local/anaconda3/envs/ml/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 1400, in fmin_ncg
    callback=callback, **opts)
  File "/usr/local/anaconda3/envs/ml/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 1497, in _minimize_newtoncg
    dri0 = numpy.dot(ri, ri)
ValueError: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)
I don't understand this error, and as a beginner it isn't obvious to me what's going wrong. How do I use scipy.optimize.fmin_ncg, or another minimization routine such as scipy.optimize.minimize(...), to compute the cost and theta?
Answer:
As mentioned in the comments:
While I don't have a doc reference at hand right now, you should always use 1-dimensional arrays.
import numpy as np

a = np.random.random(size=(3,1))  # not recommended!
a.shape  # (3, 1)
a.ndim   # 2

b = np.random.random(size=3)      # recommended!
b.shape  # (3,)
b.ndim   # 1
This applies to your x0 (when it's not a plain Python list) and to your gradient.
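To see what actually breaks: the traceback ends in Newton-CG's internal inner product, dri0 = numpy.dot(ri, ri), which fails for (3,1) column vectors but works for 1-D arrays. A minimal reproduction of exactly that error:

import numpy as np

g_col = np.ones((3, 1))  # 2-D column vector, like the original gradient
g_1d = np.ones(3)        # 1-D array, what the optimizer expects

print(np.dot(g_1d, g_1d))    # 3.0 -- inner product of two 1-D arrays works

try:
    np.dot(g_col, g_col)     # (3,1) x (3,1): inner dimensions 1 != 3
except ValueError as e:
    print(e)  # shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)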
A quick hotfix (= flattening the gradient to one dimension) looks like this:
gradient = (1 / m) * np.dot(X.T, (hypothesis - y)).ravel()  # .ravel()!
...
initial_theta = np.zeros(n + 1)  # drop the extra dimension
With these changes the code runs:
Optimization terminated successfully.
         Current function value: 0.203498
         Iterations: 27
         Function evaluations: 71
         Gradient evaluations: 229
         Hessian evaluations: 0
(array([-25.13045417,   0.20598475,   0.2012217 ]), 0.2034978435366513, 71, 229, 0, 0)
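Since the question also asks about scipy.optimize.minimize(...): the same problem can be handed to the unified interface with method='Newton-CG'. A sketch, assuming the 1-D fixes above (raveled gradient, 1-D initial_theta) and the question's compute_cost/compute_gradient are in scope:

from scipy.optimize import minimize

# Newton-CG needs the gradient, passed via the jac argument.
res = minimize(compute_cost, initial_theta, args=(X, y),
               method='Newton-CG', jac=compute_gradient)
print(res.x)    # fitted theta
print(res.fun)  # final cost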
Side note: while debugging, I also checked the gradient computation itself against numerical differentiation (recommended!), and it looks fine at x0:
from scipy.optimize import check_grad as cg

print(cg(compute_cost, compute_gradient, initial_theta, X, y))
# 1.24034933954e-05
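For reference, check_grad returns the 2-norm of the difference between the analytical gradient and a finite-difference approximation, so a value near zero (here ~1.2e-05) means the two agree. The same comparison can be done by hand with approx_fprime; a sketch, again assuming the 1-D fixes and the functions defined above:

import numpy as np
from scipy.optimize import approx_fprime

eps = np.sqrt(np.finfo(float).eps)  # typical step size for forward differences
numeric = approx_fprime(initial_theta, compute_cost, eps, X, y)
analytic = compute_gradient(initial_theta, X, y)
print(numeric - analytic)  # should be close to zero element-wise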