I am trying to implement the gradient descent algorithm to minimize the cost function for multivariate linear regression, following the concepts explained in Andrew Ng's machine learning course. I am using Octave. However, when I run the code, it fails to produce a solution because my theta values come out as NaN. I have attached the cost function code and the gradient descent code. Can anyone help?
Cost function:
function J = computeCostMulti(X, y, theta)
  m = length(y); % number of training examples
  J = 0;
  h = (X * theta);
  s = sum((h - y).^2);
  J = s / (2*m);
end
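To sanity-check `computeCostMulti` on its own, it can be run on a tiny made-up dataset where the answer is known in advance (the data below is invented for illustration, it is not from the course):

```octave
1;  % script-file marker so the function below stays local to this script

function J = computeCostMulti(X, y, theta)
  m = length(y);                      % number of training examples
  J = sum((X*theta - y).^2) / (2*m);  % squared-error cost over 2m
end

% Made-up data: intercept column plus one feature, with y = 2*x exactly
X = [1 1; 1 2; 1 3];
y = [2; 4; 6];

J_true = computeCostMulti(X, y, [0; 2]);  % perfect fit -> J is 0
J_zero = computeCostMulti(X, y, [0; 0]);  % J = (4+16+36)/(2*3) = 56/6
```

A finite, sensible J on data like this confirms the cost function itself is not the source of the NaN.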
Gradient descent code:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    a = X*theta - y;
    b = alpha * (X' * a);
    theta = theta - (b/m);
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end
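The two functions can be exercised together on a small, well-scaled dataset, where gradient descent should converge without any normalization (the data and the learning rate below are made up for illustration):

```octave
1;  % script-file marker

function J = computeCostMulti(X, y, theta)
  m = length(y);
  J = sum((X*theta - y).^2) / (2*m);
end

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    a = X*theta - y;       % residuals
    b = alpha * (X' * a);  % learning rate times gradient (times m)
    theta = theta - b/m;
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end

% Made-up data generated from y = 1 + 2*x; the feature values are small
X = [ones(5, 1), (1:5)'];
y = 1 + 2*(1:5)';

[theta, J_history] = gradientDescentMulti(X, y, zeros(2, 1), 0.05, 2000);
% theta should approach [1; 2] and the cost history should decrease
```

If a run like this converges but the full pipeline yields NaN, the bug is upstream of these two functions, e.g. in the data being fed to them.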
Answer:
I found the error, and it was not in the logic of the cost function or the gradient descent function. It was in the feature normalization logic: I accidentally returned the wrong variable, which is what produced the NaN output.
It was a silly mistake:
What I was doing before:
mu = mean(a);
sigma = std(a);
b = (X .- mu);
X = b ./ sigma;
What I should have done instead:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
mu = mean(X);
sigma = std(X);
a = (X .- mu);
X_norm = a ./ sigma;
% ============================================================

end
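A quick way to verify a featureNormalize implementation is to check that every column of its output has mean close to 0 and standard deviation close to 1 (the sample data below is made up):

```octave
1;  % script-file marker

function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);                % per-column means
  sigma = std(X);              % per-column standard deviations
  X_norm = (X - mu) ./ sigma;  % broadcasting applies them column-wise
end

% Made-up house data: size in square feet, number of bedrooms
X = [2104 3; 1600 3; 2400 4; 1416 2];
[X_norm, mu, sigma] = featureNormalize(X);

% Every column of X_norm should now have mean ~0 and std ~1
```

Note `(X - mu)` relies on Octave's automatic broadcasting; the `.-` spelling from the original also broadcasts but is deprecated in newer Octave versions.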
So clearly I should have assigned the result to X_norm rather than to X, and that is what caused the wrong output.
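As for why handing gradient descent the raw X produces NaN rather than just a bad fit: with unnormalized features, a learning rate that works on normalized data is far too large, so every update overshoots, theta grows without bound, and it overflows to Inf and then NaN. A minimal sketch with made-up numbers (same gradient descent logic as in the question):

```octave
1;  % script-file marker

function J = computeCostMulti(X, y, theta)
  m = length(y);
  J = sum((X*theta - y).^2) / (2*m);
end

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (alpha/m) * (X' * (X*theta - y));
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end

% Made-up, UNnormalized house data: the size feature is in the thousands
X = [ones(4, 1), [1000; 1500; 2000; 3000]];
y = [200; 300; 400; 600];

% alpha = 0.01 is reasonable for normalized features, but here each step
% amplifies the error enormously, so theta diverges
[theta, J_history] = gradientDescentMulti(X, y, zeros(2, 1), 0.01, 200);
% theta ends up non-finite (Inf or NaN)
```

Normalizing X first (or shrinking alpha by several orders of magnitude) makes the same loop converge, which is why returning the wrong variable from featureNormalize showed up as NaN in theta.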