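I'm just getting started with Matlab and machine learning, and I'm trying to write a gradient descent function that does not use matrix operations.
- m is the number of examples in the training set
- n is the number of features per example
The function gradientDescentMulti takes 5 arguments:
- X: an m x n matrix
- y: an m-dimensional vector
- theta: an n-dimensional vector
- alpha: a real number
- num_iters: the number of iterations
I already have a solution that uses matrix multiplication: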
    function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
        m = length(y); % number of training examples
        for iter = 1:num_iters
            gradJ = 1/m * (X'*X*theta - X'*y);
            theta = theta - alpha * gradJ;
        end
    end
The result after the iterations:

    theta =
       1.0e+05 *
        3.3430
        1.0009
        0.0367
Here is my attempt that uses loops instead:

    function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
        m = length(y); % number of training examples
        n = size(X, 2); % number of features
        for iter = 1:num_iters
            new_theta = zeros(1, n);
            % for each feature, find the new theta
            for t = 1:n
                S = 0;
                for example = 1:m
                    h = 0;
                    for example_feature = 1:n
                        h = h + (theta(example_feature) * X(example, example_feature));
                    end
                    S = S + ((h - y(example)) * X(example, n)); % sum each feature for this example
                end
                new_theta(t) = theta(t) - alpha * (1/m) * S; % calculate new theta for this feature
            end
            % only at the end of the iteration, update all theta simultaneously
            theta = new_theta'; % transpose new_theta (row vector) to theta (column vector)
        end
    end
As a result, all the theta values come out the same :/

    theta =
       1.0e+04 *
        3.5374
        3.5374
        3.5374
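Answer:
If you look at the gradient update rule, it may be more efficient to first compute the hypothesis for every training example, subtract each example's ground-truth value from its hypothesis, and store these differences in an array or vector. Once you have that, computing the update rule is straightforward. It doesn't look to me like your code does this.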
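For reference, the update rule being described can be written out per parameter. The notation below is the usual one for linear regression and is an assumption on my part, not something taken from the question: x_j^{(i)} is feature j of example i, and h_theta is the hypothesis.

```latex
\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad
h_\theta(x^{(i)}) = \sum_{k=1}^{n} \theta_k \, x_k^{(i)},
\qquad j = 1, \dots, n
```

This is component j of the matrix expression 1/m * (X'*X*theta - X'*y). Note that the feature index on x matches the parameter being updated. The loop version in the question multiplies by X(example, n) for every t, so S is identical for all parameters and each one moves by the same amount per iteration. If theta starts with equal entries (all zeros, say, which is a guess about how the function was called), that would produce the identical values shown.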
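So I rewrote the code, this time with a separate array that stores the difference between the hypothesis and the ground truth for each training example. With that in place, I can compute the update rule for each feature separately: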
    for iter = 1 : num_iters
        % Compute hypothesis differences with ground truth first
        h = zeros(1, m);
        for t = 1 : m
            % Compute hypothesis
            for tt = 1 : n
                h(t) = h(t) + theta(tt)*X(t,tt);
            end
            % Compute difference between hypothesis and ground truth
            h(t) = h(t) - y(t);
        end

        % Now update parameters
        new_theta = zeros(1, n);
        % for each feature, find the new theta
        for tt = 1 : n
            S = 0;
            % For each sample, compute products of hypothesis difference
            % and the right feature of the sample and accumulate
            for t = 1 : m
                S = S + h(t)*X(t,tt);
            end
            % Compute gradient descent step
            new_theta(tt) = theta(tt) - (alpha/m)*S;
        end
        theta = new_theta'; % Transpose new_theta (row vector) to theta (column vector)
    end
When I do this, I get the same result as with the matrix formulation.
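If you want to check the agreement yourself, a small script along these lines should do it. This is only a sketch: the data values, the step size, and the names theta_mat, theta_loop, err and max_diff are made up for illustration and are not from the question.

```matlab
% Tiny made-up data set (hypothetical values, not the question's data)
X = [1 1; 1 2; 1 3; 1 4];   % m = 4 examples, n = 2 features
y = [2; 3; 4; 5];
alpha = 0.05;               % assumed step size, small enough to be stable here
num_iters = 100;
m = length(y);
n = size(X, 2);

% Matrix version: theta <- theta - (alpha/m) * X' * (X*theta - y)
theta_mat = zeros(n, 1);
for iter = 1:num_iters
    theta_mat = theta_mat - (alpha/m) * (X' * (X*theta_mat - y));
end

% Loop version: same update written with explicit sums
theta_loop = zeros(n, 1);
for iter = 1:num_iters
    % hypothesis minus ground truth for every example
    err = zeros(m, 1);
    for i = 1:m
        for j = 1:n
            err(i) = err(i) + theta_loop(j) * X(i, j);
        end
        err(i) = err(i) - y(i);
    end
    % update every parameter from the stored differences
    new_theta = zeros(n, 1);
    for j = 1:n
        S = 0;
        for i = 1:m
            S = S + err(i) * X(i, j);   % feature index j matches the parameter
        end
        new_theta(j) = theta_loop(j) - (alpha/m) * S;
    end
    theta_loop = new_theta;             % simultaneous update
end

% Largest absolute difference between the two results
max_diff = max(abs(theta_loop - theta_mat));
fprintf('largest difference: %g\n', max_diff);
```

The two results should differ only by floating-point rounding. Replacing X(i, j) with X(i, n) in the inner sum reproduces the behaviour from the question, where every parameter receives the same update.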