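The function below uses gradient descent to find the optimal "theta" values for a regression line. The inputs (X, y) are attached below. My question is: what is the difference between Code 1 and Code 2? Why does Code 2 work while Code 1 does not?
Thanks in advance!
GRADIENTDESCENTMULTI performs gradient descent to learn theta. It updates theta by taking num_iters gradient steps with learning rate alpha.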

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);
    for iter = 1:num_iters
        % Code 1 - does not work
        for c = 1:n
            for i = 1:m
                costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
            end
        end
        % Code 2 - works
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end
        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end
        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end

    function J = computeCostMulti(X, y, theta)
    for i=1:m
        J = J+(X(i,:)*theta - y(i))^2;
    end
    J = J/(2*m);

The code is run as follows:

    alpha = 0.01;
    num_iters = 200;
    % Initialize theta and run gradient descent
    theta = zeros(3, 1);
    [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
    % Plot the convergence graph
    figure;
    plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('Cost J');
    % Display the gradient descent result
    fprintf('Theta computed by gradient descent: \n');
    fprintf(' %f \n', theta);
    fprintf('\n');

The values of X are:

    1.0000  0.1300 -0.2237
    1.0000 -0.5042 -0.2237
    1.0000  0.5025 -0.2237
    1.0000 -0.7357 -1.5378
    1.0000  1.2575  1.0904
    1.0000 -0.0197  1.0904
    1.0000 -0.5872 -0.2237
    1.0000 -0.7219 -0.2237
    1.0000 -0.7810 -0.2237
    1.0000 -0.6376 -0.2237
    1.0000 -0.0764  1.0904
    1.0000 -0.0009 -0.2237
    1.0000 -0.1393 -0.2237
    1.0000  3.1173  2.4045
    1.0000 -0.9220 -0.2237
    1.0000  0.3766  1.0904
    1.0000 -0.8565 -1.5378
    1.0000 -0.9622 -0.2237
    1.0000  0.7655  1.0904
    1.0000  1.2965  1.0904
    1.0000 -0.2940 -0.2237
    1.0000 -0.1418 -1.5378
    1.0000 -0.4992 -0.2237
    1.0000 -0.0487  1.0904
    1.0000  2.3774 -0.2237
    1.0000 -1.1334 -0.2237
    1.0000 -0.6829 -0.2237
    1.0000  0.6610 -0.2237
    1.0000  0.2508 -0.2237
    1.0000  0.8007 -0.2237
    1.0000 -0.2034 -1.5378
    1.0000 -1.2592 -2.8519
    1.0000  0.0495  1.0904
    1.0000  1.4299 -0.2237
    1.0000 -0.2387  1.0904
    1.0000 -0.7093 -0.2237
    1.0000 -0.9584 -0.2237
    1.0000  0.1652  1.0904
    1.0000  2.7864  1.0904
    1.0000  0.2030  1.0904
    1.0000 -0.4237 -1.5378
    1.0000  0.2986 -0.2237
    1.0000  0.7126  1.0904
    1.0000 -1.0075 -0.2237
    1.0000 -1.4454 -1.5378
    1.0000 -0.1871  1.0904
    1.0000 -1.0037 -0.2237

The values of y are:
399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999 699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900 464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000 287000 368500 329900 314000 299000 179900 299900 239500
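Answer:
I think I got it working. The main problem is that in Code 1 you keep adding to `costs(c)` but never reset it to zero before the next iteration, so every gradient step also includes the sums left over from all previous iterations. Code 2 does not have this problem because it assigns `costs(c)` directly instead of accumulating into it. The only change you really need is to add something like `costs(c) = 0;` after `for c = 1:n` and before `for i = 1:m`. I did make a few small changes to your code to get it to run for me (mainly in `computeCostMulti`, which needed `m` and `J` initialized), and I changed the plot to show that both methods give the same result. Overall, here is a working demo snippet with these changes: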

    close all; clear; clc;
    %% Data
    X = [1.0000 0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000 0.5025 -0.2237;
         1.0000 -0.7357 -1.5378; 1.0000 1.2575 1.0904; 1.0000 -0.0197 1.0904;
         1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237; 1.0000 -0.7810 -0.2237;
         1.0000 -0.6376 -0.2237; 1.0000 -0.0764 1.0904; 1.0000 -0.0009 -0.2237;
         1.0000 -0.1393 -0.2237; 1.0000 3.1173 2.4045; 1.0000 -0.9220 -0.2237;
         1.0000 0.3766 1.0904; 1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237;
         1.0000 0.7655 1.0904; 1.0000 1.2965 1.0904; 1.0000 -0.2940 -0.2237;
         1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487 1.0904;
         1.0000 2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237;
         1.0000 0.6610 -0.2237; 1.0000 0.2508 -0.2237; 1.0000 0.8007 -0.2237;
         1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519; 1.0000 0.0495 1.0904;
         1.0000 1.4299 -0.2237; 1.0000 -0.2387 1.0904; 1.0000 -0.7093 -0.2237;
         1.0000 -0.9584 -0.2237; 1.0000 0.1652 1.0904; 1.0000 2.7864 1.0904;
         1.0000 0.2030 1.0904; 1.0000 -0.4237 -1.5378; 1.0000 0.2986 -0.2237;
         1.0000 0.7126 1.0904; 1.0000 -1.0075 -0.2237; 1.0000 -1.4454 -1.5378;
         1.0000 -0.1871 1.0904; 1.0000 -1.0037 -0.2237];
    y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
         699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
         464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
         287000 368500 329900 314000 299000 179900 299900 239500]';
    alpha = 0.01;
    num_iters = 200;
    % Initialize theta and run gradient descent
    theta0 = zeros(3, 1);
    [theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
    [theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);
    % Plot the convergence graph for both methods
    figure;
    x = 1:numel(J_history_1);
    subplot(5,1,1:4);
    plot(x,J_history_1,x,J_history_2);
    xlim([min(x) max(x)]);
    set(gca,'XTickLabel','');
    ylabel('Cost J');
    grid on;
    subplot(5,1,5);
    stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
    xlim([min(x) max(x)]);
    xlabel('Number of iterations');
    ylabel('frac. \DeltaJ');
    grid on;
    % Display the gradient descent results
    fprintf('Theta computed by gradient descent, method 1: \n');
    fprintf(' %f \n', theta_result_1);
    fprintf('Theta computed by gradient descent, method 2: \n');
    fprintf(' %f \n', theta_result_2);
    fprintf('\n');

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);
    for iter = 1:num_iters
        if METHOD == 1 % Code 1 - works
            for c = 1:n
                costs(c) = 0; % reset the accumulator on every iteration
                for i = 1:m
                    costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
                end
            end
        elseif METHOD == 2 % Code 2 - works
            E = X * theta - y;
            for c = 1:n
                costs(c) = sum(E.*X(:,c));
            end
        else
            error('Unknown method');
        end
        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end
        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end

    function J = computeCostMulti(X, y, theta)
    m = length(y);
    J = 0;
    for mi = 1:m
        J = J + (X(mi,:)*theta - y(mi))^2;
    end
    J = J/(2*m);
    end

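As an aside, the per-column loop in Code 2 can probably be collapsed into a single matrix product. The following is only a sketch under assumptions: the toy `X`, `y`, and `alpha` values are made up for illustration (they are not the data from the question), and the names `costs_loop` and `costs_vec` are hypothetical.

```matlab
% Hypothetical toy data, not the data from the question.
X = [1 0.5; 1 -1.0; 1 2.0];   % m-by-n design matrix with a leading column of ones
y = [2; -1; 5];               % m-by-1 targets
theta = zeros(2, 1);
alpha = 0.1;                  % assumed learning rate for this toy example
m = length(y);
n = length(theta);

% Loop form (same as Code 2): one column of X at a time.
E = X * theta - y;
costs_loop = zeros(n, 1);
for c = 1:n
    costs_loop(c) = sum(E .* X(:, c));
end

% Vectorized form: all components from one matrix product.
costs_vec = X' * E;

% The two should agree up to floating-point rounding.
disp(max(abs(costs_loop - costs_vec)));

% One gradient step using the vectorized form.
theta = theta - (alpha / m) * costs_vec;
```

Like Code 2, this form assigns the gradient sums from scratch on every iteration, so there is no accumulator to forget to reset.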
But again, all you really need to do is add the line `costs(c) = 0;`.
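To see the effect of the missing reset in isolation, here is a small sketch that tracks both accumulators side by side. It rests on assumptions: the toy data and `alpha` are made up (not the question's data), and `costs_fresh`, `costs_stale`, and `diff_hist` are hypothetical names introduced only for this demonstration.

```matlab
% Hypothetical toy data, not the data from the question.
X = [1 0.5; 1 -1.0; 1 2.0];
y = [2; -1; 5];
theta = zeros(2, 1);
alpha = 0.1;                 % assumed learning rate for this toy example
m = length(y);
n = length(theta);
num_iters = 3;

costs_stale = zeros(n, 1);           % initialized once and never reset, as in the original Code 1
diff_hist = zeros(num_iters, 1);     % gap between the two accumulators per iteration

for iter = 1:num_iters
    costs_fresh = zeros(n, 1);       % reset at the start of every iteration
    for c = 1:n
        for i = 1:m
            r = (X(i,:)*theta - y(i)) * X(i,c);
            costs_fresh(c) = costs_fresh(c) + r;
            costs_stale(c) = costs_stale(c) + r;
        end
    end
    diff_hist(iter) = norm(costs_stale - costs_fresh);
    fprintf('iter %d: fresh = [%g %g], stale = [%g %g]\n', ...
            iter, costs_fresh, costs_stale);
    % Step with the correctly reset sums.
    theta = theta - alpha * costs_fresh / m;
end
```

In the first iteration the two accumulators match. From the second iteration on, the stale one also carries the sums from earlier iterations, so it no longer equals the current gradient sum.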
Also, I would suggest always adding `close all; clear; clc;` at the top of your scripts, to make sure they still work when you copy and paste them into Stack Overflow.