Multivariate gradient descent in Matlab: what is the difference between the two pieces of code?

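The following function uses gradient descent to find the optimal "theta" values for a regression line. The inputs (X, y) are attached below. My question is: what is the difference between Code 1 and Code 2? Why does Code 2 work while Code 1 does not?

Thanks in advance!

GRADIENTDESCENTMULTI performs gradient descent to learn theta, updating theta by taking num_iters gradient steps with learning rate alpha.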

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);

    for iter = 1:num_iters
        % Code 1 - does not work
        for c = 1:n
            for i = 1:m
                costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
            end
        end

        % Code 2 - works
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end

        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end

        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end

    function J = computeCostMulti(X, y, theta)
    for i=1:m
        J = J+(X(i,:)*theta - y(i))^2;
    end
    J = J/(2*m);

The code is run as follows:

    alpha = 0.01;
    num_iters = 200;

    % Initialize theta and run gradient descent
    theta = zeros(3, 1);
    [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

    % Plot the convergence graph
    figure;
    plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('Cost J');

    % Display the gradient descent result
    fprintf('Theta computed from gradient descent: \n');
    fprintf(' %f \n', theta);
    fprintf('\n');

The values of X are

    1.0000    0.1300   -0.2237
    1.0000   -0.5042   -0.2237
    1.0000    0.5025   -0.2237
    1.0000   -0.7357   -1.5378
    1.0000    1.2575    1.0904
    1.0000   -0.0197    1.0904
    1.0000   -0.5872   -0.2237
    1.0000   -0.7219   -0.2237
    1.0000   -0.7810   -0.2237
    1.0000   -0.6376   -0.2237
    1.0000   -0.0764    1.0904
    1.0000   -0.0009   -0.2237
    1.0000   -0.1393   -0.2237
    1.0000    3.1173    2.4045
    1.0000   -0.9220   -0.2237
    1.0000    0.3766    1.0904
    1.0000   -0.8565   -1.5378
    1.0000   -0.9622   -0.2237
    1.0000    0.7655    1.0904
    1.0000    1.2965    1.0904
    1.0000   -0.2940   -0.2237
    1.0000   -0.1418   -1.5378
    1.0000   -0.4992   -0.2237
    1.0000   -0.0487    1.0904
    1.0000    2.3774   -0.2237
    1.0000   -1.1334   -0.2237
    1.0000   -0.6829   -0.2237
    1.0000    0.6610   -0.2237
    1.0000    0.2508   -0.2237
    1.0000    0.8007   -0.2237
    1.0000   -0.2034   -1.5378
    1.0000   -1.2592   -2.8519
    1.0000    0.0495    1.0904
    1.0000    1.4299   -0.2237
    1.0000   -0.2387    1.0904
    1.0000   -0.7093   -0.2237
    1.0000   -0.9584   -0.2237
    1.0000    0.1652    1.0904
    1.0000    2.7864    1.0904
    1.0000    0.2030    1.0904
    1.0000   -0.4237   -1.5378
    1.0000    0.2986   -0.2237
    1.0000    0.7126    1.0904
    1.0000   -1.0075   -0.2237
    1.0000   -1.4454   -1.5378
    1.0000   -0.1871    1.0904
    1.0000   -1.0037   -0.2237

The values of y are

  399900  329900  369000  232000  539900  299900  314900  198999  212000  242500  239999  347000  329999  699900  259900  449900  299900  199900  499998  599000  252900  255000  242900  259900  573900  249900  464500  469000  475000  299900  349900  169900  314900  579900  285900  249900  229900  345000  549000  287000  368500  329900  314000  299000  179900  299900  239500

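Answer:

I think I've got it working. The main problem is that in Code 1 you keep accumulating into costs(c) but never reset it to zero before the next iteration. The only change you really need is to add something like costs(c) = 0; after for c = 1:n and before for i = 1:m. I did make a few small modifications to your code to get it running for me (mainly in computeCostMulti), and I changed the plot to show that both methods give the same result. Overall, here is a working demo snippet with those changes: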

    close all; clear; clc;

    %% Data
    X = [1.0000  0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000  0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
        1.0000  1.2575  1.0904; 1.0000 -0.0197  1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
        1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764  1.0904; 1.0000 -0.0009 -0.2237;
        1.0000 -0.1393 -0.2237; 1.0000  3.1173  2.4045; 1.0000 -0.9220 -0.2237; 1.0000  0.3766  1.0904;
        1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000  0.7655  1.0904; 1.0000  1.2965  1.0904;
        1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487  1.0904;
        1.0000  2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000  0.6610 -0.2237;
        1.0000  0.2508 -0.2237; 1.0000  0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
        1.0000  0.0495  1.0904; 1.0000  1.4299 -0.2237; 1.0000 -0.2387  1.0904; 1.0000 -0.7093 -0.2237;
        1.0000 -0.9584 -0.2237; 1.0000  0.1652  1.0904; 1.0000  2.7864  1.0904; 1.0000  0.2030  1.0904;
        1.0000 -0.4237 -1.5378; 1.0000  0.2986 -0.2237; 1.0000  0.7126  1.0904; 1.0000 -1.0075 -0.2237;
        1.0000 -1.4454 -1.5378; 1.0000 -0.1871  1.0904; 1.0000 -1.0037 -0.2237];
    y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
        699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
        464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
        287000 368500 329900 314000 299000 179900 299900 239500]';
    alpha = 0.01;
    num_iters = 200;

    % Initialize theta and run gradient descent
    theta0 = zeros(3, 1);
    [theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
    [theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);

    % Plot the convergence graph for both methods
    figure;
    x = 1:numel(J_history_1);
    subplot(5,1,1:4);
    plot(x,J_history_1,x,J_history_2);
    xlim([min(x) max(x)]);
    set(gca,'XTickLabel','');
    ylabel('Cost J');
    grid on;
    subplot(5,1,5);
    stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
    xlim([min(x) max(x)]);
    xlabel('Number of iterations');
    ylabel('frac. \DeltaJ');
    grid on;

    % Display the gradient descent results
    fprintf('Theta computed from gradient descent with method 1: \n');
    fprintf(' %f \n', theta_result_1);
    fprintf('Theta computed from gradient descent with method 2: \n');
    fprintf(' %f \n', theta_result_2);
    fprintf('\n');

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);

    for iter = 1:num_iters
        if METHOD == 1 % Code 1 - works
            for c = 1:n
                costs(c) = 0; % reset the accumulator on every iteration
                for i = 1:m
                    costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
                end
            end
        elseif METHOD == 2 % Code 2 - works
            E = X * theta - y;
            for c = 1:n
                costs(c) = sum(E.*X(:,c));
            end
        else
            error('Unknown method');
        end

        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end

        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end

    function J = computeCostMulti(X, y, theta)
    m = length(y);
    J = 0;
    for mi = 1:m
        J = J + (X(mi,:)*theta - y(mi))^2;
    end
    J = J/(2*m);
    end

But again, all you really need to do is add the single line costs(c) = 0;.
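To see why the missing reset matters, it can help to write out what each variant computes. The notation here is an assumption for illustration and does not appear in the original post: x^{(i)} is row i of X, \theta^{(t)} is theta at the start of iteration t, and g_c is the sum that the inner loops build up for column c.

```latex
\begin{align*}
g_c(\theta) &= \sum_{i=1}^{m} \bigl(x^{(i)}\theta - y^{(i)}\bigr)\, x^{(i)}_c
  && \text{(sum accumulated for column } c\text{)} \\
\theta_c^{(t+1)} &= \theta_c^{(t)} - \frac{\alpha}{m}\, g_c\bigl(\theta^{(t)}\bigr)
  && \text{(Code 1 with the reset, and Code 2)} \\
\theta_c^{(t+1)} &= \theta_c^{(t)} - \frac{\alpha}{m} \sum_{s=1}^{t} g_c\bigl(\theta^{(s)}\bigr)
  && \text{(Code 1 without the reset)}
\end{align*}
```

The last two updates coincide at t = 1, because the accumulator starts at zero. From the second iteration on, the version without the reset steps along the running total of all earlier sums rather than the current one, so it is no longer plain gradient descent.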

Also, I recommend always adding the line close all; clear; clc; at the top of your scripts, to make sure they still work when you copy and paste them into Stack Overflow.
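As a side note on Code 2, the remaining loop over columns can probably be folded into a single matrix product, since sum(E.*X(:,c)) is entry c of X' * E. The sketch below is one way to write a single update step under that observation; the function name gradientStepVectorized is hypothetical and not part of the original post.

```matlab
% Hypothetical helper (not from the original post): one fully vectorized
% gradient-descent step for linear regression.
function theta = gradientStepVectorized(X, y, theta, alpha)
    m = length(y);                       % number of training examples
    E = X * theta - y;                   % m-by-1 residual vector, as in Code 2
    grad = X' * E;                       % n-by-1; entry c equals sum(E .* X(:,c))
    theta = theta - (alpha / m) * grad;  % update all components at once
end
```

Inside the for iter = 1:num_iters loop this would replace both the costs computation and the per-component update, for example theta = gradientStepVectorized(X, y, theta, alpha);. Because nothing is accumulated across iterations, there is no state left to reset.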
