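Multivariate gradient descent in MATLAB: what is the difference between the two code snippets?

The following function uses gradient descent to find the optimal "theta" values for a regression line. The inputs (X, y) are attached below. My question is: what is the difference between Code 1 and Code 2? Why does Code 2 work while Code 1 does not?

Thanks in advance!

GRADIENTDESCENTMULTI performs gradient descent to learn theta, updating theta by taking num_iters gradient steps with learning rate alpha.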

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);

    for iter = 1:num_iters

        % Code 1 - does not work
        for c = 1:n
            for i = 1:m
                costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
            end
        end

        % Code 2 - works
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end

        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end

        J_history(iter) = computeCostMulti(X, y, theta);

    end
    end

    function J = computeCostMulti(X, y, theta)
    for i=1:m
        J = J+(X(i,:)*theta - y(i))^2;
    end
    J = J/(2*m);

The code is run as follows:

    alpha = 0.01;
    num_iters = 200;

    % Initialize theta and run gradient descent
    theta = zeros(3, 1);
    [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

    % Plot the convergence graph
    figure;
    plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('Cost J');

    % Display the gradient descent result
    fprintf('Theta computed from gradient descent: \n');
    fprintf(' %f \n', theta);
    fprintf('\n');

The values of X are:

    1.0000    0.1300   -0.2237
    1.0000   -0.5042   -0.2237
    1.0000    0.5025   -0.2237
    1.0000   -0.7357   -1.5378
    1.0000    1.2575    1.0904
    1.0000   -0.0197    1.0904
    1.0000   -0.5872   -0.2237
    1.0000   -0.7219   -0.2237
    1.0000   -0.7810   -0.2237
    1.0000   -0.6376   -0.2237
    1.0000   -0.0764    1.0904
    1.0000   -0.0009   -0.2237
    1.0000   -0.1393   -0.2237
    1.0000    3.1173    2.4045
    1.0000   -0.9220   -0.2237
    1.0000    0.3766    1.0904
    1.0000   -0.8565   -1.5378
    1.0000   -0.9622   -0.2237
    1.0000    0.7655    1.0904
    1.0000    1.2965    1.0904
    1.0000   -0.2940   -0.2237
    1.0000   -0.1418   -1.5378
    1.0000   -0.4992   -0.2237
    1.0000   -0.0487    1.0904
    1.0000    2.3774   -0.2237
    1.0000   -1.1334   -0.2237
    1.0000   -0.6829   -0.2237
    1.0000    0.6610   -0.2237
    1.0000    0.2508   -0.2237
    1.0000    0.8007   -0.2237
    1.0000   -0.2034   -1.5378
    1.0000   -1.2592   -2.8519
    1.0000    0.0495    1.0904
    1.0000    1.4299   -0.2237
    1.0000   -0.2387    1.0904
    1.0000   -0.7093   -0.2237
    1.0000   -0.9584   -0.2237
    1.0000    0.1652    1.0904
    1.0000    2.7864    1.0904
    1.0000    0.2030    1.0904
    1.0000   -0.4237   -1.5378
    1.0000    0.2986   -0.2237
    1.0000    0.7126    1.0904
    1.0000   -1.0075   -0.2237
    1.0000   -1.4454   -1.5378
    1.0000   -0.1871    1.0904
    1.0000   -1.0037   -0.2237

The values of y are:

  399900  329900  369000  232000  539900  299900  314900  198999  212000  242500  239999  347000  329999  699900  259900  449900  299900  199900  499998  599000  252900  255000  242900  259900  573900  249900  464500  469000  475000  299900  349900  169900  314900  579900  285900  249900  229900  345000  549000  287000  368500  329900  314000  299000  179900  299900  239500

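Answer:

I think I've got it working. The main problem is that in Code 1 you keep adding to costs(c) but never reset it to zero before the next iteration. The only change you really need is to add something like costs(c) = 0; after for c = 1:n and before for i = 1:m. I did have to make a few small changes to your code to get it to run for me (mainly in computeCostMulti, which needed m and an initial value for J), and I changed the plot to show that both methods give the same result. Altogether, here is a working demo snippet with those changes: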

    close all; clear; clc;

    %% Data
    X = [1.0000  0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000  0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
        1.0000  1.2575  1.0904; 1.0000 -0.0197  1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
        1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764  1.0904; 1.0000 -0.0009 -0.2237;
        1.0000 -0.1393 -0.2237; 1.0000  3.1173  2.4045; 1.0000 -0.9220 -0.2237; 1.0000  0.3766  1.0904;
        1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000  0.7655  1.0904; 1.0000  1.2965  1.0904;
        1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487  1.0904;
        1.0000  2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000  0.6610 -0.2237;
        1.0000  0.2508 -0.2237; 1.0000  0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
        1.0000  0.0495  1.0904; 1.0000  1.4299 -0.2237; 1.0000 -0.2387  1.0904; 1.0000 -0.7093 -0.2237;
        1.0000 -0.9584 -0.2237; 1.0000  0.1652  1.0904; 1.0000  2.7864  1.0904; 1.0000  0.2030  1.0904;
        1.0000 -0.4237 -1.5378; 1.0000  0.2986 -0.2237; 1.0000  0.7126  1.0904; 1.0000 -1.0075 -0.2237;
        1.0000 -1.4454 -1.5378; 1.0000 -0.1871  1.0904; 1.0000 -1.0037 -0.2237];
    y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
        699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
        464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
        287000 368500 329900 314000 299000 179900 299900 239500]';

    alpha = 0.01;
    num_iters = 200;

    % Initialize theta and run gradient descent with each method
    theta0 = zeros(3, 1);
    [theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
    [theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);

    % Plot the convergence graph for both methods
    figure;
    x = 1:numel(J_history_1);
    subplot(5,1,1:4);
    plot(x,J_history_1,x,J_history_2);
    xlim([min(x) max(x)]);
    set(gca,'XTickLabel','');
    ylabel('Cost J');
    grid on;
    subplot(5,1,5);
    stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
    xlim([min(x) max(x)]);
    xlabel('Number of iterations');
    ylabel('frac. \DeltaJ');
    grid on;

    % Display the gradient descent results
    fprintf('Theta computed from gradient descent with method 1: \n');
    fprintf(' %f \n', theta_result_1);
    fprintf('Theta computed from gradient descent with method 2: \n');
    fprintf(' %f \n', theta_result_2);
    fprintf('\n');

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    costs = zeros(n,1);

    for iter = 1:num_iters
        if METHOD == 1 % Code 1 - works
            for c = 1:n
                costs(c) = 0; % reset the accumulator on every iteration
                for i = 1:m
                    costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
                end
            end
        elseif METHOD == 2 % Code 2 - works
            E = X * theta - y;
            for c = 1:n
                costs(c) = sum(E.*X(:,c));
            end
        else
            error('Unknown method');
        end

        % Update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end

        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end

    function J = computeCostMulti(X, y, theta)
    m = length(y);
    J = 0;
    for mi = 1:m
        J = J + (X(mi,:)*theta - y(mi))^2;
    end
    J = J/(2*m);
    end

But again, the only thing you really need to add is the line costs(c) = 0;.
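To see why the missing reset matters, it may help to write out what each version feeds into the update. This is only a sketch, in notation that does not appear in the original post: $\theta^{(k)}$ denotes theta after $k$ updates, $x^{(i)}$ is row $i$ of X, and $g_c$ is a name introduced here for the per-parameter sum stored in costs(c).

```latex
% Intended update (Code 2, and Code 1 once the reset is added):
g_c(\theta) = \sum_{i=1}^{m} \bigl(x^{(i)}\theta - y^{(i)}\bigr)\, x^{(i)}_c,
\qquad
\theta_c^{(k)} = \theta_c^{(k-1)} - \frac{\alpha}{m}\, g_c\bigl(\theta^{(k-1)}\bigr)

% Code 1 without the reset: costs(c) carries over between iterations,
% so the step uses the running sum of every gradient computed so far.
\theta_c^{(k)} = \theta_c^{(k-1)} - \frac{\alpha}{m} \sum_{j=1}^{k} g_c\bigl(\theta^{(j-1)}\bigr)
```

Only the first iteration coincides. From the second iteration on, the unreset version no longer steps along the current gradient.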

Also, I'd suggest always putting close all; clear; clc; at the top of your scripts, so that they still work when they are copied and pasted into Stack Overflow.
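As a quick sanity check on the reset, the loop can be compared against a single matrix product on a small example. The sketch below uses made-up toy data (not the housing data from the question), and the names grad_loop, grad_vec and stale are introduced here purely for illustration.

```matlab
% Hypothetical toy data, chosen only for illustration.
X = [1 0.5; 1 -1.0; 1 2.0];   % m = 3 samples, n = 2 parameters
y = [2; 0; 5];
theta = [0.1; -0.2];
[m, n] = size(X);

% Loop form with the reset, as in the corrected Code 1.
grad_loop = zeros(n, 1);
for c = 1:n
    grad_loop(c) = 0;
    for i = 1:m
        grad_loop(c) = grad_loop(c) + (X(i,:)*theta - y(i)) * X(i,c);
    end
end

% Vectorized form: one matrix product yields all n sums at once.
grad_vec = X' * (X*theta - y);

% Second pass without a reset: the new sums are added onto the old
% ones, so with theta unchanged every entry comes out doubled.
stale = grad_loop;
for c = 1:n
    for i = 1:m
        stale(c) = stale(c) + (X(i,:)*theta - y(i)) * X(i,c);
    end
end

disp(max(abs(grad_loop - grad_vec)));   % difference between loop and vectorized forms
disp(stale ./ grad_loop);               % ratio of stale to correct sums
```

In the real function theta changes between iterations, so the stale accumulator does not simply double; it accumulates gradients evaluated at different points.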
