What is the error in my iterative implementation of gradient descent?

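I tried to implement an iterative version of the gradient descent algorithm, but it does not work correctly. The vectorized implementation of the same algorithm, however, works fine.
Here is the code for the iterative implementation: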

    function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
        % get the number of rows and columns
        nrows = size(X, 1);
        ncols = size(X, 2);
        % initialize the hypothesis vector
        h = zeros(nrows, 1);
        % initialize the temporary theta vector
        theta_temp = zeros(ncols, 1);
        % run gradient descent for the specified number of iterations
        count = 1;
        while count <= iterations
            % calculate the hypothesis values and fill into the vector
            for i = 1 : nrows
                for j = 1 : ncols
                    term = theta(j) * X(i, j);
                    h(i) = h(i) + term;
                end
            end
            % calculate the gradient
            for j = 1 : ncols
                for i = 1 : nrows
                    term = (h(i) - y(i)) * X(i, j);
                    theta_temp(j) = theta_temp(j) + term;
                end
            end
            % update the gradient with the factor
            fact = alpha / nrows;
            for i = 1 : ncols
                theta_temp(i) = fact * theta_temp(i);
            end
            % update the theta
            for i = 1 : ncols
                theta(i) = theta(i) - theta_temp(i);
            end
            % update the count
            count += 1;
        end
    end

Here is the vectorized implementation of the same algorithm:

    function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
        % set the learning rate
        learn_rate = alpha;
        % set the number of iterations
        n = 1500;
        % number of training examples
        m = length(y);
        % initialize the theta_new vector
        l = length(theta);
        theta_new = zeros(l,1);
        % initialize the cost vector
        J_cost = zeros(n,1);
        % initialize the vector to store all the calculated theta values
        theta_all = zeros(n,2);
        % perform gradient descent for the specified number of iterations
        for i = 1 : n
            % calculate the hypothesis
            hypothesis = X * theta;
            % calculate the error
            err = hypothesis - y;
            % calculate the gradient
            grad = X' * err;
            % calculate the new theta
            theta_new = (learn_rate/m) .* grad;
            % update the old theta
            theta = theta - theta_new;
            % update the cost
            J_cost(i) = computeCost(X, y, theta);
            % store the calculated theta value
            if i < n
                index = i + 1;
                theta_all(index,:) = theta';
            end
        end
    end

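A link to the dataset can be found here.

The file name is ex1data1.txt.

Problem

For an initial theta = [0, 0] (this is a vector!), a learning rate of 0.01, and 1500 iterations, the optimal theta values I get are: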

theta0 = -3.6303
theta1 = 1.1664

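This is the output of the vectorized implementation, which I know is correct (it passed all the test cases on Coursera).

However, when I implement the same algorithm with the iterative approach (the first code above), the theta values I get are (alpha = 0.01, iterations = 1500):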

theta0 = -0.20720
theta1 = -0.77392

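This implementation fails the test cases, so I know it is wrong.

However, I cannot see where I went wrong. The iterative code performs the same operations and the same multiplications as the vectorized code, and when I traced one iteration of both by hand, the values matched on paper. Yet the code fails when run in Octave.

Any help would be much appreciated, especially if you can point out the mistake I made and the exact reason for the failure.

Points to consider

The hypothesis implementation is correct: I tested it and both versions produce the same output, so there is no problem there.
I printed the gradient vector in both versions and realized the error is there, because the outputs are very different!

In addition, here is the code that preprocesses the data: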

    function [X, y] = fileReader(filename)
        % load the dataset
        dataset = load(filename);
        % get the dimensions of the dataset
        nrows = size(dataset, 1);
        ncols = size(dataset, 2);
        % generate the X matrix from the dataset
        X = dataset(:, 1 : ncols - 1);
        % generate the y vector
        y = dataset(:, ncols);
        % append 1's to the X matrix
        X = [ones(nrows, 1), X];
    end

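Answer:

The problem with the first code is that the theta_temp and h vectors are not initialized correctly. In the first iteration (when count equals 1) your code works, because h and theta_temp are both zero at that point. But these are temporary vectors for a single iteration of gradient descent, and they are never reset to zero for the later iterations. From the second iteration onward, the new terms are simply added on top of the old entries of h and theta_temp, so the code does not work. You need to reset both vectors to zero at the start of every iteration; then they behave correctly. Here is my version of your code (the first code; note the change):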

    function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
        % get the number of rows and columns
        nrows = size(X, 1);
        ncols = size(X, 2);
        % run gradient descent for the specified number of iterations
        count = 1;
        while count <= iterations
            % initialize the hypothesis vector
            h = zeros(nrows, 1);
            % initialize the temporary theta vector
            theta_temp = zeros(ncols, 1);
            % calculate the hypothesis values and fill into the vector
            for i = 1 : nrows
                for j = 1 : ncols
                    term = theta(j) * X(i, j);
                    h(i) = h(i) + term;
                end
            end
            % calculate the gradient
            for j = 1 : ncols
                for i = 1 : nrows
                    term = (h(i) - y(i)) * X(i, j);
                    theta_temp(j) = theta_temp(j) + term;
                end
            end
            % update the gradient with the factor
            fact = alpha / nrows;
            for i = 1 : ncols
                theta_temp(i) = fact * theta_temp(i);
            end
            % update the theta
            for i = 1 : ncols
                theta(i) = theta(i) - theta_temp(i);
            end
            % update the count
            count += 1;
        end
    end
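
To see the stale-accumulator effect in isolation, here is a minimal sketch. The toy X, y and theta are made up for illustration (they are not ex1data1.txt), and theta is held fixed so that only the accumulation differs. h_stale is zeroed once before the loop, as in the original code, while h_fresh is zeroed on every pass, as in the fix.

```octave
% Toy data, assumed for illustration only
X = [1 1; 1 2; 1 3];
y = [3; 5; 7];
theta = [0.5; 0.5];          % nonzero so the hypothesis is nonzero

nrows = size(X, 1);
ncols = size(X, 2);

h_stale = zeros(nrows, 1);   % zeroed once, outside the loop (original code)
for pass = 1 : 2
    h_fresh = zeros(nrows, 1);   % zeroed on every pass (fixed code)
    for i = 1 : nrows
        for j = 1 : ncols
            term = theta(j) * X(i, j);
            h_stale(i) = h_stale(i) + term;
            h_fresh(i) = h_fresh(i) + term;
        end
    end
    fprintf("pass %d: stale = %s, fresh = %s\n", pass, ...
            mat2str(h_stale'), mat2str(h_fresh'));
end
```

On the first pass the two vectors agree, which is consistent with a one-iteration trace on paper matching. On the second pass h_stale still holds the first pass's sums, so it no longer equals X * theta.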

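I ran the code and it gives the same theta values you mentioned. However, I am curious how you concluded that the hypothesis vector was the same in both cases, since that is clearly one of the reasons the first code fails!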
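
One way to check the fix without the course data is to run the loop-based update (with the per-iteration reset) next to the vectorized update and compare the results. The sketch below does that on a small synthetic dataset; the data, the iteration count, and the variable names theta_loop, theta_vec and max_diff are assumptions for illustration, not part of the original code.

```octave
% Synthetic data, assumed for illustration (not ex1data1.txt)
X = [ones(5, 1), (1:5)'];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
alpha = 0.01;
iterations = 50;
[nrows, ncols] = size(X);

theta_loop = zeros(ncols, 1);
theta_vec = zeros(ncols, 1);

for count = 1 : iterations
    % loop-based update, with both accumulators reset every iteration
    h = zeros(nrows, 1);
    grad = zeros(ncols, 1);
    for i = 1 : nrows
        for j = 1 : ncols
            h(i) = h(i) + theta_loop(j) * X(i, j);
        end
    end
    for j = 1 : ncols
        for i = 1 : nrows
            grad(j) = grad(j) + (h(i) - y(i)) * X(i, j);
        end
    end
    theta_loop = theta_loop - (alpha / nrows) * grad;

    % vectorized update of the same rule
    theta_vec = theta_vec - (alpha / nrows) * (X' * (X * theta_vec - y));
end

max_diff = max(abs(theta_loop - theta_vec));
fprintf("max difference after %d iterations: %g\n", iterations, max_diff);
```

Any remaining difference should be at the level of floating-point rounding, since the two versions only differ in the order of the additions. Moving the two zeros(...) lines above the for loop reproduces the divergence described in the question.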
