I am trying to implement a logistic regression solver in MATLAB, finding the weights via stochastic gradient descent. I have run into a problem where my data seems to produce an infinite cost, and no matter what happens, it never goes down…
Here is my gradient descent function:
function weightVector = logisticWeightsByGradientDescentStochastic(trueClass, features)
%% This function attempts to converge on the best set of weights for a logistic regression of order 1
%% Input:
%   trueClass - the training data's vector of true class values
%   features
%% Output:
%   weightVector - vector of size n+1 (n is number of features)
%                  corresponding to convergent weights

%% Get data size
dataSize = size(features);

%% Initial pick for weightVector
weightVector = zeros(dataSize(2)+1, 1) %create a zero vector of length number of features plus 1

%% Choose learning rate
learningRate = 0.0001;

%% Initial cost
cost = logisticCost(weightVector, features, trueClass)

%% Stochastic gradient descent
costThresh = 0.05 %define cost threshold
iterCount = 0;
while (cost > costThresh)
    for m = 1:dataSize(1) %for all samples
        %% test statement
        curFeatures = transpose([1.0 features(m,:)])
        %% calculate sigmoid prediction
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)])
        %% test statement
        truth = trueClass(m)
        %% calculate gradient for all features
        gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
        %% update the weight vector by subtracting the gradient from the old weight vector
        weightVector = weightVector - gradient
        %% re-evaluate cost with the new weight vector
        cost = logisticCost(weightVector, features, trueClass)
        if (cost < costThresh)
            break
        end
        iterCount = iterCount + 1
    end %for m
end %while cost > 0.05

weightVector
iterCount
end
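(`evaluateSigmoid` is not shown in the post; a minimal sketch of what it is assumed to compute, namely the logistic function of the bias-augmented weighted feature sum:)

function predicted = evaluateSigmoid(weightVector, featureRow)
%% Assumed helper, not part of the original post: logistic function of
%% the inner product of the bias-augmented feature row and the weights
z = featureRow * weightVector; %scalar: [1 x (n+1)] * [(n+1) x 1]
predicted = 1 / (1 + exp(-z));
end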
Here is my cost function:
function cost = logisticCost(weightVector, features, trueClass)
%% Calculates the total cost of applying weightVector to all samples
%% for a logistic regression model according to
%% J(theta) = -(1/m) sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
%% Input:
%   weightVector - vector of n+1 weights, where n is the number of features
%   features - matrix of features
%   trueClass - the training data's true class
%% Output:
%   cost - the total cost

dataSize = size(features); %get size of data
errorSum = 0.0; %stores sum of errors
for m = 1:dataSize(1) %for each row
    predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
    if trueClass(m) == 1
        errorSum = errorSum + log(predictedClass);
    else
        errorSum = errorSum + log(1 - predictedClass);
    end
end
cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get cost
end
Both of these functions look fine to me, and I cannot see why my cost function always returns infinity.
Here is my training data; the first column is the class (1 or 0), and the next seven columns are the features I am trying to regress on.
Answer:
Your gradient has the wrong sign:
gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
It should be:
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])
See Andrew Ng's notes for details: http://cs229.stanford.edu/notes/cs229-notes1.pdf
The gradient with respect to the j-th parameter (for a single sample, as in stochastic gradient descent) is:

∂J(θ)/∂θ_j = (h(x) − y) · x_j

where h(x) is the logistic function, y is the true label, and x is the feature vector.
Otherwise, by taking the negative of that gradient you are actually performing gradient ascent on the cost. I believe that is why you end up with an infinite cost: as the weights diverge, the sigmoid saturates to exactly 0 or 1 for some samples, log(0) evaluates to -Inf, and the cost becomes Inf, so the loop condition never becomes false and you can never break out of it.
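You can see the Inf directly: once the weights have diverged far enough, the weighted sum gets large, the sigmoid saturates in double precision, and the log term blows up. A minimal illustration (the value 800 is just an arbitrary large weighted sum):

z = 800;                       %an arbitrarily large weighted sum from diverged weights
predicted = 1 / (1 + exp(-z)); %exp(-800) underflows to 0, so this is exactly 1
log(1 - predicted)             %log(0) = -Inf, so logisticCost returns Inf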
The update rule should still be:
weightVector = weightVector - gradient
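Putting both pieces together, the corrected step inside the for-loop becomes:

%% corrected gradient: (predicted - true) rather than (true - predicted)
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)]);
%% unchanged update rule: step against the gradient to descend the cost
weightVector = weightVector - gradient;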