Stochastic gradient descent for logistic regression always returns an infinite cost, and the weight vector never gets any closer

I'm trying to implement a logistic regression solver in MATLAB, finding the weights by stochastic gradient descent. I've run into a problem where my data seems to produce an infinite cost, and no matter what happens it never goes down…

Here is my gradient descent function:

function weightVector = logisticWeightsByGradientDescentStochastic(trueClass, features)
    %% This function attempts to converge on the best set of weights for a first-order logistic regression
    %% Input:
    % trueClass - the training data's vector of true class values
    % features - matrix of features
    %% Output:
    % weightVector - vector of size n+1 (n is the number of features)
    % corresponding to the convergent weights

    %% Get data size
    dataSize = size(features);

    %% Initial pick for weightVector
    weightVector = zeros(dataSize(2)+1, 1) %create a zero vector whose size is the number of features plus 1

    %% Choose learning rate
    learningRate = 0.0001;

    %% Initial cost
    cost = logisticCost(weightVector, features, trueClass)

    %% Stochastic gradient descent
    costThresh = 0.05 %define cost threshold

    iterCount = 0;
    while (cost > costThresh)
        for m = 1:dataSize(1) %for all samples

            %% Test statement
            curFeatures = transpose([1.0 features(m,:)])

            %% Calculate sigmoid prediction
            predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)])

            %% Test statement
            truth = trueClass(m)

            %% Calculate gradient for all features
            gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])

            %% Update weight vector by subtracting the gradient from the old weight vector
            weightVector = weightVector - gradient

            %% Re-evaluate cost with the new weight vector
            cost = logisticCost(weightVector, features, trueClass)

            if (cost < costThresh)
                break
            end
            iterCount = iterCount + 1

        end %for m
    end %while cost > costThresh

    weightVector
    iterCount
end
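For reference, evaluateSigmoid is not shown in the post. A minimal sketch of what it presumably computes, inferred from the calls above (the name and argument order come from the question; the body is an assumption):

function predictedClass = evaluateSigmoid(weightVector, featureRow)
    % Assumed implementation: featureRow is a 1-by-(n+1) row vector with the
    % leading 1.0 bias term already prepended, so featureRow * weightVector
    % is the scalar z = theta' * x.
    z = featureRow * weightVector;
    predictedClass = 1 / (1 + exp(-z)); %logistic function
end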

Here is my cost function:

function cost = logisticCost(weightVector, features, trueClass)
    %% Calculates the total cost of applying weightVector to all samples
    %% for a logistic regression model according to
    %% J(theta) = -(1/m) sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
    %% Input:
    % weightVector - vector of n+1 weights, where n is the number of features
    % features - matrix of features
    % trueClass - the training data's true class
    %% Output:
    % cost - the total cost

    dataSize = size(features); %get size of data

    errorSum = 0.0; %stores the sum of errors
    for m = 1:dataSize(1) %for each row
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
        if trueClass(m) == 1
            errorSum = errorSum + log(predictedClass);
        else
            errorSum = errorSum + log(1 - predictedClass);
        end
    end

    cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get the cost
end

Both functions look fine to me, and I can't imagine why my cost function always returns infinity.

Here is my training data; the first column is the class (1 or 0), and the next seven columns are the features I'm trying to regress on.


Answer:

Your gradient has the wrong sign:

gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)]) 

It should instead be:

gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])

See Andrew Ng's notes for the details: http://cs229.stanford.edu/notes/cs229-notes1.pdf

The gradient with respect to the j-th parameter is computed as follows (where h(x) is the logistic function, y is the true label, and x is the feature vector):

\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Otherwise, when you take the negative of your gradient, you are actually performing gradient ascent. I think that is why you end up with an infinite cost: the weights are pushed the wrong way until the sigmoid saturates to exactly 0 or 1 in floating point, log(0) evaluates to -Inf, the cost becomes Inf, and the loop can never terminate.
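As an aside, even with the correct sign the cost can still hit Inf whenever the sigmoid saturates to exactly 0 or 1 in floating point. A common guard (not part of the original code, shown here only as a sketch) is to clamp the prediction away from the endpoints inside logisticCost before taking the log:

predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]);
predictedClass = min(max(predictedClass, eps), 1 - eps); %keep log() finite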

The update rule itself should stay as:

weightVector = weightVector - gradient 
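Putting it together, a minimal sketch of the inner-loop update with only the sign corrected (everything else as in your code):

%% Calculate gradient for all features: (predicted - true), not (true - predicted)
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)]);

%% Descend: subtracting this gradient now actually decreases the cost
weightVector = weightVector - gradient;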
