I am trying to implement a logistic regression solver in MATLAB, finding the weights via stochastic gradient descent. I have run into a problem where my data seems to produce an infinite cost, and no matter what happens, it never goes down…
Here is my gradient descent function:
function weightVector = logisticWeightsByGradientDescentStochastic(trueClass, features)
%% This function attempts to converge on the best set of weights for a logistic regression of order 1
%% Input:
%   trueClass - the training data's vector of true class values
%   features
%% Output:
%   weightVector - vector of size n+1 (n is number of features)
%                  corresponding to convergent weights

%% Get data size
dataSize = size(features);

%% Initial pick for weightVector
weightVector = zeros(dataSize(2)+1, 1) %create a zero vector of length number of features plus 1

%% Choose learning rate
learningRate = 0.0001;

%% Initial cost
cost = logisticCost(weightVector, features, trueClass)

%% Stochastic gradient descent
costThresh = 0.05 %define cost threshold
iterCount = 0;
while (cost > costThresh)
    for m = 1:dataSize(1) %for all samples
        %% test statement
        curFeatures = transpose([1.0 features(m,:)])
        %% calculate sigmoid prediction
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)])
        %% test statement
        truth = trueClass(m)
        %% calculate gradient for all features
        gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
        %% update the weight vector by subtracting the gradient from the old weight vector
        weightVector = weightVector - gradient
        %% re-evaluate cost with the new weight vector
        cost = logisticCost(weightVector, features, trueClass)
        if (cost < costThresh)
            break
        end
        iterCount = iterCount + 1
    end %for m
end %while cost > 0.05

weightVector
iterCount
end
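(`evaluateSigmoid` is not shown in the post; a minimal sketch of what it is assumed to compute, namely the logistic function of the bias-augmented weighted feature sum:)

function predicted = evaluateSigmoid(weightVector, featureRow)
%% Assumed helper, not part of the original post: logistic function of
%% the inner product of the bias-augmented feature row and the weights
z = featureRow * weightVector; %scalar: [1 x (n+1)] * [(n+1) x 1]
predicted = 1 / (1 + exp(-z));
end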
Here is my cost function:
function cost = logisticCost(weightVector, features, trueClass)
%% Calculates the total cost of applying weightVector to all samples
%% for a logistic regression model according to
%% J(theta) = -(1/m) sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
%% Input:
%   weightVector - vector of n+1 weights, where n is the number of features
%   features - matrix of features
%   trueClass - the training data's true class
%% Output:
%   cost - the total cost

dataSize = size(features); %get size of data
errorSum = 0.0; %stores sum of errors
for m = 1:dataSize(1) %for each row
    predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
    if trueClass(m) == 1
        errorSum = errorSum + log(predictedClass);
    else
        errorSum = errorSum + log(1 - predictedClass);
    end
end
cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get cost
end
Both of these functions look fine to me, and I cannot see why my cost function always returns infinity.
Here is my training data; the first column is the class (1 or 0), and the next seven columns are the features I am trying to regress on.
Answer:
Your gradient has the wrong sign:
gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
It should be:
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])
See Andrew Ng's notes for details: http://cs229.stanford.edu/notes/cs229-notes1.pdf
The gradient with respect to the j-th parameter (for a single sample, as in stochastic gradient descent) is:

∂J(θ)/∂θ_j = (h(x) − y) · x_j

where h(x) is the logistic function, y is the true label, and x is the feature vector.
Otherwise, by taking the negative of that gradient you are actually performing gradient ascent on the cost. I believe that is why you end up with an infinite cost: as the weights diverge, the sigmoid saturates to exactly 0 or 1 for some samples, log(0) evaluates to -Inf, and the cost becomes Inf, so the loop condition never becomes false and you can never break out of it.
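You can see the Inf directly: once the weights have diverged far enough, the weighted sum gets large, the sigmoid saturates in double precision, and the log term blows up. A minimal illustration (the value 800 is just an arbitrary large weighted sum):

z = 800;                       %an arbitrarily large weighted sum from diverged weights
predicted = 1 / (1 + exp(-z)); %exp(-800) underflows to 0, so this is exactly 1
log(1 - predicted)             %log(0) = -Inf, so logisticCost returns Inf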
The update rule should still be:
weightVector = weightVector - gradient
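Putting both pieces together, the corrected step inside the for-loop becomes:

%% corrected gradient: (predicted - true) rather than (true - predicted)
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)]);
%% unchanged update rule: step against the gradient to descend the cost
weightVector = weightVector - gradient;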