I understand what this Wikipedia page says (http://en.wikipedia.org/wiki/Multinomial_logistic_regression), but I don't know how to derive the update rule for stochastic gradient descent. Sorry for asking this here (this is really a question about machine learning theory rather than a practical implementation). Could someone provide a solution with an explanation? Thanks in advance!
Answer:
I happen to have written code implementing softmax regression; my main reference was this page: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
Here is the code I wrote in Matlab. I hope it helps.
function y = sigmoid_multi(weight,x,class_index)
%% weight      feature_dim * class_num
%% x           feature_dim * 1
%% class_index scalar
    % Softmax probability of class_index. Subtract the max score before
    % exponentiating for numerical stability, and use a local variable
    % instead of shadowing the built-in sum().
    scores = weight'*x;
    scores = scores - max(scores);
    y = exp(scores(class_index))/sum(exp(scores));
end

function g = gradient(train_patterns,train_labels,weight)
    m = size(train_patterns,2);
    class_num = size(weight,2);
    g = zeros(size(weight));
    for j = 1:class_num
        for i = 1:m
            % Standard softmax gradient term for every sample:
            % (1{y_i == j} - P(y_i == j | x_i)) * x_i
            indicator = double(train_labels(i) == j);
            p = sigmoid_multi(weight,train_patterns(:,i),j);
            g(:,j) = g(:,j) + (indicator - p)*train_patterns(:,i);
        end
    end
    g = -(g/m);
end

function J = object_function(train_patterns,train_labels,weight)
    m = size(train_patterns,2);
    J = 0;
    for i = 1:m
        J = J + log(sigmoid_multi(weight,train_patterns(:,i),train_labels(i)) + eps);
    end
    J = -(J/m);
end

function weight = multi_logistic_train(train_patterns,train_labels,alpha)
%% weight         feature_dim * class_num
%% train_patterns feature_dim * sample_num
%% train_labels   1 * sample_num
%% alpha          scalar learning rate
    class_num = length(unique(train_labels));
    n = size(train_patterns,1); % feature_dim
    weight = rand(n,class_num);
    for i = 1:40
        J = object_function(train_patterns,train_labels,weight);
        fprintf('objective function value : %f\n',J);
        weight = weight - alpha*gradient(train_patterns,train_labels,weight);
    end
end
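Since the question asks for the update rule itself, here is a sketch of the derivation in the UFLDL page's notation (the code above applies the same gradient, just in batch form). Let \(\theta_j\) be column j of weight, \(1\{\cdot\}\) the indicator function, and k the number of classes:

\[
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,
\log\frac{e^{\theta_j^\top x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^\top x^{(i)}}}
\]

Differentiating with respect to \(\theta_j\) gives

\[
\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
x^{(i)}\left(1\{y^{(i)} = j\} - P\big(y^{(i)} = j \mid x^{(i)};\theta\big)\right),
\]

so the stochastic (single-sample) update with learning rate \(\alpha\) is

\[
\theta_j \leftarrow \theta_j + \alpha\, x^{(i)}\left(1\{y^{(i)} = j\} - P\big(y^{(i)} = j \mid x^{(i)};\theta\big)\right)
\quad \text{for } j = 1,\dots,k.
\]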