### Implementing GD for logistic regression in Python

I implemented logistic regression in Python with the following regularized loss function:

$$L(w) = \sum_{i} \log\left(1 + e^{-y_i x_i^\top w}\right) + l_2\,\|w\|_2^2 + l_1\,\|w\|_1$$

But gradient descent performs badly. Please read the bold text first! I am pasting the code cell by cell.

    import numpy as np, scipy as sp, sklearn as sl
    from scipy import special as ss
    from sklearn.base import ClassifierMixin, BaseEstimator
    from sklearn.datasets import make_classification
    import theano                    # needed later for theano.function
    import theano.tensor as T

Here is the loss function (scipy is used to 'clip' the argument of the logarithm near 1):

    def lossf(w, X, y, l1, l2):
        w.resize((w.shape[0], 1))
        y.resize((y.shape[0], 1))
        # log1p(1 + expm1(z)) == log(2 + (e**z - 1)) == log(1 + e**z)
        lossf1 = np.sum(ss.log1p(1 + ss.expm1(np.multiply(-y, np.dot(X, w)))))
        lossf2 = l2 * (np.dot(np.transpose(w), w))   # l2 * ||w||^2
        lossf3 = l1 * sum(abs(w))                    # l1 * ||w||_1
        lossf = np.float(lossf1 + lossf2 + lossf3)
        return lossf
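Incidentally, the log1p/expm1 composition above is algebraically just log(1 + e**z), and numpy's logaddexp computes the same quantity without overflowing for large z. A quick check (the variable z here is illustrative, not part of the original code):

    z = np.array([-50.0, 0.0, 50.0, 800.0])
    print ss.log1p(1 + ss.expm1(z))  # overflows to inf at z = 800
    print np.logaddexp(0, z)         # log(1 + e**z), stable even for large z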

Here is the gradient function (**?? something is wrong here ??**, see the end):

    def gradf(w, X, y, l1, l2):
        w.resize((w.shape[0], 1))
        y.resize((y.shape[0], 1))
        gradw1 = l2 * 2 * w        # gradient of the L2 penalty
        gradw2 = l1 * np.sign(w)   # subgradient of the L1 penalty
        gradw3 = np.multiply(-y, (2 + ss.expm1(np.multiply(-y, np.dot(X, w)))))
        gradw3 = gradw3 / (2 + ss.expm1(np.multiply(-y, np.dot(X, w))))
        gradw3 = np.sum(np.multiply(gradw3, X), axis=0)
        gradw3.resize(gradw3.shape[0], 1)
        gradw = gradw1 + gradw2 + gradw3
        gradw.resize(gradw.shape[0],)
        return np.transpose(gradw)
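A standard way to localize an error like this is a finite-difference check: compare gradf against a numerical derivative of lossf on a tiny problem; a large discrepancy points at the analytic gradient. A minimal sketch (the helper num_grad and the small problem sizes are illustrative, not part of the original code):

    def num_grad(w, X, y, l1, l2, eps=1e-6):
        # central differences of the scalar loss, one coordinate at a time
        g = np.zeros(w.shape[0])
        for j in range(w.shape[0]):
            e = np.zeros(w.shape[0])
            e[j] = eps
            # pass fresh copies because lossf resizes its arguments in place
            g[j] = (lossf(w + e, X, y.copy(), l1, l2)
                    - lossf(w - e, X, y.copy(), l1, l2)) / (2 * eps)
        return g

    Xc, yc = make_classification(n_features=5, n_samples=20)
    yc = 2.0 * (yc - 0.5)
    w0 = 0.1 + 0.1 * np.abs(np.random.randn(5))  # keep w away from 0, where |w| is not differentiable
    print np.abs(gradf(w0.copy(), Xc, yc.copy(), 0.1, 0.1) - num_grad(w0, Xc, yc, 0.1, 0.1)).max()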

Here is my LR class:

    class LR(ClassifierMixin, BaseEstimator):
        def __init__(self, lr=0.0001, l1=0.1, l2=0.1, num_iter=100, verbose=0):
            self.l1 = l1
            self.l2 = l2
            self.w = None
            self.lr = lr
            self.verbose = verbose
            self.num_iter = num_iter

        def fit(self, X, y):
            n, d = X.shape
            self.w = np.zeros(shape=(d,))
            for i in range(self.num_iter):
                g = gradf(self.w, X, y, self.l1, self.l2)
                g.resize((g.shape[0], 1))
                self.w = self.w - g   # note: self.lr is never applied to the step
                print "Loss: ", lossf(self.w, X, y, self.l1, self.l2)
            return self

        def predict_proba(self, X):
            # 1 / (2 + expm1(-Xw)) == 1 / (1 + e**(-Xw)), the sigmoid of Xw
            probs = 1 / (2 + ss.expm1(np.dot(-X, self.w)))
            return probs

        def predict(self, X):
            probs = self.predict_proba(X)
            probs = np.sign(2 * probs - 1)
            probs.resize((probs.shape[0],))
            return probs

Here is the test code:

    X, y = make_classification(n_features=100, n_samples=100)
    y = 2 * (y - 0.5)   # relabel {0, 1} -> {-1, +1}
    clf = LR(lr=0.000001, l1=0.1, l2=0.1, num_iter=10, verbose=0)
    clf = clf.fit(X, y)
    yp = clf.predict(X)
    yp.resize((100, 1))
    accuracy = int(sum(y == yp)) / len(y)

Oh no, it does not converge. But if I replace my gradw3 with a Theano-computed gradient:

    gradw3 = get_gradw3(w, X, y)

where:

    w, X, y = T.matrices("w", "X", "y")
    logloss = T.sum(T.log1p(1 + T.expm1(-y * T.dot(X, w))))
    get_gradw3 = theano.function([w, X, y], T.grad(logloss, w).reshape(w.shape))

it converges to 100% accuracy. That means my gradw3 implementation is wrong, but I cannot find the mistake. Desperately seeking help!
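For reference, the gradient Theano derives symbolically here is, writing $z_i = -y_i x_i^\top w$ per sample:

$$\frac{\partial}{\partial w} \sum_i \log\left(1 + e^{z_i}\right) = \sum_i \left(-y_i x_i\right) \frac{e^{z_i}}{1 + e^{z_i}},$$

and in expm1 terms $e^{z} = 1 + \operatorname{expm1}(z)$ while $1 + e^{z} = 2 + \operatorname{expm1}(z)$, so the per-sample factor is $\bigl(1 + \operatorname{expm1}(z_i)\bigr) / \bigl(2 + \operatorname{expm1}(z_i)\bigr)$.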


Answer:

Actually, I eventually got it working. I do not know exactly which change was the critical one, but here is a summary of what I changed:

  • Replaced all np.multiply with *

  • Lowered the learning rate and the regularizers

  • Applied np.nan_to_num to the exponentials (a quick illustration follows this list)
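The last point matters because expm1 overflows to inf once its argument exceeds roughly 709 (the float64 limit), and an inf propagating into the update destroys w; np.nan_to_num clips it back to the largest finite float. A quick illustration (not part of the original code):

    z = np.array([800.0])
    print ss.expm1(z)                 # [ inf ] -- exp overflow for float64
    print np.nan_to_num(ss.expm1(z))  # [ 1.79769313e+308 ] -- largest finite float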

So here is the final code:

    def lossf(w, X, y, l1, l2):
        w.resize((w.shape[0], 1))
        y.resize((y.shape[0], 1))
        lossf1 = np.sum(ss.log1p(1 + np.nan_to_num(ss.expm1(-y * np.dot(X, w)))))
        lossf2 = l2 * (np.dot(np.transpose(w), w))
        lossf3 = l1 * sum(abs(w))
        lossf = np.float(lossf1 + lossf2 + lossf3)
        return lossf

    def gradf(w, X, y, l1, l2):
        w.resize((w.shape[0], 1))
        y.resize((y.shape[0], 1))
        gradw1 = l2 * 2 * w        # gradient of the L2 penalty
        gradw2 = l1 * np.sign(w)   # subgradient of the L1 penalty
        # numerator: 1 + expm1(z) == e**z
        gradw3 = -y * (1 + np.nan_to_num(ss.expm1(-y * np.dot(X, w))))
        gradw3 = gradw3 / (2 + np.nan_to_num(ss.expm1(-y * np.dot(X, w))))
        gradw3 = np.sum(gradw3 * X, axis=0)
        gradw3.resize(gradw3.shape[0], 1)
        gradw = gradw1 + gradw2 + gradw3
        gradw.resize(gradw.shape[0],)
        return np.transpose(gradw)

    class LR(ClassifierMixin, BaseEstimator):
        def __init__(self, lr=0.000001, l1=0.1, l2=0.1, num_iter=100, verbose=0):
            self.l1 = l1
            self.l2 = l2
            self.w = None
            self.lr = lr
            self.verbose = verbose
            self.num_iter = num_iter

        def fit(self, X, y):
            n, d = X.shape
            self.w = np.zeros(shape=(d,))
            for i in range(self.num_iter):
                print "\n", "Iteration ", i
                g = gradf(self.w, X, y, self.l1, self.l2)
                g.resize((g.shape[0], 1))
                self.w = self.w - g
                print "Loss: ", lossf(self.w, X, y, self.l1, self.l2)
            return self

        def predict_proba(self, X):
            probs = 1 / (2 + ss.expm1(np.dot(-X, self.w)))
            return probs

        def predict(self, X):
            probs = self.predict_proba(X)
            probs = np.sign(2 * probs - 1)
            probs.resize((probs.shape[0],))
            return probs
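Comparing this gradf with the one in the question, the substantive change besides np.nan_to_num is the numerator of the logistic factor, which is plausibly the critical fix the summary above could not pinpoint. With z = -y * np.dot(X, w):

    # question: -y * (2 + expm1(z)) / (2 + expm1(z)) == -y, a constant "gradient"
    # answer:   -y * (1 + expm1(z)) / (2 + expm1(z)) == -y * e**z / (1 + e**z),
    #           the logistic factor that the Theano gradient also computes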
