Constraining the sum of coefficients with scikit-learn linear models

I am running LassoCV with 1000 coefficients. Statsmodels did not seem to be able to handle that many coefficients, so I am using scikit-learn. Statsmodels allows constraining the sum of the coefficients to 1 via .fit_constrained("coef1 + coef2 ... = 1"). I need the same in scikit-learn. I am also keeping the intercept at zero.
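For reference, the statsmodels pattern mentioned above looks roughly like this (a sketch; the predictor names x1 and x2 are hypothetical columns of a pandas DataFrame X):

import statsmodels.api as sm

# no constant column is added, so the intercept stays at zero
model = sm.OLS(y, X)
result = model.fit_constrained("x1 + x2 = 1")
print(result.params)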

from sklearn.linear_model import LassoCV

LassoCVmodel = LassoCV(fit_intercept=False)
LassoCVmodel.fit(x, y)

Any help would be much appreciated.


Answer:

As mentioned in the comments: neither the docs nor the source indicate that this is supported in scikit-learn!

I just tried an alternative using an off-the-shelf convex-optimization solver. This is just a simple prototype approach and might not be a good fit for your (not fully defined) task (sample size?).

A few remarks:

  • the implementation / model formulation is easy
  • the problem turned out to be harder to solve than I expected
    • the ECOS solver has trouble in general
    • the SCS solver reaches a decent accuracy (worse compared to scikit-learn's)
    • but: tuning the iterations to improve accuracy breaks the solver
      • the problem becomes infeasible for SCS!
    • the SCS-based bigM formulation (constraint as penalization within the objective) looks usable, but may need tuning
    • only open-source solvers were tested; commercial ones might do much better

Further approaches to try:

  • for tackling huge problems (where performance matters more than robustness and accuracy), (accelerated) projected stochastic gradient methods look promising

Code

""" data """from time import perf_counter as pcimport numpy as npfrom sklearn import datasetsdiabetes = datasets.load_diabetes()A = diabetes.datay = diabetes.targetalpha=0.1print('Problem-size: ', A.shape)def obj(x):  # following sklearn's definition from user-guide!    return (1. / (2*A.shape[0])) * np.square(np.linalg.norm(A.dot(x) - y, 2)) + alpha * np.linalg.norm(x, 1)""" sklearn """print('\nsklearn classic l1')from sklearn import linear_modelclf = linear_model.Lasso(alpha=alpha, fit_intercept=False)t0 = pc()clf.fit(A, y)print('used (secs): ', pc() - t0)print(obj(clf.coef_))print('sum x: ', np.sum(clf.coef_))""" cvxpy """print('\ncvxpy + scs classic l1')from cvxpy import *x = Variable(A.shape[1])objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y) + alpha * norm(x, 1))problem = Problem(objective, [])t0 = pc()problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)print('used (secs): ', pc() - t0)print(obj(x.value.flat))print('sum x: ', np.sum(x.value.flat))""" cvxpy -> sum x == 1 """print('\ncvxpy + scs sum == 1 / 1st approach')objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y))constraints = [sum(x) == 1]problem = Problem(objective, constraints)t0 = pc()problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)print('used (secs): ', pc() - t0)print(obj(x.value.flat))print('sum x: ', np.sum(x.value.flat))""" cvxpy approach 2 -> sum x == 1 """print('\ncvxpy + scs sum == 1 / 2nd approach')M = 1e6objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y) + M*(sum(x) - 1))constraints = [sum(x) == 1]problem = Problem(objective, constraints)t0 = pc()problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)print('used (secs): ', pc() - t0)print(obj(x.value.flat))print('sum x: ', np.sum(x.value.flat))

Output

Problem-size:  (442, 10)

sklearn classic l1
used (secs):  0.001451024380348898
13201.3508496
sum x:  891.78869298

cvxpy + scs classic l1
used (secs):  0.011165673357417458
13203.6549995
sum x:  872.520510561

cvxpy + scs sum == 1 / 1st approach
used (secs):  0.15350853891775978
13400.1272148
sum x:  -8.43795102327

cvxpy + scs sum == 1 / 2nd approach
used (secs):  0.012579569383536493
13397.2932976
sum x:  1.01207061047
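A side note: the code above uses the old cvxpy API (star-import, A*x, sum). In modern cvxpy (>= 1.0) the first constrained formulation would read roughly as follows; this is a sketch assuming the same A, y and alpha as above:

import cvxpy as cp

x = cp.Variable(A.shape[1])
# least-squares objective with a hard sum-to-one constraint
objective = cp.Minimize((1. / (2 * A.shape[0])) * cp.sum_squares(A @ x - y))
problem = cp.Problem(objective, [cp.sum(x) == 1])
problem.solve(solver=cp.SCS, max_iters=10000, verbose=False)
print('sum x: ', np.sum(x.value))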

Edit

Just for fun, I implemented a slow, non-optimized prototype solver using the approach of accelerated projected gradient (see the remarks in the code!).

While it is slow here (because it is not optimized), this approach should scale much better to huge problems, since it is a first-order method. There is a lot of potential!

Warning: to some this may look like advanced numerical optimization 🙂

Edit 2: I forgot to add the nonnegativity constraint to the projection (sum(x) == 1 does not make much sense if x can be negative!). This makes the solve much harder (numerical troubles), and it is obvious that one of those fast specialized projection methods should be used (I am too lazy right now; I believe n*log n algorithms are available). Again: this APG solver is a prototype, not ready for real tasks.
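For reference, the fast specialized projection alluded to above is typically the classic sort-based Euclidean projection onto the probability simplex, which runs in O(n log n). This sketch is not part of the original prototype and follows the algorithm of Duchi et al. (2008); swapping it in for the cvxpy-based projection helper below would remove the per-iteration solver call:

import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}.

    Sort-based O(n log n) method (Held et al. 1974; Duchi et al. 2008)."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                       # sort components descending
    css = np.cumsum(u)
    # largest 0-based index rho with u[rho] - (css[rho] - 1) / (rho + 1) > 0
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)     # shift that enforces sum == 1
    return np.maximum(v - theta, 0.0)          # clip negatives to zero

# e.g. project_to_simplex(np.array([2.0, 0.0])) -> array([1., 0.])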

Code

""" accelerated pg  -> sum x == 1 """def solve_pg(A, b, momentum=0.9, maxiter=1000):    """ remarks:            algorithm: accelerated projected gradient            projection: proj on probability-simplex                -> naive and slow using cvxpy + ecos            line-search: armijo-rule along projection-arc (Bertsekas book)                -> suffers from slow projection            stopping-criterion: naive            gradient-calculation: precomputes AtA                -> not needed and not recommended for huge sparse data!    """    M, N = A.shape    x = np.zeros(N)    AtA = A.T.dot(A)    Atb = A.T.dot(b)    stop_count = 0    # projection helper    x_ = Variable(N)    v_ = Parameter(N)    objective_ =  Minimize(0.5 * square(norm(x_ - v_, 2)))    constraints_ = [sum(x_) == 1]    problem_ = Problem(objective_, constraints_)    def gradient(x):        return AtA.dot(x) - Atb    def obj(x):        return 0.5 * np.linalg.norm(A.dot(x) - b)**2    it = 0    while True:        grad = gradient(x)        # line search        alpha = 1        beta = 0.5        sigma=1e-2        old_obj = obj(x)        while True:            new_x = x - alpha * grad            new_obj = obj(new_x)            if old_obj - new_obj >= sigma * grad.dot(x - new_x):                break            else:                alpha *= beta        x_old = x[:]        x = x - alpha*grad        # projection        v_.value = x        problem_.solve()        x = np.array(x_.value.flat)        y = x + momentum * (x - x_old)        if np.abs(old_obj - obj(x)) < 1e-2:            stop_count += 1        else:            stop_count = 0        if stop_count == 3:            print('early-stopping @ it: ', it)            return x        it += 1        if it == maxiter:            return xprint('\n acc pg')t0 = pc()x = solve_pg(A, y)print('used (secs): ', pc() - t0)print(obj(x))print('sum x: ', np.sum(x))

Output

 acc pg
early-stopping @ it:  367
used (secs):  0.7714511330487027
13396.8642379
sum x:  1.00000000002
