Logistic regression using SciPy

I am trying to code up logistic regression in Python using SciPy's fmin_bfgs function, but I am running into some problems. I wrote functions for the logistic (sigmoid) transformation and the cost function, and those work fine (I tested them with optimized parameter values obtained from canned software, and the results agree). I am less sure about my implementation of the gradient function, but it looks reasonable.

Here is the code:

# purpose: logistic regression
import numpy as np
import scipy.optimize

# prepare the data
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
vY = data[:, 0]
mX = data[:, 1:]
intercept = np.ones(mX.shape[0]).reshape(mX.shape[0], 1)
mX = np.concatenate((intercept, mX), axis = 1)
iK = mX.shape[1]
iN = mX.shape[0]

# logistic transformation
def logit(mX, vBeta):
    return((1/(1.0 + np.exp(-np.dot(mX, vBeta)))))

# test function call
vBeta0 = np.array([-.10296645, -.0332327, -.01209484, .44626211, .92554137, .53973828,
    1.7993371, .7148045])
logit(mX, vBeta0)

# cost function
def logLikelihoodLogit(vBeta, mX, vY):
    return(-(np.sum(vY*np.log(logit(mX, vBeta)) + (1-vY)*(np.log(1-logit(mX, vBeta))))))
logLikelihoodLogit(vBeta0, mX, vY) # test function call

# gradient function
def likelihoodScore(vBeta, mX, vY):
    return(np.dot(mX.T,
                  ((np.dot(mX, vBeta) - vY)/
                   np.dot(mX, vBeta)).reshape(iN, 1)).reshape(iK, 1))
likelihoodScore(vBeta0, mX, vY).shape # test function call

# optimize the function (without gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogit,
                                      x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
                                                     1.8, .71]),
                                      args = (mX, vY), gtol = 1e-3)

# optimize the function (with gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogit,
                                      x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
                                                     1.8, .71]), fprime = likelihoodScore,
                                      args = (mX, vY), gtol = 1e-3)
• The first optimization (without the gradient) ends with a whole lot of messages about division by zero.

• The second optimization (with the gradient) ends with a matrices-not-aligned error, which probably means I am returning the gradient in the wrong form.
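The division-by-zero messages in the first run most likely come from the cost function rather than from fmin_bfgs itself: once np.dot(mX, vBeta) becomes large in absolute value, the sigmoid saturates to exactly 0.0 or 1.0 in double precision, so np.log(logit(...)) or np.log(1 - logit(...)) evaluates log(0). A minimal standalone snippet (independent of the data below) that reproduces the effect:

import numpy as np

# The sigmoid saturates to exactly 1.0 in float64 once its argument exceeds roughly 37,
# so log(1 - sigmoid(z)) becomes log(0) and NumPy emits a divide-by-zero warning.
z = np.array([10.0, 30.0, 40.0])
p = 1.0 / (1.0 + np.exp(-z))
print(p)              # last entry is exactly 1.0
print(np.log(1 - p))  # last entry is -inf, with "divide by zero encountered in log"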

Any help is appreciated. The data below can be used if anyone wants to try this.

low,age,lwt,race,smoke,ptl,ht,ui0,19,182,2,0,0,0,10,33,155,3,0,0,0,00,20,105,1,1,0,0,00,21,108,1,1,0,0,10,18,107,1,1,0,0,10,21,124,3,0,0,0,00,22,118,1,0,0,0,00,17,103,3,0,0,0,00,29,123,1,1,0,0,00,26,113,1,1,0,0,00,19,95,3,0,0,0,00,19,150,3,0,0,0,00,22,95,3,0,0,1,00,30,107,3,0,1,0,10,18,100,1,1,0,0,00,18,100,1,1,0,0,00,15,98,2,0,0,0,00,25,118,1,1,0,0,00,20,120,3,0,0,0,10,28,120,1,1,0,0,00,32,121,3,0,0,0,00,31,100,1,0,0,0,10,36,202,1,0,0,0,00,28,120,3,0,0,0,00,25,120,3,0,0,0,10,28,167,1,0,0,0,00,17,122,1,1,0,0,00,29,150,1,0,0,0,00,26,168,2,1,0,0,00,17,113,2,0,0,0,00,17,113,2,0,0,0,00,24,90,1,1,1,0,00,35,121,2,1,1,0,00,25,155,1,0,0,0,00,25,125,2,0,0,0,00,29,140,1,1,0,0,00,19,138,1,1,0,0,00,27,124,1,1,0,0,00,31,215,1,1,0,0,00,33,109,1,1,0,0,00,21,185,2,1,0,0,00,19,189,1,0,0,0,00,23,130,2,0,0,0,00,21,160,1,0,0,0,00,18,90,1,1,0,0,10,18,90,1,1,0,0,10,32,132,1,0,0,0,00,19,132,3,0,0,0,00,24,115,1,0,0,0,00,22,85,3,1,0,0,00,22,120,1,0,0,1,00,23,128,3,0,0,0,00,22,130,1,1,0,0,00,30,95,1,1,0,0,00,19,115,3,0,0,0,00,16,110,3,0,0,0,00,21,110,3,1,0,0,10,30,153,3,0,0,0,00,20,103,3,0,0,0,00,17,119,3,0,0,0,00,17,119,3,0,0,0,00,23,119,3,0,0,0,00,24,110,3,0,0,0,00,28,140,1,0,0,0,00,26,133,3,1,2,0,00,20,169,3,0,1,0,10,24,115,3,0,0,0,00,28,250,3,1,0,0,00,20,141,1,0,2,0,10,22,158,2,0,1,0,00,22,112,1,1,2,0,00,31,150,3,1,0,0,00,23,115,3,1,0,0,00,16,112,2,0,0,0,00,16,135,1,1,0,0,00,18,229,2,0,0,0,00,25,140,1,0,0,0,00,32,134,1,1,1,0,00,20,121,2,1,0,0,00,23,190,1,0,0,0,00,22,131,1,0,0,0,00,32,170,1,0,0,0,00,30,110,3,0,0,0,00,20,127,3,0,0,0,00,23,123,3,0,0,0,00,17,120,3,1,0,0,00,19,105,3,0,0,0,00,23,130,1,0,0,0,00,36,175,1,0,0,0,00,22,125,1,0,0,0,00,24,133,1,0,0,0,00,21,134,3,0,0,0,00,19,235,1,1,0,1,00,25,95,1,1,3,0,10,16,135,1,1,0,0,00,29,135,1,0,0,0,00,29,154,1,0,0,0,00,19,147,1,1,0,0,00,19,147,1,1,0,0,00,30,137,1,0,0,0,00,24,110,1,0,0,0,00,19,184,1,1,0,1,00,24,110,3,0,1,0,00,23,110,1,0,0,0,00,20,120,3,0,0,0,00,25,241,2,0,0,1,00,30,112,1,0,0,0,00,22,169,1,0,0,0,00,18,120,1,1,0,0,00,16,170,2,0,0,0,00,32,186,1,0,0,0,00,18,120,3,0,0,0,00,29,130,1,1,0,0,00,33,117,1,0,0,0,10,20,170,1,1,0,0,00,28,134,3,0,0,0,00,14,135,1,0,0,0,00,28,130,3,0,0,0,00,25,120,1,0,0,0,00,16,95,3,0,0,0,00,20,158,1,0,0,0,00,26,160,3,0,0,0,00,21,115,1,0,0,0,00,22,129,1,0,0,0,00,25,130,1,0,0,0,00,31,120,1,0,0,0,00,35,170,1,0,1,0,00,19,120,1,1,0,0,00,24,116,1,0,0,0,00,45,123,1,0,0,0,01,28,120,3,1,1,0,11,29,130,1,0,0,0,11,34,187,2,1,0,1,01,25,105,3,0,1,1,01,25,85,3,0,0,0,11,27,150,3,0,0,0,01,23,97,3,0,0,0,11,24,128,2,0,1,0,01,24,132,3,0,0,1,01,21,165,1,1,0,1,01,32,105,1,1,0,0,01,19,91,1,1,2,0,11,25,115,3,0,0,0,01,16,130,3,0,0,0,01,25,92,1,1,0,0,01,20,150,1,1,0,0,01,21,200,2,0,0,0,11,24,155,1,1,1,0,01,21,103,3,0,0,0,01,20,125,3,0,0,0,11,25,89,3,0,2,0,01,19,102,1,0,0,0,01,19,112,1,1,0,0,11,26,117,1,1,1,0,01,24,138,1,0,0,0,01,17,130,3,1,1,0,11,20,120,2,1,0,0,01,22,130,1,1,1,0,11,27,130,2,0,0,0,11,20,80,3,1,0,0,11,17,110,1,1,0,0,01,25,105,3,0,1,0,01,20,109,3,0,0,0,01,18,148,3,0,0,0,01,18,110,2,1,1,0,01,20,121,1,1,1,0,11,21,100,3,0,1,0,01,26,96,3,0,0,0,01,31,102,1,1,1,0,01,15,110,1,0,0,0,01,23,187,2,1,0,0,01,20,122,2,1,0,0,01,24,105,2,1,0,0,01,15,115,3,0,0,0,11,23,120,3,0,0,0,01,30,142,1,1,1,0,01,22,130,1,1,0,0,01,17,120,1,1,0,0,01,23,110,1,1,1,0,01,17,120,2,0,0,0,01,26,154,3,0,1,1,01,20,106,3,0,0,0,01,26,190,1,1,0,0,01,14,101,3,1,1,0,01,28,95,1,1,0,0,01,14,100,3,0,0,0,01,23,94,3,1,0,0,01,17,142,2,0,0,1,01,21,130,1,1,0,1,0

Answer:

This is the answer I sent back to the SciPy list, where this question was cross-posted. Thanks to @tiago for his answer. Basically, I reparametrized the likelihood function. I also added a call to the check_grad function.
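For reference, the algebra behind the reparametrization is the usual simplification of the Bernoulli log-likelihood: with p = exp(Xβ)/(1 + exp(Xβ)) we have log p = Xβ − log(1 + exp(Xβ)) and log(1 − p) = −log(1 + exp(Xβ)), so the cost and its gradient can be written without ever forming the fitted probabilities and then taking their logarithm (the source of the log(0) / divide-by-zero messages above):

\[
-\ell(\beta) = -\sum_{i=1}^{n}\left[ y_i\, x_i^{\top}\beta - \log\!\left(1 + e^{x_i^{\top}\beta}\right)\right],
\qquad
\nabla_{\beta}\!\left(-\ell(\beta)\right) = X^{\top}\!\left(\sigma(X\beta) - y\right),
\qquad
\sigma(z) = \frac{e^{z}}{1 + e^{z}}.
\]

logLikelihoodLogitVerbose below evaluates the first expression term by term, and likelihoodScore is exactly the second.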

#=====================================================
# purpose: logistic regression
import numpy as np
import scipy as sp
import scipy.optimize
import matplotlib as mpl
import os

# prepare the data
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
vY = data[:, 0]
mX = data[:, 1:]
# mX = (mX - np.mean(mX))/np.std(mX)  # standardize the data; if required
intercept = np.ones(mX.shape[0]).reshape(mX.shape[0], 1)
mX = np.concatenate((intercept, mX), axis = 1)
iK = mX.shape[1]
iN = mX.shape[0]

# logistic transformation
def logit(mX, vBeta):
    return((np.exp(np.dot(mX, vBeta))/(1.0 + np.exp(np.dot(mX, vBeta)))))

# test function call
vBeta0 = np.array([-.10296645, -.0332327, -.01209484, .44626211, .92554137, .53973828,
    1.7993371, .7148045])
logit(mX, vBeta0)

# cost function
def logLikelihoodLogit(vBeta, mX, vY):
    return(-(np.sum(vY*np.log(logit(mX, vBeta)) + (1-vY)*(np.log(1-logit(mX, vBeta))))))
logLikelihoodLogit(vBeta0, mX, vY) # test function call

# different parametrization of the cost function
def logLikelihoodLogitVerbose(vBeta, mX, vY):
    return(-(np.sum(vY*(np.dot(mX, vBeta) - np.log((1.0 + np.exp(np.dot(mX, vBeta))))) +
                    (1-vY)*(-np.log((1.0 + np.exp(np.dot(mX, vBeta))))))))
logLikelihoodLogitVerbose(vBeta0, mX, vY)  # test function call

# gradient function
def likelihoodScore(vBeta, mX, vY):
    return(np.dot(mX.T,
                  (logit(mX, vBeta) - vY)))
likelihoodScore(vBeta0, mX, vY).shape # test function call

sp.optimize.check_grad(logLikelihoodLogitVerbose, likelihoodScore,
                       vBeta0, mX, vY)  # check that the analytical gradient is close to
                                        # the numerical gradient

# optimize the function (without gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogitVerbose,
                                      x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
                                                     1.8, .71]),
                                      args = (mX, vY), gtol = 1e-3)

# optimize the function (with gradient)
optimLogit = scipy.optimize.fmin_bfgs(logLikelihoodLogitVerbose,
                                      x0 = np.array([-.1, -.03, -.01, .44, .92, .53,
                                                     1.8, .71]), fprime = likelihoodScore,
                                      args = (mX, vY), gtol = 1e-3)
#=====================================================
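For completeness, the same fit can also be run through the unified scipy.optimize.minimize interface. This is only a sketch built on top of the functions and data defined above (logLikelihoodLogitVerbose, likelihoodScore, mX, vY, iK); vBeta_init is just an illustrative starting point of zeros, and the statsmodels cross-check is optional:

import numpy as np
import scipy.optimize

# Same objective and analytical gradient as above, via the minimize() API.
# res.x holds the estimated coefficients, res.fun the negative log-likelihood at the optimum.
vBeta_init = np.zeros(iK)
res = scipy.optimize.minimize(logLikelihoodLogitVerbose, vBeta_init,
                              args=(mX, vY), method='BFGS',
                              jac=likelihoodScore, options={'gtol': 1e-3})
print(res.x)

# Optional cross-check against a canned implementation
# (mX already contains the intercept column):
# import statsmodels.api as sm
# print(sm.Logit(vY, mX).fit().params)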
