Why does my gradient descent for regression in R fail?

I adapted the gradient descent algorithm below to regress the y variable stored in data[,4] on the x variable stored in data[,1]. However, the gradient descent appears to diverge. I would appreciate some help figuring out what I am doing wrong.

# define the sum of squared residuals
ssquares <- function(x) {
  t <- 0
  for (i in 1:200) {
    t <- t + (data[i,4] - x[1] - x[2]*data[i,1])^2
  }
  t/200
}

# define the derivatives
derivative <- function(x) {
  t1 <- 0
  for (i in 1:200) {
    t1 <- t1 - 2*(data[i,4] - x[1] - x[2]*data[i,1])
  }
  t2 <- 0
  for (i in 1:200) {
    t2 <- t2 - 2*data[i,1]*(data[i,4] - x[1] - x[2]*data[i,1])
  }
  c(t1/200, t2/200)
}

# definition of the gradient descent method in 2D
gradient_descent <- function(func, derv, start, step=0.05, tol=1e-8) {
  pt1 <- start
  grdnt <- derv(pt1)
  pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
  while (abs(func(pt1) - func(pt2)) > tol) {
    pt1 <- pt2
    grdnt <- derv(pt1)
    pt2 <- c(pt1[1] - step*grdnt[1], pt1[2] - step*grdnt[2])
    print(func(pt2)) # print progress
  }
  pt2 # return the last point
}

# locate the minimum of the function using the gradient descent method
result <- gradient_descent(
  ssquares,   # the function to optimize
  derivative, # the gradient of the function
  c(1, 1),    # start point of the search
  0.05,       # step size (alpha)
  1e-8)       # relative tolerance for one step

# display a summary of the results
print(result)           # coordinates of the function minimum
print(ssquares(result)) # value of the function at the minimum
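As a sanity check before tuning anything, the analytic gradient can be compared against a central-difference approximation. A minimal sketch of that check (grad_check and its step h are illustrative names, not part of the original code):

# compare the analytic gradient with a central-difference approximation;
# grad_check is a hypothetical debugging helper, not from the post above
grad_check <- function(f, grad, x, h = 1e-6) {
  numeric_grad <- sapply(seq_along(x), function(j) {
    e <- rep(0, length(x))
    e[j] <- h
    (f(x + e) - f(x - e)) / (2 * h)  # central difference in coordinate j
  })
  max(abs(numeric_grad - grad(x)))   # largest absolute discrepancy
}

grad_check(ssquares, derivative, c(1, 1))

If the discrepancy is tiny, the gradient itself is correct and the divergence points to the step size relative to the scale of the data, not to the derivatives.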

Answer:

You can vectorize the objective/gradient functions for faster execution. As you can see, it actually converges on randomly generated data, and the coefficients are very close to those obtained with R's lm():

# vectorized objective function
ssquares <- function(x) {
  n <- nrow(data) # 200
  sum((data[,4] - cbind(1, data[,1]) %*% x)^2) / n
}

# define the derivatives
derivative <- function(x) {
  n <- nrow(data) # 200
  c(sum(-2*(data[,4] - cbind(1, data[,1]) %*% x)),
    sum(-2*(data[,1])*(data[,4] - cbind(1, data[,1]) %*% x))) / n
}

set.seed(1)
data <- matrix(rnorm(800), nrow=200) # randomly generated data

# locate the minimum of the function using the gradient descent method
result <- gradient_descent(
  ssquares,   # the function to optimize
  derivative, # the gradient of the function
  c(1, 1),    # start point of the search
  0.05,       # step size (alpha)
  1e-8)       # relative tolerance for one step
# [1] 2.511904
# [1] 2.263448
# [1] 2.061456
# [1] 1.89721
# [1] 1.763634
# [1] 1.654984
# [1] 1.566592
# [1] 1.494668
# ...

# display a summary of the results
print(result) # coefficients obtained with gradient descent
# [1] -0.10248356  0.08068382

lm(data[,4] ~ data[,1])$coef # coefficients from R lm()
# (Intercept)   data[, 1]
# -0.10252181  0.08045722

# use a new dataset; this time it takes quite some time to converge, but the
# values GD converges to are pretty accurate, as you can see below
data <- read.csv('Advertising.csv') # advertising data, with the first row-names column removed

# locate the minimum of the function using the gradient descent method
result <- gradient_descent(
  ssquares,   # the function to optimize
  derivative, # the gradient of the function
  c(1, 1),    # start point of the search
  0.00001,    # step size (alpha), decreasing the learning rate
  1e-8)       # relative tolerance for one step
# ...
# [1] 10.51364
# [1] 10.51364
# [1] 10.51364

print(result) # coordinates of the function minimum
# [1] 6.97016852 0.04785365

lm(data[,4] ~ data[,1])$coef
# (Intercept)   data[, 1]
#  7.03259355  0.04753664
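Why does step = 0.05 work on the simulated data but not on the advertising data? For the mean squared error above, the Hessian is (2/n) * t(X) %*% X, and gradient descent with a fixed step diverges as soon as the step exceeds 2 divided by the Hessian's largest eigenvalue. A minimal sketch of that bound (max_stable_step is a hypothetical helper, not part of the answer):

# largest stable fixed step size for gradient descent on the mean squared
# error (1/n) * ||y - X b||^2, whose Hessian is (2/n) * t(X) %*% X;
# any fixed step above 2 / lambda_max(Hessian) diverges
max_stable_step <- function(X) {
  H <- 2 * crossprod(X) / nrow(X)  # Hessian of the mean squared error
  2 / max(eigen(H, symmetric = TRUE, only.values = TRUE)$values)
}

X <- cbind(1, data[,1])            # design matrix with intercept
max_stable_step(X)

For standard-normal x this bound is around 1, so 0.05 sits safely inside it; with a predictor running into the hundreds, as in the advertising data, the bound shrinks by several orders of magnitude, which is presumably why the learning rate had to drop to 0.00001. Standardizing the predictor first (e.g., scale(data[,1])) would allow a much larger step.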
