I am trying to implement gradient descent in Python. I want the computation to stop when abs(J - J_new) reaches some tolerance level (i.e., at convergence), where J is the cost function. The computation should also stop once a set maximum number of iterations is reached. I have tried several different implementations, and in every one of them the cost function actually diverges (i.e., |J - J_new| -> inf). That makes almost no sense to me, and I cannot tell from my code why it happens. I am testing the implementation with 4 simple data points. For now I have it commented out, but x and y will eventually be read from a text file containing more than 400 data points. Here is the simplest implementation I could come up with:
# import necessary packages
import numpy as np
import matplotlib.pyplot as plt

'''
For right now, I will hard code all parameters. After all code is written
and I know that I implemented the algorithm correctly, I will condense the
code into a single function.
'''

# Trivial data set to test
x = np.array([1, 3, 6, 8])
y = np.array([3, 5, 6, 5])

# Define parameter values
alpha = 0.1
tol = 1e-06
m = y.size
imax = 100000

# Define initial values
theta_0 = np.array([0.0])  # theta_0 guess
theta_1 = np.array([0.0])  # theta_1 guess

# Initial value of the cost function
J = sum([(theta_0 + theta_1 * x[i] - y[i])**2 for i in range(m)])

# Begin gradient descent algorithm
converged = False
inum = 0

while not converged:
    # Gradients of the cost with respect to theta_0 and theta_1
    grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
    grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])

    # Simultaneous update of both parameters
    temp_0 = theta_0 - alpha * grad_0
    temp_1 = theta_1 - alpha * grad_1
    theta_0 = temp_0
    theta_1 = temp_1

    # Recompute the cost and check for convergence
    J_new = sum([(theta_0 + theta_1 * x[i] - y[i])**2 for i in range(m)])
    if abs(J - J_new) <= tol:
        print('Converged at iteration', inum)
        converged = True

    J = J_new
    inum = inum + 1

    # Stop if the maximum number of iterations is reached
    if inum == imax:
        print('Maximum number of iterations reached!')
        converged = True
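For reference, here is the cost and the update rule my code is meant to implement, written out by hand. (As far as I can tell, the J in my code is not divided by 2m while the gradients are, so the coded gradients correspond to the cost J/(2m); that only rescales the effective step size.)

J(\theta_0, \theta_1) = \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right)^2

\theta_0 \leftarrow \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right), \qquad
\theta_1 \leftarrow \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right) x_i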
Answer:
I did some more experimenting. The divergence happens because the learning rate alpha is set too high. Changing how convergence is checked also helped: instead of checking abs(J - J_new), I now use abs(theta0_new - theta_0) and abs(theta1_new - theta_1). If both of these are within some tolerance, the algorithm is considered to have converged. I also rescaled (normalized) the data, which seemed to help as well. Here is the code:
# import necessary packages
import numpy as np
import matplotlib.pyplot as plt

# Gradient descent function
def gradient_descent(x, y, alpha, tol, imax):
    # Size of data set
    m = y.size

    # Define initial values
    theta_0 = np.array([0.0])  # theta_0 initial guess
    theta_1 = np.array([0.0])  # theta_1 initial guess

    # Begin gradient descent algorithm
    convergence = False
    inum = 0

    # While loop continues until convergence = True
    while not convergence:
        # Calculate gradients for theta_0 and theta_1
        grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
        grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])

        # Update theta_0 and theta_1
        temp0 = theta_0 - alpha * grad_0
        temp1 = theta_1 - alpha * grad_1
        theta0_new = temp0
        theta1_new = temp1

        # Check convergence, and stop loop if correct conditions are met
        if abs(theta0_new - theta_0) <= tol and abs(theta1_new - theta_1) <= tol:
            print('We have convergence at iteration', inum, '!')
            convergence = True

        # Update theta_0 and theta_1 for next iteration
        theta_0 = theta0_new
        theta_1 = theta1_new

        # Increment iteration counter
        inum = inum + 1

        # Check iteration number, and stop loop if inum == imax
        if inum == imax:
            print('Maximum number of iterations reached without convergence.')
            convergence = True

    # Show result
    print('Slope =', theta_1)
    print('Intercept =', theta_0)
    print('Iteration of convergence =', inum)

# Load data from text file
data = np.loadtxt('InputData.txt')

# Define data set
x = data[:, 0]
y = data[:, 1]

# Rescale the data
x = x / (max(x) - min(x))
y = y / (max(y) - min(y))

# Define input parameters
alpha = 1e-02
tol = 1e-05
imax = 10000

# Function call
gradient_descent(x, y, alpha, tol, imax)
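As a quick sanity check (not part of the gradient descent itself), the slope and intercept can be compared against NumPy's built-in least-squares fit on the same rescaled data; this assumes the same InputData.txt file used above:

import numpy as np

# Load and rescale the data exactly as above
data = np.loadtxt('InputData.txt')
x = data[:, 0] / (max(data[:, 0]) - min(data[:, 0]))
y = data[:, 1] / (max(data[:, 1]) - min(data[:, 1]))

# Closed-form least-squares fit of y = slope * x + intercept for comparison
slope, intercept = np.polyfit(x, y, 1)
print('polyfit slope =', slope)
print('polyfit intercept =', intercept)

If gradient descent has converged, its slope and intercept should be close to these values.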
I have only checked this against the data set in that text file.
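If someone wants to try it without that file, a small synthetic data set works too. This is only a hypothetical example (the true slope, intercept, and noise level are made up) and it reuses the gradient_descent function defined above:

import numpy as np

# Hypothetical synthetic data: y = 2*x + 1 plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)

# Rescale as above, then run the same function
x = x / (max(x) - min(x))
y = y / (max(y) - min(y))
gradient_descent(x, y, alpha=1e-02, tol=1e-05, imax=10000)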