Cost function diverging in batch gradient descent

I am trying to implement gradient descent in Python. I want the computation to stop when abs(J - J_new) reaches some tolerance level (i.e., convergence), where J is the cost function. The computation also stops once a set number of iterations is reached. I have tried several different implementations, and in all of them the cost function actually diverges (i.e., |J - J_new| -> inf). This makes almost no sense to me, and I cannot tell from my code why it is happening. I am testing the implementation with 4 simple data points. The file input is commented out for now, but x and y will eventually be read from a text file containing more than 400 data points. Here is the simplest implementation I could come up with:

# import necessary packages
import numpy as np
import matplotlib.pyplot as plt

'''
For right now, I will hard code all parameters. After all code is written
and I know that I implemented the algorithm correctly, I will condense the
code into a single function.
'''

# Trivial data set to test
x = np.array([1, 3, 6, 8])
y = np.array([3, 5, 6, 5])

# Define parameter values
alpha = 0.1
tol = 1e-06
m = y.size
imax = 100000

# Define initial values
theta_0 = np.array([0.0])   # theta_0 guess
theta_1 = np.array([0.0])   # theta_1 guess

J = sum([(theta_0 - theta_1 * x[i] - y[i])**2 for i in range(m)])

# Begin gradient descent algorithm
converged = False
inum = 0
while not converged:
    grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
    grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])
    temp_0 = theta_0 - alpha * grad_0
    temp_1 = theta_1 - alpha * grad_1
    theta_0 = temp_0
    theta_1 = temp_1
    J_new = sum([(theta_0 + theta_1 * x[i] - y[i])**2 for i in range(m)])
    if abs(J - J_new) <= tol:
        print('Converged at iteration', inum)
        converged = True
    J = J_new
    inum = inum + 1
    if inum == imax:
        print('Maximum number of iterations reached!')
        converged = True

Answer:

I did some more experimenting. The divergence happens because the learning rate alpha was set too high. Changing how convergence is checked also helped: instead of using abs(J - J_new) to test for convergence, I now use abs(theta0_new - theta_0) and abs(theta1_new - theta_1). If both values are within some tolerance, the algorithm is considered to have converged. I also rescaled (normalized) the data, which seemed to help as well. Here is the code:

# import necessary packages
import numpy as np
import matplotlib.pyplot as plt

# Gradient descent function
def gradient_descent(x, y, alpha, tol, imax):
    # size of data set
    m = y.size

    # Define initial values
    theta_0 = np.array([0.0])   # theta_0 initial guess
    theta_1 = np.array([0.0])   # theta_1 initial guess

    # Begin gradient descent algorithm
    convergence = False
    inum = 0

    # While loop continues until convergence = True
    while not convergence:

        # Calculate gradients for theta_0 and theta_1
        grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
        grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])

        # Update theta_0 and theta_1
        temp0 = theta_0 - alpha * grad_0
        temp1 = theta_1 - alpha * grad_1
        theta0_new = temp0
        theta1_new = temp1

        # Check convergence, and stop loop if correct conditions are met
        if abs(theta0_new - theta_0) <= tol and abs(theta1_new - theta_1) <= tol:
            print('We have convergence at iteration', inum, '!')
            convergence = True

        # Update theta_0 and theta_1 for next iteration
        theta_0 = theta0_new
        theta_1 = theta1_new

        # Increment iteration counter
        inum = inum + 1

        # Check iteration number, and stop loop if inum == imax
        if inum == imax:
            print('Maximum number of iterations reached. We have convergence!')
            convergence = True

    # Show result
    print('Slope=', theta_1)
    print('Intercept=', theta_0)
    print('Iteration of convergence=', inum)

# Load data from text file
data = np.loadtxt('InputData.txt')

# Define data set
x = data[:,0]
y = data[:,1]

# Rescale the data
x = x/(max(x)-min(x))
y = y/(max(y)-min(y))

# Define input parameters
alpha = 1e-02
tol = 1e-05
imax = 10000

# Function call
gradient_descent(x, y, alpha, tol, imax)
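For reference, the same update can also be written with vectorized NumPy operations instead of per-element Python list comprehensions. This is a minimal sketch of an equivalent alternative (not the code used above); the arithmetic is identical, it just avoids the single-element np.array parameters and loops over the data in compiled code:

# Vectorized sketch of the same algorithm (an equivalent alternative,
# not the code from the answer above)
import numpy as np

def gradient_descent_vec(x, y, alpha, tol, imax):
    theta_0, theta_1 = 0.0, 0.0
    for inum in range(imax):
        residual = theta_0 + theta_1 * x - y   # prediction error, shape (m,)
        grad_0 = residual.mean()               # gradient w.r.t. theta_0
        grad_1 = (residual * x).mean()         # gradient w.r.t. theta_1
        theta_0 -= alpha * grad_0
        theta_1 -= alpha * grad_1
        # The parameter change per step is alpha * gradient, so this is the
        # same stopping test as comparing theta_new against theta_old
        if abs(alpha * grad_0) <= tol and abs(alpha * grad_1) <= tol:
            print('Converged at iteration', inum)
            break
    return theta_0, theta_1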

I have only checked this against the data set in that text file.
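One extra sanity check worth doing (a suggestion, not something verified against the original data file): run the function on the trivial four-point set from the question and compare against the closed-form least-squares fit from np.polyfit. For that data the descent step only contracts when alpha is below roughly 0.07, which is consistent with the divergence seen at alpha = 0.1 in the question; a smaller value such as 0.01 converges even without rescaling:

import numpy as np

# Trivial data set from the question
x = np.array([1.0, 3.0, 6.0, 8.0])
y = np.array([3.0, 5.0, 6.0, 5.0])

# Run the gradient_descent function defined above (no rescaling needed here)
gradient_descent(x, y, 0.01, 1e-06, 100000)

# Closed-form least-squares fit for comparison
slope, intercept = np.polyfit(x, y, 1)
print('polyfit: Slope=', slope, 'Intercept=', intercept)

Both should report a slope near 0.29 and an intercept near 3.4.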
