Simultaneously updating theta0 and theta1 for gradient descent in Python

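I'm taking the Machine Learning course on Coursera. One topic covers gradient descent for optimizing the cost function. It says that theta0 and theta1 must be updated simultaneously in order to minimize the cost function and reach the global minimum.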

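The formula for gradient descent is (repeat until convergence, with hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$):

$$
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}
\end{aligned}
$$

where both right-hand sides are evaluated with the current values of $\theta_0$ and $\theta_1$ before either one is assigned.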

How can I implement this in Python? I'm using numpy arrays and pandas, building it from scratch so I understand the logic step by step.

So far I have only computed the cost function:

```python
import numpy as np
import pandas as pd

# step 1 - collect our data
data = pd.read_csv("datasets.txt", header=None)

def compute_cost_function(x, y, theta):
    '''
        Taking in a numpy array x, y, theta and generate the cost function
    '''
    m = len(y)
    # formula for prediction = theta0 + theta1.x
    predictions = x.dot(theta)
    # formula for square error = ((theta1.x + theta0) - y)**2
    square_error = (predictions - y)**2
    # sum of square error function
    return 1/(2*m) * np.sum(square_error)

# converts into numpy representation of the pandas dataframe. The axes labels will be excluded
numpy_data = data.values

m = data[0].size
# prepend a column of ones so that theta[0] acts as the intercept
x = np.append(np.ones((m, 1)), numpy_data[:, 0].reshape(m, 1), axis=1)
y = numpy_data[:, 1].reshape(m, 1)
theta = np.zeros((2, 1))

compute_cost_function(x, y, theta)

def gradient_descent(x, y, theta, alpha):
    '''
        simultaneously update theta0 and theta1 where
        theta0 = theta0 - alpha * 1/m * (sum of errors)
    '''
    pass
```
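Since datasets.txt isn't included in the question, here is a self-contained sketch of the same cost function, J(theta) = 1/(2m) * sum((x.theta - y)^2), evaluated on a tiny made-up dataset. The three data points are assumptions chosen to lie exactly on y = 1 + 2x, so the cost at the true parameters should be zero.

```python
import numpy as np

def compute_cost_function(x, y, theta):
    # J(theta) = 1/(2m) * sum((x.theta - y)^2)
    m = len(y)
    predictions = x.dot(theta)
    square_error = (predictions - y) ** 2
    return 1 / (2 * m) * np.sum(square_error)

# hypothetical toy data: three points lying exactly on y = 1 + 2x
x_raw = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0]).reshape(-1, 1)

m = len(y)
# prepend a column of ones so theta[0] acts as the intercept
x = np.append(np.ones((m, 1)), x_raw.reshape(m, 1), axis=1)

print(compute_cost_function(x, y, np.zeros((2, 1))))          # ~13.83 (= 83/6)
print(compute_cost_function(x, y, np.array([[1.0], [2.0]])))  # 0.0 at the true line
```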

I know I need to call compute_cost_function from gradient_descent, but I can't work out how to apply that formula.

Answer:

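It means you compute everything on the right-hand side using the previous values of the parameters, and only once that is done do you update the parameters. The clearest way to do this is to create a temporary array inside your function that holds the right-hand-side results, and return it when you're done: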
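To see why the temporary values matter, here is a small sketch comparing a simultaneous update with a naive sequential one, where theta1 is computed after theta0 has already been overwritten. The toy data, the learning rate of 0.1 and the helper names simultaneous_step / sequential_step are assumptions made up for this illustration.

```python
import numpy as np

# hypothetical toy data on y = 1 + 2x, with a leading column of ones
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[3.0], [5.0], [7.0]])
alpha = 0.1
m = len(y)

def simultaneous_step(theta):
    # both right-hand sides are evaluated with the OLD theta
    error = x.dot(theta) - y
    temp0 = theta[0, 0] - (alpha / m) * error.sum()
    temp1 = theta[1, 0] - (alpha / m) * (error * x[:, 1:2]).sum()  # x[:, 1:2] keeps shape (m, 1)
    return np.array([[temp0], [temp1]])

def sequential_step(theta):
    # NOT gradient descent: theta0 is overwritten before theta1 is computed
    theta = theta.copy()
    theta[0, 0] -= (alpha / m) * (x.dot(theta) - y).sum()
    theta[1, 0] -= (alpha / m) * ((x.dot(theta) - y) * x[:, 1:2]).sum()
    return theta

start = np.zeros((2, 1))
print(simultaneous_step(start).ravel())
print(sequential_step(start).ravel())
```

Both versions agree on theta0, but the sequential one computes theta1 from an error vector that already reflects the new theta0, so its slope update differs.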

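```python
def gradient_descent(x, y, theta, alpha):
    '''
        simultaneously update theta0 and theta1 where
        theta0 = theta0 - alpha * 1/m * (sum of errors)
        theta1 = theta1 - alpha * 1/m * (sum of errors * x)
    '''
    m = len(y)  # number of training examples
    theta_return = np.zeros((2, 1))  # temporary array for the new values
    # both lines read only the old theta, so the update is simultaneous
    theta_return[0] = theta[0] - (alpha / m) * ((x.dot(theta) - y).sum())
    theta_return[1] = theta[1] - (alpha / m) * (((x.dot(theta) - y) * x[:, 1][:, None]).sum())
    return theta_return
```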

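We first declare the temporary array, then compute each parameter, the intercept and the slope, separately, and finally return the result. The nice thing about this code is that it is vectorized. For the intercept term, x.dot(theta) performs a matrix-vector multiplication between the data matrix x and the parameter vector theta. Subtracting the output values y from this gives the error between each prediction and its true value; we sum these errors, multiply by the learning rate, and divide by the number of samples. The slope term is similar, except that each error is additionally multiplied by its input value (the second column of x, not the bias column of ones). We also need the input values as a column: slicing the second column with x[:, 1] gives a 1D NumPy array of shape (m,) rather than a 2D array of shape (m, 1), and multiplying that against the (m, 1) error vector would broadcast to an (m, m) matrix. Adding [:, None] restores the column shape so the element-wise multiplication works as intended.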
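The column-shape point is easy to check directly. This sketch uses an arbitrary made-up x and a stand-in error vector of ones (both assumptions) to show the shapes involved and what goes wrong without [:, None].

```python
import numpy as np

m = 4
x = np.column_stack([np.ones(m), np.arange(1.0, m + 1)])  # shape (m, 2): bias + feature
error = np.ones((m, 1))  # stand-in for x.dot(theta) - y, shape (m, 1)

print(x[:, 1].shape)            # (4,)   -> 1D slice
print(x[:, 1][:, None].shape)   # (4, 1) -> column vector

good = error * x[:, 1][:, None]  # element-wise, shape (4, 1)
bad = error * x[:, 1]            # broadcasts (4, 1) with (4,) -> (4, 4)
print(good.shape, good.sum())    # (4, 1) 10.0
print(bad.shape, bad.sum())      # (4, 4) 40.0
```

The broadcast version raises no error; it just returns a sum that is m times too large here, which is why the silent shape mismatch is worth guarding against.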

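Note that you don't need to compute the cost at all in order to update the parameters. That said, calling it inside your optimization loop is a good idea: as the parameters are updated you can watch the cost decrease and see how well they are learning from the data.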
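As a sketch of that suggestion, the loop below records the cost after every update so you can confirm it keeps decreasing. The function name gradient_descent_with_history, the toy data, the learning rate and the iteration count are all assumptions for illustration, not part of the original answer.

```python
import numpy as np

def compute_cost_function(x, y, theta):
    m = len(y)
    return 1 / (2 * m) * np.sum((x.dot(theta) - y) ** 2)

def gradient_descent_with_history(x, y, theta, alpha, iterations):
    m = len(y)
    cost_history = []
    for _ in range(iterations):
        # simultaneous, vectorized update of theta0 and theta1
        theta = theta - (alpha / m) * x.T.dot(x.dot(theta) - y)
        # the cost is only monitored here; the update itself does not use it
        cost_history.append(compute_cost_function(x, y, theta))
    return theta, cost_history

# hypothetical toy data on y = 1 + 2x
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[3.0], [5.0], [7.0]])

theta, costs = gradient_descent_with_history(x, y, np.zeros((2, 1)), 0.1, 50)
print(costs[0], costs[-1])
```

If the recorded cost ever goes up, that is usually a sign the learning rate is too large for the data.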

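To make this fully vectorized and exploit the simultaneous update, you can express it as a matrix-vector multiplication over the training examples: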

```python
def gradient_descent(x, y, theta, alpha):
    '''
        simultaneously update theta0 and theta1 in one vectorized step:
        theta = theta - alpha * 1/m * x^T (x theta - y)
    '''
    m = len(y)  # number of training examples
    return theta - (alpha / m) * x.T.dot(x.dot(theta) - y)
```

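Here, x.dot(theta) computes the predictions, and subtracting the expected values y gives the error vector. Pre-multiplying by the transpose of x then sums the errors in a vectorized way. The first row of the transposed x is all ones, so it simply adds up all the error terms, which gives the update for the bias (intercept) term. The second row of the transposed x additionally multiplies each error term by the corresponding sample value of x (excluding the bias column of ones) and sums those products. The result is a 2×1 vector which, multiplied by the learning rate and divided by the number of samples, is subtracted from the previous parameter values to give the final update.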
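A quick way to convince yourself that x.T.dot(...) really produces the same pair of sums is to compare it with the component-wise version on random data. The random seed, sample count and data below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
x = np.column_stack([np.ones(m), rng.normal(size=m)])  # bias column + one feature
y = rng.normal(size=(m, 1))
theta = rng.normal(size=(2, 1))

error = x.dot(theta) - y                       # shape (m, 1)
vectorized = x.T.dot(error)                    # shape (2, 1)
componentwise = np.array([
    [error.sum()],                             # row of ones in x.T -> plain sum
    [(error * x[:, 1][:, None]).sum()],        # second row -> sum weighted by x
])

print(vectorized.shape)                        # (2, 1)
print(np.allclose(vectorized, componentwise))  # True
```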

I hadn't realized you were placing this inside an iterative loop. In that case, you need to update the parameters on every iteration:

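```python
def gradient_descent(x, y, theta, alpha, iterations):
    '''
        simultaneously update theta0 and theta1 where
        theta0 = theta0 - alpha * 1/m * (sum of errors)
        theta1 = theta1 - alpha * 1/m * (sum of errors * x)
    '''
    m = len(y)  # number of training examples
    for i in range(iterations):
        # allocate a fresh temporary array on every iteration: if it were
        # created once outside the loop, theta and theta_return would be the
        # same array after the first pass, and writing theta_return[0] would
        # change theta[0] before theta_return[1] is computed
        theta_return = np.zeros((2, 1))
        theta_return[0] = theta[0] - (alpha / m) * ((x.dot(theta) - y).sum())
        theta_return[1] = theta[1] - (alpha / m) * (((x.dot(theta) - y) * x[:, 1][:, None]).sum())
        theta = theta_return
    return theta

theta = gradient_descent(x, y, theta, 0.01, 1000)
```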

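On each iteration you compute the update and then reassign theta, so that the current update becomes the previous value for the next iteration.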
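Putting it together, here is a self-contained sketch that runs the iterative update on noise-free synthetic data generated from y = 4 + 3x, so the correct answer is known in advance. The data, the learning rate of 0.1 and the 2000 iterations are assumptions tuned for this toy example, not general recommendations. Building a brand-new theta array each iteration is one way to avoid the aliasing pitfall entirely.

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, iterations):
    m = len(y)
    for _ in range(iterations):
        error = x.dot(theta) - y  # computed once from the OLD theta
        # build a brand-new array so no partially updated value is ever read
        theta = np.array([
            [theta[0, 0] - (alpha / m) * error.sum()],
            [theta[1, 0] - (alpha / m) * (error * x[:, 1][:, None]).sum()],
        ])
    return theta

# hypothetical noise-free data on the line y = 4 + 3x
x_raw = np.linspace(0.0, 2.0, 50)
m = x_raw.size
x = np.append(np.ones((m, 1)), x_raw.reshape(m, 1), axis=1)
y = (4.0 + 3.0 * x_raw).reshape(m, 1)

theta = gradient_descent(x, y, np.zeros((2, 1)), 0.1, 2000)
print(theta.ravel())  # approximately [4, 3]
```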
