I have been following Siraj Raval's videos on linear regression using gradient descent:
1) Link to the longer video: https://www.youtube.com/watch?v=XdM6ER7zTLk&t=2686s
2) Link to the shorter video: https://www.youtube.com/watch?v=xRJCOz3AfYY&list=PL2-dafEMk2A7mu0bSksCGMJEmeddU_H4D
In the videos, he talks about using gradient descent to reduce the error until the function converges (i.e., until the slope of the error function becomes zero). He also demonstrates the process in code; here are the two main functions from that code:
from numpy import array

def step_gradient(b_current, m_current, points, learningRate):
    # Accumulate the gradient of the mean squared error with respect to b and m
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    # Step both parameters downhill, scaled by the learning rate
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, array(points), learning_rate)
    return [b, m]

# The above functions are called below (points is an N x 2 array of (x, y) samples):
learning_rate = 0.0001
initial_b = 0  # initial y-intercept guess
initial_m = 0  # initial slope guess
num_iterations = 1000
[b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
# code taken from Siraj Raval's github page
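For concreteness, here is a minimal sketch of how the code above could be run end to end. The synthetic dataset, the numpy calls, and the true slope/intercept values of 3 and 4 are my own illustrative assumptions, not part of the video:

import numpy as np

# Hypothetical data: 100 points scattered around the line y = 3x + 4
rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=100)
ys = 3 * xs + 4 + rng.normal(0, 0.5, size=100)
points = np.column_stack((xs, ys))

[b, m] = gradient_descent_runner(points, 0, 0, 0.0001, 1000)
# With this small learning rate, m approaches 3 fairly quickly, while b drifts
# toward 4 much more slowly; raising num_iterations moves both closer.
print(b, m)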
Why do the values of b and m keep getting updated through all of the iterations? After a certain number of iterations the function will have converged, namely when we have found the values of b and m for which the slope of the error function equals 0.
So why do we keep iterating past that point and keep updating b and m? Won't that lose the "correct" values of b and m? How does the learning rate help the convergence process if we keep updating the values after convergence? And why is there no convergence check, and how does this actually work?
Answer:
Actually, once we reach a slope of 0, b_gradient and m_gradient become 0;
so in:
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
new_b and new_m keep their old, correct values, since nothing is subtracted from them.
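In practice the gradient almost never becomes exactly zero, but near the minimum it becomes so small that each update changes b and m by a negligible amount, so the extra iterations are harmless; the fixed iteration count is simply a cheap stand-in for an explicit stopping rule. As a rough sketch of what such a rule could look like (the function name and tolerance parameter below are hypothetical, not part of Siraj Raval's code):

from numpy import array

def gradient_descent_with_check(points, b, m, learning_rate, max_iterations, tolerance=1e-9):
    # Hypothetical variant; reuses step_gradient from the question's code
    for i in range(max_iterations):
        new_b, new_m = step_gradient(b, m, array(points), learning_rate)
        # If neither parameter moved by more than `tolerance`, the gradient is
        # effectively zero and further iterations would change nothing
        if abs(new_b - b) < tolerance and abs(new_m - m) < tolerance:
            return [new_b, new_m]
        b, m = new_b, new_m
    return [b, m]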