I'm trying to get started with machine learning. I wrote a simple example:
import numpy as np

# Prepare the data
input = np.array(list(range(100)))
output = np.array([x**2 + 2 for x in list(range(100))])

# Visualize the data
import matplotlib.pyplot as plt
plt.plot(input, output, 'ro')
plt.show()

# Define the model
a = 1
b = 1
# y = ax + b  # we add a bias term to the model based on our knowledge

# Training the model == optimizing the parameters so they produce a very small loss
for e in range(10):
    for x, y in zip(input, output):
        y_hat = a*x + b
        loss = 0.5*(y_hat-y)**2
        # Now that we have the loss, we want the gradients of the parameters a and b
        # d(loss)/da = (-x)*(y - (a*x + b))
        # so gradient descent: a = a - (learning rate)*(d(loss)/da)
        a = a - 0.1*(-x)*(y_hat-y)
        b = b - 0.1*(-1)*(y_hat-y)
    print("Epoch {0} Training loss = {1}".format(e, loss))

# Make predictions on new data
test_input = np.array(list(range(101,150)))
test_output = np.array([x**2.0 + 2 for x in list(range(101,150))])
model_predictions = np.array([a*x + b for x in list(range(101,150))])
plt.plot(test_input, test_output, 'ro')
plt.plot(test_input, model_predictions, '-')
plt.show()
When I run the code I get:
ml_zero.py:22: RuntimeWarning: overflow encountered in double_scalars
  loss = 0.5*(y_hat-y)**2
Epoch 0 Training loss = inf
ml_zero.py:21: RuntimeWarning: overflow encountered in double_scalars
  y_hat = a*x + b
Epoch 1 Training loss = inf
ml_zero.py:21: RuntimeWarning: invalid value encountered in double_scalars
  y_hat = a*x + b
Epoch 2 Training loss = nan
Epoch 3 Training loss = nan
Epoch 4 Training loss = nan
Epoch 5 Training loss = nan
Epoch 6 Training loss = nan
Epoch 7 Training loss = nan
Epoch 8 Training loss = nan
Epoch 9 Training loss = nan
Why do I get these NaN errors? I wrote the simplest possible model, but with plain Python I got:
Traceback (most recent call last):
  File "ml_zero.py", line 20, in <module>
    loss = (y_hat-y)**2
OverflowError: (34, 'Result too large')
Then I converted all the Python lists to numpy arrays, and now I get NaN instead. I just don't understand why such small input values lead to these errors.
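(Side check: the two failure modes are easy to reproduce in isolation. This is a minimal sketch, separate from my script, and the exact RuntimeWarning text depends on the numpy version:)

import numpy as np

x = 1e160
# A plain Python float raises OverflowError when ** exceeds ~1.8e308
try:
    x ** 2
except OverflowError as err:
    print("python float:", err)        # (34, 'Result too large')

# A numpy float64 overflows silently to inf (with a RuntimeWarning),
# and a later inf - inf then yields nan
y = np.float64(x)
print("numpy float64:", y ** 2)        # inf
print("inf - inf =", np.inf - np.inf)  # nan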
Following Daniele's answer, replacing the loss with the mean squared loss, i.e. dividing the loss by the total number of inputs, I got this output:
Epoch 0 Training loss = 1.7942781420994678e+36
Epoch 1 Training loss = 9.232837400842652e+70
Epoch 2 Training loss = 4.751367833814119e+105
Epoch 3 Training loss = 2.4455835946216386e+140
Epoch 4 Training loss = 1.2585275201812707e+175
Epoch 5 Training loss = 6.4767849625200624e+209
Epoch 6 Training loss = 3.331617554363007e+244
Epoch 7 Training loss = 1.714758503849272e+279
ml_zero.py:22: RuntimeWarning: overflow encountered in double_scalars
  loss = 0.5*(y-y_hat)**2
Epoch 8 Training loss = inf
Epoch 9 Training loss = inf
At least it runs now, but I'm trying to learn a linear function with stochastic gradient descent, which updates the parameters after the loss at each point.
I still don't understand how people work with these models. The loss is supposed to decrease, so why does gradient descent make it increase instead?
Answer:
Your math is wrong. When computing the gradient update for GD, you have to divide the loss by the number of samples in your dataset: that's why it's called mean squared error rather than just squared error. Also, since you're fitting a quadratic function, you may want to use smaller inputs, because it tends to... well, grow quadratically as x increases.
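To make the divide-by-N point concrete, here is a minimal vectorized sketch of one full-batch update for y_hat = a*x + b under the MSE loss (the name gd_step and the lr default are mine, purely illustrative):

import numpy as np

def gd_step(a, b, X, Y, lr=0.1):
    """One full-batch GD step for y_hat = a*X + b under the MSE loss.

    MSE = (1/N) * sum((Y - y_hat)**2), so each gradient carries a 2/N factor:
        dMSE/da = (2/N) * sum(-X * (Y - y_hat))
        dMSE/db = (2/N) * sum(-(Y - y_hat))
    """
    N = X.shape[0]
    y_hat = a * X + b
    grad_a = (2.0 / N) * np.sum(-X * (Y - y_hat))
    grad_b = (2.0 / N) * np.sum(-(Y - y_hat))
    return a - lr * grad_a, b - lr * grad_b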
Check out this post for a good introduction to LR and GD.
I took the liberty of rewriting your code a bit; this should work:
import numpy as np
import matplotlib.pyplot as plt

# Prepare the data
input_ = np.linspace(0, 10, 100)  # don't assign user data to Python's built-in input
output = np.array([x**2 + 2 for x in input_])

# Define the model
a = 1
b = 1

# Train the model
N = input_.shape[0]  # number of samples
for e in range(10):
    loss = 0.
    for x, y in zip(input_, output):
        y_hat = a * x + b
        a = a - 0.1 * (2. / N) * (-x) * (y - y_hat)
        b = b - 0.1 * (2. / N) * (-1) * (y - y_hat)
        loss += 0.5 * ((y - y_hat) ** 2)
    loss /= N
    print("Epoch {:2d}\tLoss: {:4f}".format(e, loss))

# Make predictions on the test data
test_input = np.linspace(0, 15, 150)  # training data [0-10] + test data [10-15]
test_output = np.array([x**2.0 + 2 for x in test_input])
model_predictions = np.array([a*x + b for x in test_input])
plt.plot(test_input, test_output, 'ro')
plt.plot(test_input, model_predictions, '-')
plt.show()
This should give you output like this:
Epoch  0	Loss: 33.117127
Epoch  1	Loss: 42.949756
Epoch  2	Loss: 40.733332
Epoch  3	Loss: 38.657764
Epoch  4	Loss: 36.774646
Epoch  5	Loss: 35.067299
Epoch  6	Loss: 33.520409
Epoch  7	Loss: 32.119958
Epoch  8	Loss: 30.853112
Epoch  9	Loss: 29.708126
And this is the output plot:
Cheers.
EDIT: The OP asked about SGD. The answer above is still valid code, but it is for standard GD (where you iterate over the entire dataset at once).
For SGD, the main loop has to be changed slightly:
for e in range(10):
    for x, y in zip(input_, output):
        y_hat = a * x + b
        loss = 0.5 * ((y - y_hat) ** 2)
        # update the parameters after every single sample
        a = a - 0.01 * (2.) * (-x) * (y - y_hat)
        b = b - 0.01 * (2.) * (-1) * (y - y_hat)
    # note: this prints the loss of the last sample in the epoch
    print("Epoch {:2d}\tLoss: {:4f}".format(e, loss))
Note that I had to lower the learning rate to avoid divergence. When you train with a batch size of 1, avoiding this kind of gradient explosion becomes really important, since a single sample can badly disturb your descent towards the optimum (see the sketch after the example output below).
Example output:
Epoch  0	Loss: 0.130379
Epoch  1	Loss: 0.123007
Epoch  2	Loss: 0.117352
Epoch  3	Loss: 0.112991
Epoch  4	Loss: 0.109615
Epoch  5	Loss: 0.106992
Epoch  6	Loss: 0.104948
Epoch  7	Loss: 0.103353
Epoch  8	Loss: 0.102105
Epoch  9	Loss: 0.101127
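If you want to see the divergence for yourself, here is a minimal sketch that runs one epoch of the same per-sample updates at both learning rates; the helper name sgd_epoch is mine, and numpy may emit overflow RuntimeWarnings at the higher rate:

import numpy as np

def sgd_epoch(a, b, X, Y, lr):
    """One epoch of per-sample updates for y_hat = a*x + b; returns (a, b)."""
    for x, y in zip(X, Y):
        y_hat = a * x + b
        a = a - lr * 2. * (-x) * (y - y_hat)
        b = b - lr * 2. * (-1) * (y - y_hat)
    return a, b

X = np.linspace(0, 10, 100)
Y = X ** 2 + 2

for lr in (0.1, 0.01):
    a, b = sgd_epoch(1.0, 1.0, X, Y, lr)
    # at lr=0.1 the parameters blow up (inf/nan); at lr=0.01 they stay finite
    print("lr={}: a={:.4g}, b={:.4g}".format(lr, a, b))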