Is it possible to use stochastic gradient descent for time-series analysis?
My initial idea is this: given a series of (t, v) pairs, I want to use an SGD regressor to predict the v associated with t+1. My approach would be to convert the date/time into an integer value and train the regressor on that list using a hinge loss function. Is this feasible?
EDIT: Here is sample code using the SGD implementation in scikit-learn. However, it fails to correctly predict even a simple linear time series. It appears to just compute the mean of the training y values and use that as its prediction for the test y values. Is SGD unsuitable for time-series analysis, or is my approach wrong?
from datetime import date
from sklearn.linear_model import SGDRegressor

# Build data.
s = date(2010, 1, 1)
i = 0
training = []
for _ in xrange(12):
    i += 1
    training.append([[date(2012, 1, i).toordinal()], i])
testing = []
for _ in xrange(12):
    i += 1
    testing.append([[date(2012, 1, i).toordinal()], i])

clf = SGDRegressor(loss='huber')

print 'Training...'
for _ in xrange(20):
    try:
        print _
        clf.partial_fit(X=[X for X, _ in training], y=[y for _, y in training])
    except ValueError:
        break

print 'Testing...'
for X, y in testing:
    p = clf.predict(X)
    print y, p, abs(p - y)
Answer:
The SGDRegressor in sklearn is numerically unstable with unscaled input parameters: the date ordinals here are on the order of 7×10^5, so the gradient updates they produce are far too large for the default learning-rate schedule. To get good results, scaling the input variables is strongly recommended.
from datetime import date
from sklearn.linear_model import SGDRegressor

# Build data.
s = date(2010, 1, 1).toordinal()
i = 0
training = []
for _ in range(1, 13):
    i += 1
    training.append([[s + i], i])
testing = []
for _ in range(13, 25):
    i += 1
    testing.append([[s + i], i])

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform([X for X, _ in training])
After training the SGD regressor, you need to scale the test input variables accordingly.
clf = SGDRegressor()
clf.fit(X=X_train, y=[y for _, y in training])
print(clf.intercept_, clf.coef_)

print('Testing...')
for X, y in testing:
    p = clf.predict(scaler.transform([X]))
    print(X[0], y, p[0], abs(p[0] - y))
Here are the results:
[6.31706122] [3.35332573]
Testing...
733786 13 12.631164799851827 0.3688352001481725
733787 14 13.602565350686039 0.39743464931396133
733788 15 14.573965901520248 0.42603409847975193
733789 16 15.545366452354457 0.45463354764554254
733790 17 16.51676700318867 0.48323299681133136
733791 18 17.488167554022876 0.5118324459771237
733792 19 18.459568104857084 0.5404318951429161
733793 20 19.430968655691295 0.569031344308705
733794 21 20.402369206525506 0.5976307934744938
733795 22 21.373769757359714 0.6262302426402861
733796 23 22.34517030819392 0.6548296918060785
733797 24 23.316570859028133 0.6834291409718674
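As an aside, not part of the original answer: the manual scaling step can be folded into a single estimator with scikit-learn's Pipeline, so that predict() automatically applies the same transform that was fitted on the training data. A minimal sketch, assuming the same toy data as above:

from datetime import date
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

# Same toy data as in the answer: ordinal dates as X, a simple linear y.
s = date(2010, 1, 1).toordinal()
training = [([s + i], i) for i in range(1, 13)]
testing = [([s + i], i) for i in range(13, 25)]

X_train = [X for X, _ in training]
y_train = [y for _, y in training]

# The pipeline fits the scaler on the training data and reuses it at predict time,
# so there is no separate scaler object to keep track of.
model = make_pipeline(StandardScaler(), SGDRegressor())
model.fit(X_train, y_train)

print('Testing...')
for X, y in testing:
    p = model.predict([X])  # X is scaled internally before the regressor sees it
    print(X[0], y, p[0], abs(p[0] - y))

This does not change the answer's substance; it only removes the risk of forgetting to call scaler.transform on the test inputs.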