Consider this minimal runnable example:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy as np
import matplotlib.pyplot as plt

max = 30
step = 0.5
n_steps = int(30 / 0.5)
x = np.arange(0, max, step)
x = np.cos(x) * (max - x) / max
y = np.roll(x, -1)
y[-1] = x[-1]
shape = (n_steps, 1, 1)
batch_shape = (1, 1, 1)
x = x.reshape(shape)
y = y.reshape(shape)

model = Sequential()
model.add(LSTM(50, return_sequences=True, stateful=True, batch_input_shape=batch_shape))
model.add(LSTM(50, return_sequences=True, stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')

for i in range(1000):
    model.reset_states()
    model.fit(x, y, nb_epoch=1, batch_size=1)
    p = model.predict(x, batch_size=1)
    plt.clf()
    plt.axis([-1, 31, -1.1, 1.1])
    plt.plot(x[:, 0, 0], '*')
    plt.plot(y[:, 0, 0], 'o')
    plt.plot(p[:, 0, 0], '.')
    plt.draw()
    plt.pause(0.001)
As stated in the Keras API docs (https://keras.io/layers/recurrent/), "the last state for each sample at index i in a batch will be used as the initial state for the sample at index i in the following batch." I therefore use batch_size = 1 and try to predict, at every timestep, the next value of a decaying cosine. The predictions (the red dots in the plot) should land inside the green circles for the script to be predicting correctly. However, it does not converge... Is there any way to make it learn?
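For reference, here is a small standalone check of the quoted carry-over behaviour (a sketch of my own; the layer width of 4 and the all-ones input are arbitrary): with stateful=True the hidden state persists between predict calls until reset_states() is called, so the same batch gives a different output on the second call but the original output again after a reset.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# Same layout as above: one sample per batch, one timestep, one feature.
probe = Sequential()
probe.add(LSTM(4, stateful=True, batch_input_shape=(1, 1, 1)))
probe.compile(loss='mse', optimizer='rmsprop')

batch = np.ones((1, 1, 1))
a = probe.predict(batch, batch_size=1)  # state starts at zero
b = probe.predict(batch, batch_size=1)  # state carried over from the first call
probe.reset_states()
c = probe.predict(batch, batch_size=1)  # state reset back to zero
print(np.allclose(a, b))  # almost surely False for random weights: the state was carried over
print(np.allclose(a, c))  # True: resetting restores the initial (zero) state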
Answer:
The problem is that model.fit is called separately for every epoch. In that case the optimizer parameters are reset, which is harmful to the training process. The other issue is that reset_states should also be called before prediction; if it isn't, the states left over from fit are used as the initial states for prediction, which can also be harmful. The final code looks like this:
for epoch in range(1000):
    model.reset_states()
    tot_loss = 0
    for batch in range(n_steps):
        batch_loss = model.train_on_batch(x[batch:batch+1], y[batch:batch+1])
        tot_loss += batch_loss
    print "Loss: " + str(tot_loss / float(n_steps))
    model.reset_states()
    p = model.predict(x, batch_size=1)
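As a side note, if the newer Keras 2 API is available (the epochs, shuffle and callbacks arguments on fit), roughly the same training loop can be sketched as a single fit call with a LambdaCallback that resets the states after every epoch:

from keras.callbacks import LambdaCallback

# Reset the LSTM states at the end of every epoch (same effect as resetting
# at the start of the next one in the manual loop above).
reset_cb = LambdaCallback(on_epoch_end=lambda epoch, logs: model.reset_states())

# shuffle=False keeps the timesteps in order, which a stateful model needs.
model.fit(x, y, epochs=1000, batch_size=1, shuffle=False, callbacks=[reset_cb])

model.reset_states()  # start prediction from a clean state, as above
p = model.predict(x, batch_size=1)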