This is a toy version of what I'm actually trying to do. I have very high-dimensional input data (2e05 to 5e06 dimensions) across a large number of time steps (150,000 steps). I understand that I may eventually need some embedding/compression of the state (see this question), but let's set that aside for now.
Take this toy input data with 11 dimensions as an example:
t    Pattern
0    0,0,0,0,0,0,0,0,0,2,1
1    0,0,0,0,0,0,0,0,2,1,0
2    0,0,0,0,0,0,0,2,1,0,0
n    ...
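(For reference, the same shifting pattern can be generated programmatically; a small NumPy sketch, with a function name of my own:)

import numpy as np

# Build the toy data: a [2, 1] marker sliding left one position per step.
def make_shift_sequence(n_steps=10, n_dims=11):
    seq = np.zeros((n_steps, n_dims), dtype=int)
    for t in range(n_steps):
        seq[t, n_dims - 2 - t] = 2
        seq[t, n_dims - 1 - t] = 1
    return seq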
I'd like the RNN to learn to associate the current time step with the next one, so that if the input (x) is t0, the desired output (y) is t1.
The idea behind using an RNN is that, since my real data has so many dimensions, I can feed the network one time step at a time. Since there are the same number of inputs and outputs, I'm not sure whether a basic RNN is appropriate. I glanced at the seq2seq tutorial, but I'm not convinced this application needs an encoder/decoder, and I couldn't make any progress with it on my toy data.
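For intuition, a same-length input/output setup like this is just a plain many-to-many recurrence: one D-dimensional input goes in and one D-dimensional prediction comes out at every step. A minimal NumPy sketch of the shapes involved (all names are mine, not from any library):

import numpy as np

# Minimal many-to-many vanilla RNN rollout, for shape intuition only.
# Wxh: (H, D), Whh: (H, H), Why: (D, H); one prediction per input step.
def rnn_rollout(xs, Wxh, Whh, Why, bh, by):
    h = np.zeros(Whh.shape[0])
    ys = []
    for x in xs:                              # one time step at a time
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # update the hidden state
        ys.append(Why @ h + by)               # project state to a prediction
    return np.array(ys)                       # len(ys) == len(xs)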
Here's everything I could come up with, but it doesn't converge at all. What am I missing?
import numpy as np
import tensorflow as tf

# Module for loading CSV files
from tensorflow.python.platform import gfile
import csv

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

data = np.array(wholeSequence[:-1], dtype=int)    # all but last
target = np.array(wholeSequence[1:], dtype=int)   # all but first
trainingSet = tf.contrib.learn.datasets.base.Dataset(data=data, target=target)
trainingSetDims = trainingSet.data.shape[1]

EPOCHS = 10000
PRINT_STEP = 1000

x_ = tf.placeholder(tf.float32, [None, trainingSetDims])
y_ = tf.placeholder(tf.float32, [None, trainingSetDims])

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=trainingSetDims)
outputs, states = tf.nn.rnn(cell, [x_], dtype=tf.float32)
outputs = outputs[-1]

# Linear readout; note W is [trainingSetDims, 1], so the matmul collapses
# each output to a single column, which b then broadcasts across all dims
W = tf.Variable(tf.random_normal([trainingSetDims, 1]))
b = tf.Variable(tf.random_normal([trainingSetDims]))
y = tf.matmul(outputs, W) + b

cost = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.RMSPropOptimizer(0.005, 0.2).minimize(cost)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    for i in range(EPOCHS):
        sess.run(train_op, feed_dict={x_: trainingSet.data, y_: trainingSet.target})
        if i % PRINT_STEP == 0:
            c = sess.run(cost, feed_dict={x_: trainingSet.data, y_: trainingSet.target})
            print('Training cost:', c)

    response = sess.run(y, feed_dict={x_: trainingSet.data})
    print(response)
This approach came from this thread.
Ultimately I'd like to use an LSTM, with the goal of modeling the sequence so that, by initializing the network with t0 and then feeding each prediction back in as the next input, an approximation of the whole sequence can be reconstructed.
Edit 1
Since adding the following code to rescale the histogram input data into a probability distribution, I'm now seeing the cost decrease:
# Convert the histograms to a probability distribution
wholeSequence = np.array(wholeSequence, dtype=float)     # Convert to NP array.
pdfSequence = wholeSequence*(1./np.sum(wholeSequence))   # Normalize to a PD.
data = pdfSequence[:-1]    # all but last
target = pdfSequence[1:]   # all but first
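(As an aside: np.sum(wholeSequence) is the global sum, which is 30 for this toy data, so each row ends up summing to 0.1 rather than 1. If the intent is for each time step's histogram to be its own probability distribution, a per-row normalization would be needed instead; my variant, not from the original post:)

pdfSequence = wholeSequence / wholeSequence.sum(axis=1, keepdims=True)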
The output still doesn't look like the input, though, so I must be missing something:
('Training cost:', 0.49993864)
('Training cost:', 0.0012213766)
('Training cost:', 0.0010471855)
('Training cost:', 0.00094231067)
('Training cost:', 0.0008385859)
('Training cost:', 0.00077578216)
('Training cost:', 0.00071381911)
('Training cost:', 0.00063783216)
('Training cost:', 0.00061271922)
('Training cost:', 0.00059178629)
[[ 0.02012676  0.02383044  0.02383044  0.02383044  0.02383044  0.02383044
   0.02383044  0.02383044  0.02383044  0.01642305  0.01271933]
 [ 0.02024871  0.02395239  0.02395239  0.02395239  0.02395239  0.02395239
   0.02395239  0.02395239  0.02395239  0.016545    0.01284128]
 [ 0.02013803  0.02384171  0.02384171  0.02384171  0.02384171  0.02384171
   0.02384171  0.02384171  0.02384171  0.01643431  0.0127306 ]
 [ 0.020188    0.02389169  0.02389169  0.02389169  0.02389169  0.02389169
   0.02389169  0.02389169  0.02389169  0.01648429  0.01278058]
 [ 0.02020025  0.02390394  0.02390394  0.02390394  0.02390394  0.02390394
   0.02390394  0.02390394  0.02390394  0.01649654  0.01279283]
 [ 0.02005926  0.02376294  0.02376294  0.02376294  0.02376294  0.02376294
   0.02376294  0.02376294  0.02376294  0.01635554  0.01265183]
 [ 0.02034193  0.02404562  0.02404562  0.02404562  0.02404562  0.02404562
   0.02404562  0.02404562  0.02404562  0.01663822  0.01293451]
 [ 0.02057907  0.02428275  0.02428275  0.02428275  0.02428275  0.02428275
   0.02428275  0.02428275  0.02428275  0.01687536  0.01317164]
 [ 0.02042386  0.02412754  0.02412754  0.02412754  0.02412754  0.02412754
   0.02412754  0.02412754  0.02412754  0.01672015  0.01301643]]
Answer:
I gave up on using TensorFlow directly and switched to Keras. Here is code that learns the toy sequence above using a single-layer LSTM followed by a Dense layer:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

# Preprocess the data: (this did not work)
wholeSequence = np.array(wholeSequence, dtype=float)  # Convert to NP array.
data = wholeSequence[:-1]   # all but last
target = wholeSequence[1:]  # all but first

# Reshape the training data to fit the Keras LSTM model
# The training data needs to be (batchIndex, timeStepIndex, dimensionIndex)
# Single batch, 9 time steps, 11 dimensions
data = data.reshape((1, 9, 11))
target = target.reshape((1, 9, 11))

# Build the model
model = Sequential()
model.add(LSTM(11, input_shape=(9, 11), unroll=True, return_sequences=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=2000, batch_size=1, verbose=2)
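To pursue the reconstruction goal mentioned in the question (seed with t0, then feed each prediction back in as the next input), here is a minimal sketch of my own, assuming the trained model and data arrays above. The LSTM is causal, so the output at step t-1 depends only on inputs up to t-1, even while later slots of the window are still zero:

# Sketch only: seed the input window with t0 and fill it step by step
# with the model's own predictions.
def reconstruct(model, t0, n_steps=9, n_dims=11):
    window = np.zeros((1, n_steps, n_dims))
    window[0, 0] = t0                  # seed with the first time step
    for t in range(1, n_steps):
        pred = model.predict(window)   # shape (1, n_steps, n_dims)
        window[0, t] = pred[0, t - 1]  # output at step t-1 predicts step t
    return window[0]

approx = reconstruct(model, data[0, 0])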