How to reuse an LSTM layer and variables in a variable scope (attention mechanism)

There is a problem in my code: I want to share the weights inside lstm_decoder (essentially, to use a single LSTM). I know there are several resources on this online, but I still cannot understand why the following code does not share the weights:

    initial_input = tf.unstack(tf.zeros(shape=(1, 1, hidden_size2)))
    for index in range(window_size):
        with tf.variable_scope('lstm_cell_decoder', reuse=index > 0):
            rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple=True)
            output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input,
                                                             initial_state=last_encoder_state,
                                                             dtype=tf.float32)

            # Score the source output vectors
            scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1], (hidden_size, 1)))
            attention_coef = tf.nn.softmax(scores)
            context_vector = tf.reduce_sum(
                tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))), 0)
            context_vector = tf.reshape(context_vector, (1, hidden_size))

            # Compute the "tilde" hidden state: \tilde{h}_t = tanh(W[c_t, h_t] + b_t)
            concat_context = tf.concat([context_vector, output_decoder[-1]], axis=1)
            W_tilde = tf.Variable(tf.random_normal(shape=[hidden_size * 2, hidden_size2], stddev=0.1),
                                  name="weights_tilde", trainable=True)
            b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable=True)
            hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde) + b_tilde)  # hidden_tilde is [1, 64]

            # Update for the next time step
            initial_input = tf.unstack(tf.reshape(hidden_tilde, (1, 1, hidden_size2)))
            last_encoder_state = state_decoder
            print(initial_input, last_encoder_state)

            # Predict the target
            W_target = tf.Variable(tf.random_normal(shape=[hidden_size2, 1], stddev=0.1),
                                   name="weights_target", trainable=True)
            print(W_target)
            logit = tf.matmul(hidden_tilde, W_target)
            logits = tf.concat([logits, logit], axis=0)
    logits = logits[1:]
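As a side note on TF 1.x semantics (a minimal sketch, not part of the original post): tf.Variable always creates a brand-new variable and ignores the reuse flag of the enclosing variable scope; only tf.get_variable takes part in variable sharing. The following sketch illustrates the difference:

    import tensorflow as tf  # assumes TensorFlow 1.x

    with tf.variable_scope('demo'):
        v1 = tf.get_variable('w', shape=[2])        # creates demo/w
        a1 = tf.Variable(tf.zeros([2]), name='w')   # new variable, uniquified to demo/w_1

    with tf.variable_scope('demo', reuse=True):
        v2 = tf.get_variable('w', shape=[2])        # returns the existing demo/w
        a2 = tf.Variable(tf.zeros([2]), name='w')   # yet another new variable (demo_1/w)

    print(v1 is v2)          # True  -> shared through get_variable
    print(a1.name, a2.name)  # demo/w_1:0 demo_1/w:0 -> not shared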

I want to use the same LSTM cell and the same W_target on every iteration of the loop. However, with window_size = 2, print(initial_input, last_encoder_state) and print(W_target) inside the loop produce the following output:

    [<tf.Tensor 'lstm_cell_decoder/unstack:0' shape=(1, 64) dtype=float32>]
    LSTMStateTuple(c=<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64) dtype=float32>,
                   h=<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>)
    <tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1) dtype=float32_ref>
    [<tf.Tensor 'lstm_cell_decoder_1/unstack:0' shape=(1, 64) dtype=float32>]
    LSTMStateTuple(c=<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64) dtype=float32>,
                   h=<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>)
    <tf.Variable 'lstm_cell_decoder_1/weights_target:0' shape=(64, 1) dtype=float32_ref>
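A hedged aside (not from the original post): scope prefixes in printed op names, such as lstm_cell_decoder_1/..., only show which name scope each op was built in; they do not by themselves prove the variables are duplicated. A more direct check is to list the graph's trainable variables:

    # Sketch: every distinct trainable variable appears exactly once here,
    # so duplicated LSTM kernels/biases or extra weights_target copies are easy to spot.
    for v in tf.trainable_variables():
        print(v.name, v.shape)
    # Exact variable names depend on the TF 1.x version, e.g. something like
    #   lstm_cell_decoder/rnn/lstm_cell/kernel:0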

Update: following Maxim's comment, I tried the following syntax:

    for index in range(window_size):
        with tf.variable_scope('lstm_cell_decoder', reuse=index > 0):
            rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, reuse=index > 0)
            output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, ...)
            W_target = tf.get_variable(...)

It now shares the variable W_target correctly, but sharing the LSTM cell/weights is still a problem:

    [<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>]
    LSTMStateTuple(c=<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64) dtype=float32>,
                   h=<tf.Tensor 'lstm_cell_decoder/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>)
    <tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1) dtype=float32_ref>
    [<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>]
    LSTMStateTuple(c=<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/add_1:0' shape=(1, 64) dtype=float32>,
                   h=<tf.Tensor 'lstm_cell_decoder_1/rnn/rnn/lstm_cell/lstm_cell/mul_2:0' shape=(1, 64) dtype=float32>)
    <tf.Variable 'lstm_cell_decoder/weights_target:0' shape=(64, 1) dtype=float32_ref>
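One possible way to get everything shared (a sketch, not a definitive fix, assuming the rest of the loop body from the question stays unchanged): create the LSTMCell once outside the loop, and create W_tilde, b_tilde, and W_target with tf.get_variable so the scope's reuse flag applies to them. The ops on later iterations will still be printed under lstm_cell_decoder_1/... prefixes; that only reflects the name scope of the ops, while the underlying variables are shared.

    # Sketch only: surrounding names (initial_input, last_encoder_state,
    # window_size, hidden_size, hidden_size2) are taken from the question above.
    rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple=True)  # built once

    for index in range(window_size):
        with tf.variable_scope('lstm_cell_decoder', reuse=index > 0):
            output_decoder, state_decoder = tf.nn.static_rnn(
                rnn_decoder_cell, initial_input,
                initial_state=last_encoder_state, dtype=tf.float32)

            # get_variable honors the scope's reuse flag, so these are created
            # on iteration 0 and looked up on every later iteration.
            W_tilde = tf.get_variable('weights_tilde', shape=[hidden_size * 2, hidden_size2],
                                      initializer=tf.random_normal_initializer(stddev=0.1))
            b_tilde = tf.get_variable('bias_tilde', shape=[1, hidden_size2],
                                      initializer=tf.zeros_initializer())
            W_target = tf.get_variable('weights_target', shape=[hidden_size2, 1],
                                       initializer=tf.random_normal_initializer(stddev=0.1))
            # ... attention computation and logits update as in the original loop ...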

