I don't know if it's just me who finds TensorFlow's documentation a bit weak.
I originally planned to implement batch normalization with the tf.nn.batch_normalization function, but then I found tf.layers.batch_normalization, which looks like the better choice because of its simplicity. If I may say so, though, its documentation is really poor.
I have tried to understand how to use it correctly, but the information provided on the page is genuinely hard to digest. I am hoping someone out there has experience with it and can help me (and probably many others) understand it.
Let me first share the interface:
tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None)
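For orientation, here is the minimal usage pattern I pieced together from the docs (a sketch; x, loss and the is_training placeholder are stand-ins of mine). The one thing the documentation does state clearly is that the moving averages are only updated if you also run the ops in tf.GraphKeys.UPDATE_OPS:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 5])   # mini-batches of 5 features
is_training = tf.placeholder(tf.bool, shape=())   # flip per sess.run() call

h = tf.layers.dense(x, 6)
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)
loss = tf.reduce_mean(tf.square(h))               # dummy loss, just for the sketch

# The moving mean/variance are only updated by ops placed in UPDATE_OPS,
# so the train step has to depend on them explicitly.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)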
Q1) The beta values are initialized to zero and the gamma values to one, but the docs do not say why. When batch normalization is used, I understand that the network's ordinary bias parameters become redundant, and the beta parameter of the batch-norm step plays essentially the same role. From that perspective, initializing beta to zero is understandable. But why is gamma initialized to 1? Is that really the most effective choice?
Q2) I also see the momentum parameter. The documentation only says "Momentum for the moving average." I assume this parameter is used when computing the "mean" of a mini-batch in the corresponding hidden layer. In other words, the mean used in batch normalization is not the mean of the current mini-batch alone, but mostly the mean of roughly the last 100 mini-batches (since momentum = 0.99). But how does this parameter affect execution at test time, or when I simply evaluate my model on the dev set by computing cost and accuracy? My assumption is: whenever I process the test or dev set, I set the "training" parameter to False, so the momentum parameter becomes irrelevant for that particular run, and the "mean" and "variance" values accumulated during training are used instead of computing fresh ones. If I'm not mistaken, this is how it should work, but I don't see anything in the documentation confirming it either way. Can someone confirm that my understanding is correct? If not, I would greatly appreciate further explanation.
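If I read it right, each training step blends the current batch statistic into the running estimate roughly like this (a sketch in plain Python; the names are mine):

def ema_update(moving_stat, batch_stat, momentum=0.99):
    # Keep 99% of the accumulated history and blend in 1% of the
    # current mini-batch statistic (this is what momentum controls).
    return momentum * moving_stat + (1.0 - momentum) * batch_stat

moving_mean = 0.0
for batch_mean in [2.0, 2.2, 1.8, 2.1]:   # made-up batch means
    moving_mean = ema_update(moving_mean, batch_mean)
print(moving_mean)                         # slowly drifts toward ~2.0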
Q3) I have a hard time understanding what the trainable parameter means. I assume it refers to the beta and gamma parameters. Why would you ever not want them to be trainable?
Q4) The "reuse" parameter. What on earth is this?
Q5) The adjustment parameter. Another mystery…
Q6) A kind of summary question… this is the overall assumption I need confirmation and feedback on… the parameters that matter here are:

- inputs
- axis
- momentum
- center
- scale
- training

My assumption is that as long as training=True while training, we are safe. And as long as training=False while validating on the dev or test set, or when using the model in real life, we are also safe. (See the sketch below for how I picture this.)
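In code form, my assumption boils down to one boolean placeholder that I flip per run (a sketch; the placeholder name is mine, and the moving averages are of course only updated if the UPDATE_OPS are also run, as shown earlier):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 5])
training = tf.placeholder(tf.bool, shape=(), name="training")
z = tf.layers.batch_normalization(x, training=training)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(7, 5).astype(np.float32)
    # Training run: the current batch's statistics are used.
    print(sess.run(z, feed_dict={x: batch, training: True})[0])
    # Dev/test run: the stored moving mean/variance are used instead.
    print(sess.run(z, feed_dict={x: batch, training: False})[0])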
Any feedback would be most welcome.
UPDATE:
The confusion continues. Help!
I am trying to use this function instead of implementing a batch normalizer by hand. I have the following forward-propagation function, which loops over the layers of the network.
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # linear output of the last layer
The tf.layers.batch_normalization(..) function requires a static dimension, but I don't have one in my case.
Because I feed mini-batches instead of training on the whole training set before each optimizer run, one dimension of X appears to be unknown.
If I write:
print(X.shape)
I get:
(?, 5)
When that is the case and I run the whole program, I get the error below.
I have seen people in other threads say they could get around this with the tf.reshape function. I tried it… forward propagation then runs fine, but it crashes later in the Adam optimizer…
When I run the code above (without tf.reshape), I get the following:
How can I solve this problem???
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-191-990fb7d7f7f6> in <module>()
     24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
     25                       normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26                       lambd, print_progress)
     27 
     28 print(parameters)

<ipython-input-190-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
     34     # Forward propagation: build the forward propagation in the tensorflow graph
     35     ZL = forward_propagation_with_relu(X_mini_batch, num_units_in_layers,
---> 36                                        parameters, normalize_batch, training)
     37 
     38     with tf.name_scope("calc_cost"):

<ipython-input-187-8012e2fb6236> in forward_propagation_with_relu(X, num_units_in_layers, parameters, normalize_batch, training, mb_size)
     15     with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
     16         Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
---> 17                                                training=training)
     18 
     19     A_temp = tf.nn.relu(Z_temp)

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in batch_normalization(inputs, axis, momentum, epsilon, center, scale, beta_initializer, gamma_initializer, moving_mean_initializer, moving_variance_initializer, beta_regularizer, gamma_regularizer, beta_constraint, gamma_constraint, training, trainable, name, reuse, renorm, renorm_clipping, renorm_momentum, fused, virtual_batch_size, adjustment)
    775         _reuse=reuse,
    776         _scope=name)
--> 777     return layer.apply(inputs, training=training)
    778 
    779 

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in apply(self, inputs, *args, **kwargs)
    805       Output tensor(s).
    806     """
--> 807     return self.__call__(inputs, *args, **kwargs)
    808 
    809   def _add_inbound_node(self,

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in __call__(self, inputs, *args, **kwargs)
    676           self._defer_regularizers = True
    677           with ops.init_scope():
--> 678             self.build(input_shapes)
    679           # Create any regularizers added by `build`.
    680           self._maybe_create_variable_regularizers()

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in build(self, input_shape)
    251       if axis_to_dim[x] is None:
    252         raise ValueError('Input has undefined `axis` dimension. Input shape: ',
--> 253                          input_shape)
    254     self.input_spec = base.InputSpec(ndim=ndims, axes=axis_to_dim)
    255 

ValueError: ('Input has undefined `axis` dimension. Input shape: ', TensorShape([Dimension(6), Dimension(None)]))
This feels so hopeless…
UPDATE (2):
Let me add more information:
The following simply means the input layer has 5 units, each hidden layer has 6 units, and the output layer has 2 units.
num_units_in_layers = [5,6,6,2]
Here is the updated version of the forward-propagation function, using tf.reshape:
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    print("X.shape before reshape: ", X.shape)             # new line 1
    X = tf.reshape(X, [mb_size, num_units_in_layers[0]])   # new line 2
    print("X.shape after reshape: ", X.shape)              # new line 3
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # linear output of the last layer
When I do this, the forward-propagation function runs. But it seems to crash later in the execution. Here is the error I get. (Note that I print the shape of the input X before and after the reshape inside the forward-propagation function.)
X.shape before reshape:  (?, 5)
X.shape after reshape:  (7, 5)
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1349     try:
-> 1350       return fn(*args)
   1351     except errors.OpError as e:

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1328                                    feed_dict, fetch_list, target_list,
-> 1329                                    status, run_metadata)
   1330 

~/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    515             compat.as_text(c_api.TF_Message(self.status.status)),
--> 516             c_api.TF_GetCode(self.status.status))
    517     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: Incompatible shapes: [7] vs. [2]
	 [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-222-990fb7d7f7f6> in <module>()
     24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
     25                       normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26                       lambd, print_progress)
     27 
     28 print(parameters)

<ipython-input-221-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
     88                          cost_mini_batch,
     89                          accuracy_mini_batch],
---> 90                         feed_dict={training: True})
     91     nr_of_minibatches += 1
     92     sum_minibatch_costs += minibatch_cost

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1126     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1127       results = self._do_run(handle, final_targets, final_fetches,
-> 1128                              feed_dict_tensor, options, run_metadata)
   1129     else:
   1130       results = []

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1342     if handle is None:
   1343       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344                            options, run_metadata)
   1345     else:
   1346       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1361       except KeyError:
   1362         pass
-> 1363       raise type(e)(node_def, op, message)
   1364 
   1365   def _extend_graph(self):

InvalidArgumentError: Incompatible shapes: [7] vs. [2]
	 [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]

Caused by op 'forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub', defined at:
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-222-990fb7d7f7f6>", line 26, in <module>
    lambd, print_progress)
  File "<ipython-input-221-59594e979129>", line 36, in nn_model
    parameters, normalize_batch, training)
  File "<ipython-input-218-62e4c6126c2c>", line 19, in forward_propagation_with_relu
    training=training)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 777, in batch_normalization
    return layer.apply(inputs, training=training)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 807, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 697, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 602, in call
    lambda: self.moving_mean)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/utils.py", line 211, in smart_cond
    return control_flow_ops.cond(pred, true_fn=fn1, false_fn=fn2, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1985, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1839, in BuildCondBranch
    original_result = fn()
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 601, in <lambda>
    lambda: _do_update(self.moving_mean, new_mean),
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 597, in _do_update
    var, value, self.momentum, zero_debias=False)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/training/moving_averages.py", line 87, in assign_moving_average
    update_delta = (variable - value) * decay
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 778, in _run_op
    return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 934, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4819, in _sub
    "Sub", x=x, y=y, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3267, in create_op
    op_def=op_def)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [7] vs. [2]
	 [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
As for why the shape of X is not static… I don't know… this is how I set up the dataset.
with tf.name_scope("next_train_batch"):
    filenames = tf.placeholder(tf.string, shape=[None])
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: tf.data.TextLineDataset(filename).skip(1).map(decode_csv))
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(minibatch_size)
    iterator = dataset.make_initializable_iterator()
    X_mini_batch, Y_mini_batch = iterator.get_next()
I have 2 csv files containing training data.
train_path1 = "train1.csv"
train_path2 = "train2.csv"
train_input_paths = [train_path1, train_path2]
I use the initializable iterator as follows:
sess.run(iterator.initializer, feed_dict={filenames: train_input_paths})
During training I keep fetching mini-batches from the training set. Everything works fine with batch normalization switched off. If I enable batch normalization, it demands a static shape for the input X (the mini-batch). I reshape it, but then it crashes during execution instead, as seen above.
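For what it's worth, I believe the unknown dimension comes from Dataset.batch() itself: the last batch of an epoch can be smaller than minibatch_size, so the batch dimension is left as None at graph-construction time. Depending on the TF version, dropping the remainder batch should pin it (a sketch replacing the batch(...) line above; whether your release has the drop_remainder argument or only the contrib variant is an assumption on my part):

# TF >= 1.10: the batch dimension becomes static (minibatch_size),
# because incomplete final batches are discarded.
dataset = dataset.batch(minibatch_size, drop_remainder=True)

# Older releases reportedly had the same behaviour in contrib:
# dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(minibatch_size))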
UPDATE (3):
I think I figured out where it crashes. It probably crashes when the optimizer is run after computing the cost.
First, the sequence of commands: forward propagation first, then the cost computation, then running the optimizer. The first two seem to work fine, but not the optimizer.
Here is how I define the optimizer:
with tf.name_scope("train"):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # Backpropagation: define the tensorflow optimizer. Use AdamOptimizer.
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost_mini_batch)
I have the update_ops there so that the moving averages can be updated. If I am interpreting it correctly, it crashes exactly while trying to update the moving average. Then again, I may be misreading the error message…
UPDATE (4):
I tried normalizing along the known dimension, and it worked! But it is not the dimension I thought I wanted to normalize, which is confusing now. Let me elaborate:
Number of units in the input layer: 5
Number of units in layer 1 (the first hidden layer): 6
So weight1 is a (6, 5) matrix.
Assume the mini-batch size is 7. The shape of my A[0] (or X_mini_batch) is (7, 5), where 7 is the number of training examples in the mini-batch and 5 is the number of units in the input layer.
When computing Z[1]… Z[1] = weight1 * A[0].transpose… the shape of Z[1] is a (6, 7) matrix, where each column holds the 6 features of one training example.
So the question is: what exactly do we want to normalize in Z[1]? What makes sense to me is to normalize each feature across all the given training examples. That means I need to normalize each row, because each row holds one feature's values across the different training examples. And since the shape of Z[1] is (6, 7), if I set axis=0, it should refer to normalizing each row. Moreover, 7 is the unknown in my case, so that works out. Based on this logic, it worked! But I am completely confused about whether axis=0 here really refers to each row… Let me show another example of this axis question, which has been bugging me for a long time…
A code example unrelated to this topic:
cc = tf.constant([[1., 2., 3.],
                  [4., 5., 6.]])
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(cc, axis=0)))
    print(sess.run(tf.reduce_mean(cc, axis=1)))
This gives the following output:
[2.5 3.5 4.5]
[2. 5.]
When I set axis to 0, it gives the mean of each column. With axis=1, it gives the mean of each row.
(Note that cc.shape gives (2, 3).)
Now the million-dollar question: in a 2-D matrix, when I want to work on each row, is axis 0 or 1?
UPDATE (5): I think I understand it correctly now. Let me summarize my understanding of axis here. Hopefully I've got it right this time…
Here is the Z[1] matrix representation, with shape (6, 7):
t_ex: training example
f: feature
t_ex1  t_ex2  t_ex3  t_ex4  t_ex5  t_ex6  t_ex7
  f1     f1     f1     f1     f1     f1     f1
  f2     f2     f2     f2     f2     f2     f2
  f3     f3     f3     f3     f3     f3     f3
  f4     f4     f4     f4     f4     f4     f4
  f5     f5     f5     f5     f5     f5     f5
  f6     f6     f6     f6     f6     f6     f6
In the mini-batch above there are 7 training examples, and each training example has 6 features (because layer 1 has 6 units). When we say "tf.layers.batch_normalization(.., axis=0)", we mean that each feature must be normalized within its own row, to remove, for example, the high variance among the f1 values in the first row.
In other words, we do not normalize f1, f2, f3, f4, f5, f6 against each other. We normalize the f1 values against each other, the f2 values against each other, and so on…
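If my reading is right, this little check should hold (a sketch under that assumption; note that for batch_normalization, axis names the dimension that is kept as the feature dimension, while the statistics are computed over the remaining axes, which is the opposite convention from reduce_mean):

import numpy as np
import tensorflow as tf

z = tf.placeholder(tf.float32, shape=[6, None])   # (features, batch), like Z[1]

# Library version: axis=0 keeps dimension 0 as the feature axis,
# so each of the 6 rows is normalized across the batch.
bn = tf.layers.batch_normalization(z, axis=0, training=True)

# Manual version: per-row mean/variance, i.e. moments over axis 1.
mean, var = tf.nn.moments(z, axes=[1], keep_dims=True)
manual = (z - mean) / tf.sqrt(var + 0.001)        # epsilon matches the default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # gamma=1, beta=0 initially
    batch = np.random.randn(6, 7).astype(np.float32)
    a, b = sess.run([bn, manual], feed_dict={z: batch})
    print(np.allclose(a, b, atol=1e-4))           # expected: True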
Answer:
Q1) Initializing gamma to 1 and beta to 0 means the normalized input is used directly. Since there is no prior information about what the variance of a layer's output should be, assuming a standard Gaussian is a fair choice.
Q2) In the training phase (training=True), mini-batches are normalized with their own mean and variance, on the assumption that the training data is randomly sampled. In the test phase (training=False), since the test data may be sampled arbitrarily, we cannot use its mean and variance. We therefore use, as you said, the moving-average estimates from the last "100" training iterations.
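Concretely, at inference time the layer applies the frozen statistics instead of batch statistics (a sketch; the variable names are mine):

def batch_norm_inference(x, moving_mean, moving_var, gamma, beta, epsilon=0.001):
    # training=False: normalize with the accumulated moving statistics, so the
    # output no longer depends on the other examples in the batch.
    return gamma * (x - moving_mean) / (moving_var + epsilon) ** 0.5 + beta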
Q3) Yes, trainable refers to beta and gamma. There are cases where you set trainable=False, for example if the parameters are updated with a new method, or if the batch-normalization layer is pre-trained and needs to be frozen.
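For example (a sketch; the point is that with trainable=False the beta/gamma variables are kept out of the TRAINABLE_VARIABLES collection, so a standard optimizer.minimize() will not touch them):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 6])
# beta/gamma are created but NOT added to tf.GraphKeys.TRAINABLE_VARIABLES,
# so a plain optimizer.minimize(loss) leaves them frozen.
y = tf.layers.batch_normalization(x, trainable=False, training=False)
print([v.name for v in tf.trainable_variables()])   # no batch-norm variables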
Q4) You may have noticed that other tf.layers functions also have a reuse parameter. In general, if you want to call a layer more than once (e.g. for training and validation) and you don't want TensorFlow to think you are creating a new layer, you can set reuse=True. I prefer using with tf.variable_scope(..., reuse=tf.AUTO_REUSE): for the same purpose.
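Something like this (a sketch; the scope and layer names are mine):

import tensorflow as tf

def model(x):
    # AUTO_REUSE: create the variables on the first call, reuse them afterwards.
    with tf.variable_scope("net", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 6, name="hidden")
        return tf.layers.batch_normalization(h, name="bn", training=False)

x_train = tf.placeholder(tf.float32, [None, 5])
x_eval = tf.placeholder(tf.float32, [None, 5])
y_train = model(x_train)   # creates net/hidden and net/bn variables
y_eval = model(x_eval)     # reuses the same variables instead of failing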
Q5) I am not sure about this one. My guess is that it is meant for users who want to design new tricks for adjusting the scale and bias.
Q6) Yes, you are right.