Solving the XOR problem with TensorFlow: predictions are wrong after 500 epochs

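I am trying to implement a neural network in TensorFlow to solve the XOR problem. I chose sigmoid as the activation function, the network shape is (2, 2, 1), and I use optimizer=SGD(). I chose batch_size=1 because the whole dataset has only 4 samples, so it is very small. The problem is that the predictions are far from the correct answers. What am I doing wrong?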

I am running this on Google Colab with TensorFlow 2.3.0.

import tensorflow as tf
import numpy as np

x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]], dtype=np.float32)
y = np.array([[0],
              [0],
              [1],
              [1]], dtype=np.float32)

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['binary_accuracy'])

history = model.fit(x, y, batch_size=1, epochs=500, verbose=False)

print("Tensorflow version: ", tf.__version__)

predictions = model.predict_on_batch(x)
print(predictions)

Output:

Tensorflow version:  2.3.0
WARNING:tensorflow:10 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f69f7a83a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
[[0.5090364 ]
 [0.4890102 ]
 [0.50011414]
 [0.49678832]]

Answer:

The problem lies in your learning rate and in the way you optimize the weights.

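Another factor to keep in mind during training is the size of the step we take along the (negative) gradient. If the step is too large, we may overshoot the local minimum and land somewhere worse. If it is too small, progress can be so slow that we never reach the minimum within the given number of epochs.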
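To make the step-size trade-off concrete, here is a minimal sketch (not part of the original answer) that runs plain gradient descent on the toy function f(w) = w², whose gradient is 2w, with three fixed learning rates. The function name `gradient_descent`, the starting point and the learning-rate values are illustrative assumptions.

```python
def gradient_descent(lr, w0=5.0, steps=20):
    """Minimise f(w) = w**2, whose gradient is 2*w, using a fixed learning rate."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w - lr * 2 * w  # step against the gradient
        trajectory.append(w)
    return trajectory


for lr in (0.01, 0.4, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr}: w after 20 steps = {final:.4g}")
```

Each update multiplies w by (1 - 2*lr). With lr=0.01 the factor is 0.98, so w is still above 3 after 20 steps; with lr=0.4 the factor is 0.2 and w is essentially 0; with lr=1.1 the factor is -1.2, so every step jumps past the minimum at w = 0 and the error keeps growing.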

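In Keras, the default learning rate of stochastic gradient descent (SGD) is 0.01, and it stays fixed throughout training. If you inspect your training run, you will see that the loss moves toward the global minimum too slowly, or sometimes jumps to higher values. For your particular problem it is hard to reach the minimum with a fixed learning rate, because a fixed step size does not take the shape of the loss landscape into account.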
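One way to inspect the training run is to read the per-epoch loss from the `History` object that `model.fit` returns. The sketch below rebuilds the question's model with the default `SGD()` and prints the optimizer's learning rate plus the loss every 100 epochs; it assumes TensorFlow 2.x, and the exact loss values will differ between runs because the weights are randomly initialized.

```python
import numpy as np
import tensorflow as tf

x = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=np.float32)
y = np.array([[0], [0], [1], [1]], dtype=np.float32)

# Same architecture as in the question.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
optimizer = tf.keras.optimizers.SGD()  # default settings, as in the question
print("SGD learning rate:", optimizer.get_config()["learning_rate"])

model.compile(optimizer=optimizer, loss="mse", metrics=["binary_accuracy"])
history = model.fit(x, y, batch_size=1, epochs=500, verbose=0)

# history.history["loss"] holds one epoch-averaged loss value per epoch.
losses = history.history["loss"]
for epoch in range(0, len(losses), 100):
    print(f"epoch {epoch:3d}: loss = {losses[epoch]:.4f}")
print(f"epoch {len(losses) - 1:3d}: loss = {losses[-1]:.4f}")
```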

For example, using Adam as the optimizer with learning_rate = 0.02, I was able to reach 100% accuracy:

import tensorflow as tf
import numpy as np

x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]], dtype=np.float32)
y = np.array([[0],
              [0],
              [1],
              [1]], dtype=np.float32)

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),  # the learning rate was 0.001 in an earlier edit
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mse', 'binary_accuracy'])
model.summary()

print("Tensorflow version: ", tf.__version__)

history = model.fit(x, y, batch_size=1, epochs=500)

# Predict after training; calling predict before fit() would only show the untrained model.
predictions = model.predict_on_batch(x)
print(predictions)

Output (predictions after training):

[[0.05162644]
 [0.06670767]
 [0.9240402 ]
 [0.923379  ]]

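I used Adam because it adapts the effective step size of each parameter during training, based on running averages of the gradients and of the squared gradients.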
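The adaptive step can be seen in a few lines of NumPy. This is a sketch of the textbook Adam update, using the Keras default hyperparameters (beta1=0.9, beta2=0.999, epsilon=1e-7) and the answer's learning rate of 0.02; Keras's own implementation differs slightly in where epsilon enters, and the helper name `adam_step` is hypothetical.

```python
import numpy as np


def adam_step(w, g, m, v, t, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update (hypothetical helper); returns the new (w, m, v)."""
    m = beta1 * m + (1 - beta1) * g          # running mean of the gradients
    v = beta2 * v + (1 - beta2) * g ** 2     # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


# Three parameters whose gradients differ by six orders of magnitude.
g = np.array([1000.0, 1.0, 0.001])
w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)

w_adam, m, v = adam_step(w, g, m, v, t=1)
w_sgd = w - 0.02 * g  # plain SGD step with the same learning rate

print("Adam step:", w - w_adam)  # every entry is close to 0.02
print("SGD step: ", w - w_sgd)   # proportional to the raw gradient
```

On the first step Adam moves each parameter by roughly the learning rate whatever the gradient's magnitude, while the SGD step is the learning rate times the raw gradient (20, 0.02 and 0.00002 here).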

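If you use a larger learning rate (0.1) with SGD instead, you can see in the training history that the accuracy reaches 1 at some point but then immediately drops again. This happens because the learning rate is fixed. Another strategy with SGD is to stop training as soon as that value is reached, for example with a Keras callback.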
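A possible version of that callback strategy, assuming TensorFlow 2.x: a small custom `tf.keras.callbacks.Callback` (the class name `StopAtTargetAccuracy` is hypothetical) that sets `self.model.stop_training = True` once the epoch's `binary_accuracy` reaches 1.0, combined with SGD at learning rate 0.1. Whether and when it triggers depends on the random initialization.

```python
import numpy as np
import tensorflow as tf


class StopAtTargetAccuracy(tf.keras.callbacks.Callback):
    """Hypothetical helper: stop fit() once a monitored metric reaches a target."""

    def __init__(self, monitor="binary_accuracy", target=1.0):
        super().__init__()
        self.monitor = monitor
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        # logs holds the metrics Keras computed for the epoch that just finished.
        value = (logs or {}).get(self.monitor)
        if value is not None and value >= self.target:
            print(f"{self.monitor} reached {value:.2f} at epoch {epoch}; stopping.")
            self.model.stop_training = True


x = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=np.float32)
y = np.array([[0], [0], [1], [1]], dtype=np.float32)

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="mse",
              metrics=["binary_accuracy"])

history = model.fit(x, y, batch_size=1, epochs=500, verbose=0,
                    callbacks=[StopAtTargetAccuracy()])
print("epochs run:", len(history.history["loss"]))
print(model.predict(x, verbose=0))
```

With batch_size=1 the epoch-level accuracy is averaged over four batches during which the weights keep changing, so it is worth checking the final predictions after training stops.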

Don't forget to tune your learning rate and choose a suitable optimizer. Both are crucial for fast training and for reaching a good minimum.

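You could also consider changing the network architecture (adding more nodes and using a different activation function, such as ReLU, in the hidden layers).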
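A possible version of that suggestion, assuming a single hidden layer of 8 ReLU units (the width is an arbitrary choice, not from the original answer) while keeping the sigmoid output, the MSE loss and the Adam settings from the answer's code:

```python
import tensorflow as tf

# Hypothetical variant: one wider ReLU hidden layer instead of two 2-unit
# sigmoid layers; the output stays sigmoid because the targets are 0/1.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=["binary_accuracy"])
model.summary()  # (2*8 + 8) + (8*1 + 1) = 33 trainable parameters
```

It can be trained with the same `model.fit(x, y, batch_size=1, epochs=500)` call as in the answer; as with the original network, results vary between runs because of the random initialization.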

Here are some useful details on how to handle the learning rate.
