I'm trying to reproduce some of the examples from the book Neural Networks and Deep Learning using Keras, but I'm having problems training a network based on the architecture from chapter 1. The goal is to classify handwritten digits from the MNIST dataset. The architecture is:
- 784 inputs (one for each of the 28 * 28 pixels in an MNIST image)
- One hidden layer with 30 neurons
- An output layer with 10 neurons
- Weights and biases initialized from a Gaussian distribution with mean 0 and standard deviation 1
- Mean squared error as the loss/cost function
- Stochastic gradient descent as the optimizer
The hyperparameters are:
- Learning rate = 3.0
- Batch size = 10
- Epochs = 30
My code looks like this:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal

# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# input image dimensions
img_rows, img_cols = 28, 28

x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
input_shape = (img_rows * img_cols,)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)

# Construct model
# 784 * 30 * 10
# Normal distribution for weights/biases
# Stochastic Gradient Descent optimizer
# Mean squared error loss (cost function)
model = Sequential()
layer1 = Dense(30,
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer1)
layer2 = Dense(10,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer2)
print('Layer 1 input shape: ', layer1.input_shape)
print('Layer 1 output shape: ', layer1.output_shape)
print('Layer 2 input shape: ', layer2.input_shape)
print('Layer 2 output shape: ', layer2.output_shape)
model.summary()
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train, batch_size=10, epochs=30, verbose=2)

# Run on test data and output results
result = model.evaluate(x_test, y_test, verbose=1)
print('Test loss: ', result[0])
print('Test accuracy: ', result[1])
Output (with Python 3.6 and the TensorFlow backend):
Using TensorFlow backend.
x_train shape: (60000, 784)
60000 train samples
10000 test samples
y_train shape: (60000, 10)
Layer 1 input shape:  (None, 784)
Layer 1 output shape:  (None, 30)
Layer 2 input shape:  (None, 30)
Layer 2 output shape:  (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 30)                23550
_________________________________________________________________
dense_2 (Dense)              (None, 10)                310
=================================================================
Total params: 23,860
Trainable params: 23,860
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
 - 7s - loss: nan - acc: 0.0987
Epoch 2/30
 - 7s - loss: nan - acc: 0.0987
(and so on for all 30 epochs)
Epoch 30/30
 - 6s - loss: nan - acc: 0.0987
10000/10000 [==============================] - 0s 22us/step
Test loss:  nan
Test accuracy:  0.098
As you can see, the network isn't learning at all, and I'm not sure why. As far as I can tell, the shapes all look right. What am I doing that prevents the network from learning?
(Incidentally, I understand that a cross-entropy loss and a softmax output layer would be better; however, judging from the linked book, they don't seem to be necessary. The network implemented by hand in chapter 1 of the book learns successfully, and I'm trying to replicate that result before moving on.)
Answer:
You need to specify an activation function for each layer; without one, Dense defaults to a linear activation, and with std-1 initialization and a learning rate of 3.0 the purely linear network diverges, which is why the loss goes to nan. Each layer should be defined like this:

layer2 = Dense(10,
               activation='sigmoid',
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))

Note that I specified the activation argument here. For the last layer you should use activation='softmax', because you have multiple classes.
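For reference, a minimal sketch of both layer definitions with activations set (sigmoid for the hidden layer, matching the book's network, and softmax for the output; everything else as in your script):

layer1 = Dense(30,
               activation='sigmoid',  # hidden layer: sigmoid, as in the book
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
layer2 = Dense(10,
               activation='softmax',  # output layer: softmax over the 10 classes
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))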
Another thing to consider is that for classification (as opposed to regression), a cross-entropy loss works better. So you may want to change the loss in model.compile to loss='categorical_crossentropy'. However, this isn't required; you can still get results with the mean_squared_error loss.
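That change would only touch the compile call, e.g.:

model.compile(optimizer=SGD(lr=3.0),
              loss='categorical_crossentropy',  # cross-entropy instead of MSE
              metrics=['accuracy'])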
If the loss is still nan, you can try changing the learning rate for SGD.
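For example, with a smaller value (0.5 here is purely illustrative, not a tuned choice):

model.compile(optimizer=SGD(lr=0.5),  # illustrative lower learning rate
              loss='mean_squared_error',
              metrics=['accuracy'])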
By only changing the activation of the first layer to sigmoid and the activation of the second layer to softmax, as shown above, I got a test accuracy of 0.9425 with the script you posted.