I am trying to implement a custom loss function in Keras, but I cannot get it to work.
I have implemented it both in numpy and with keras.backend:
    import numpy as np
    from keras import backend as K

    def log_rmse_np(y_true, y_pred):
        # Reference implementation in plain numpy.
        d_i = np.log(y_pred) - np.log(y_true)
        loss1 = np.sum(np.square(d_i)) / np.size(d_i)
        loss2 = np.square(np.sum(d_i)) / (2 * np.square(np.size(d_i)))
        loss = loss1 - loss2
        print('np_loss = %s - %s = %s' % (loss1, loss2, loss))
        return loss

    def log_rmse(y_true, y_pred):
        # The same loss expressed with Keras backend ops.
        d_i = K.log(y_pred) - K.log(y_true)
        loss1 = K.mean(K.square(d_i))
        loss2 = K.square(K.sum(K.flatten(d_i), axis=-1)) / \
                (K.cast_to_floatx(2) * K.square(K.cast_to_floatx(K.int_shape(K.flatten(d_i))[0])))
        loss = loss1 - loss2
        return loss
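In math terms, with d_i = log(y_pred_i) - log(y_true_i) and n the total number of elements, both versions are meant to compute

    loss = (1/n) * sum_i(d_i^2) - (1/(2*n^2)) * (sum_i(d_i))^2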
When I test and compare the two losses with the function below, everything appears to work:
    def check_loss(_shape):
        if _shape == '2d':
            shape = (6, 7)
        elif _shape == '3d':
            shape = (5, 6, 7)
        elif _shape == '4d':
            shape = (8, 5, 6, 7)
        elif _shape == '5d':
            shape = (9, 8, 5, 6, 7)
        y_a = np.random.random(shape)
        y_b = np.random.random(shape)
        out1 = K.eval(log_rmse(K.variable(y_a), K.variable(y_b)))
        out2 = log_rmse_np(y_a, y_b)
        print('shapes:', str(out1.shape), str(out2.shape))
        print('types: ', type(out1), type(out2))
        print('log_rmse: ', np.linalg.norm(out1))
        print('log_rmse_np: ', np.linalg.norm(out2))
        print('difference: ', np.linalg.norm(out1 - out2))
        assert out1.shape == out2.shape
        #assert out1.shape == shape[-1]

    def test_loss():
        shape_list = ['2d', '3d', '4d', '5d']
        for _shape in shape_list:
            check_loss(_shape)
            print('======================')

    test_loss()
This prints the following:
    np_loss = 1.34490449177 - 0.000229461787517 = 1.34467502998
    shapes: () ()
    types:  <class 'numpy.float32'> <class 'numpy.float64'>
    log_rmse:  1.34468
    log_rmse_np:  1.34467502998
    difference:  3.41081509703e-08
    ======================
    np_loss = 1.68258448859 - 7.67580654591e-05 = 1.68250773052
    shapes: () ()
    types:  <class 'numpy.float32'> <class 'numpy.float64'>
    log_rmse:  1.68251
    log_rmse_np:  1.68250773052
    difference:  1.42057615005e-07
    ======================
    np_loss = 1.99736933814 - 0.00386228512295 = 1.99350705302
    shapes: () ()
    types:  <class 'numpy.float32'> <class 'numpy.float64'>
    log_rmse:  1.99351
    log_rmse_np:  1.99350705302
    difference:  2.53924863358e-08
    ======================
    np_loss = 1.95178217182 - 1.60006871892e-05 = 1.95176617114
    shapes: () ()
    types:  <class 'numpy.float32'> <class 'numpy.float64'>
    log_rmse:  1.95177
    log_rmse_np:  1.95176617114
    difference:  3.78277884572e-08
    ======================
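One thing this test does not exercise, I realize now, is non-positive inputs: np.random.random draws from [0, 1), so every K.log above stays finite. A real network output has no such guarantee, and a single non-positive prediction makes the loss itself nan. A minimal repro (my own sketch, reusing the log_rmse defined above with made-up values):

    import numpy as np
    from keras import backend as K

    y_true = np.array([[0.5, 1.0]])
    y_pred = np.array([[-0.1, 1.0]])   # one negative prediction, as a linear layer could emit
    print(K.eval(log_rmse(K.variable(y_true), K.variable(y_pred))))   # nan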
Compiling and training my model with this loss function never raises an exception, and the model runs with the 'adam' optimizer. However, with this loss function Keras always reports a nan loss:
    Epoch 1/10000
     17/256 [>.............................] - ETA: 124s - loss: nan
I am a bit stuck now... Am I doing something wrong?
I am using TensorFlow 1.4 on Ubuntu 16.04.
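Looking at the original log_rmse again, one suspect (and, if I understand it correctly, what Marcin Możejko's suggestion below addresses) is the K.int_shape-based normalization: during training the batch axis is unknown at graph-construction time, so the element count comes back as None and is silently cast to nan. A small sketch of the effect, assuming the TF 1.4-era backend behavior:

    from keras import backend as K

    x = K.placeholder(shape=(None, 10, 16, 1))     # batch size unknown, as during model.fit
    flat = K.flatten(x)
    print(K.int_shape(flat))                       # (None,)
    print(K.cast_to_floatx(K.int_shape(flat)[0]))  # nan, which poisons loss2 and hence the loss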
Update:
Following Marcin Możejko's suggestion, I updated the code, but unfortunately the training loss is still nan:
    def get_log_rmse(normalization_constant):
        def log_rmse(y_true, y_pred):
            # The normalization constant is now passed in explicitly instead of
            # being read from the (possibly dynamic) tensor shape via K.int_shape.
            d_i = K.log(y_pred) - K.log(y_true)
            loss1 = K.mean(K.square(d_i))
            loss2 = K.square(K.sum(K.flatten(d_i), axis=-1)) / K.cast_to_floatx(2 * normalization_constant ** 2)
            loss = loss1 - loss2
            return loss
        return log_rmse
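For reference, a quick sanity check of the factory version on strictly positive inputs (my own sketch; the shape and batch size of 4 are made-up example values). One caveat: to match log_rmse_np exactly, the constant would have to be the total number of elements (np.size counts batch * height * width * channels), not the batch size alone:

    import numpy as np
    from keras import backend as K

    loss_fn = get_log_rmse(4)                     # hypothetical batch size
    y = np.random.random((4, 10, 16, 1)) + 0.1    # strictly positive values
    print(K.eval(loss_fn(K.variable(y), K.variable(y))))   # 0.0 for identical inputs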
and then compile the model with:

    model.compile(optimizer='adam', loss=get_log_rmse(batch_size))
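To stop training at the first nan batch instead of watching the progress bar, a TerminateOnNaN callback can help (assuming a Keras version that ships keras.callbacks.TerminateOnNaN; x_train and y_train stand in for my actual data):

    from keras.callbacks import TerminateOnNaN

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=10000,
              callbacks=[TerminateOnNaN()])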
Update 2:
The model summary looks like this:
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_2 (InputLayer)         (None, 160, 256, 3)       0
    _________________________________________________________________
    block1_conv1 (Conv2D)        (None, 160, 256, 64)      1792
    _________________________________________________________________
    block1_conv2 (Conv2D)        (None, 160, 256, 64)      36928
    _________________________________________________________________
    block1_pool (MaxPooling2D)   (None, 80, 128, 64)       0
    _________________________________________________________________
    block2_conv1 (Conv2D)        (None, 80, 128, 128)      73856
    _________________________________________________________________
    block2_conv2 (Conv2D)        (None, 80, 128, 128)      147584
    _________________________________________________________________
    block2_pool (MaxPooling2D)   (None, 40, 64, 128)       0
    _________________________________________________________________
    block3_conv1 (Conv2D)        (None, 40, 64, 256)       295168
    _________________________________________________________________
    block3_conv2 (Conv2D)        (None, 40, 64, 256)       590080
    _________________________________________________________________
    block3_conv3 (Conv2D)        (None, 40, 64, 256)       590080
    _________________________________________________________________
    block3_pool (MaxPooling2D)   (None, 20, 32, 256)       0
    _________________________________________________________________
    block4_conv1 (Conv2D)        (None, 20, 32, 512)       1180160
    _________________________________________________________________
    block4_conv2 (Conv2D)        (None, 20, 32, 512)       2359808
    _________________________________________________________________
    block4_conv3 (Conv2D)        (None, 20, 32, 512)       2359808
    _________________________________________________________________
    block4_pool (MaxPooling2D)   (None, 10, 16, 512)       0
    _________________________________________________________________
    conv2d_1 (Conv2D)            (None, 10, 16, 1)         513
    =================================================================
    Total params: 7,245,777
    Trainable params: 7,245,777
    Non-trainable params: 0
    _________________________________________________________________
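The summary does not show activations; if conv2d_1 is linear (the Keras default for Conv2D), y_pred can be zero or negative, and K.log(y_pred) then yields nan even though the graph itself is fine. A clipped variant of the loss is a minimal way to test that hypothesis (my own sketch, not verified on this model; K.epsilon() defaults to about 1e-7):

    def get_log_rmse_safe(normalization_constant):
        def log_rmse(y_true, y_pred):
            # Floor both tensors at a tiny positive value to guard against log(<= 0).
            y_pred = K.maximum(y_pred, K.epsilon())
            y_true = K.maximum(y_true, K.epsilon())
            d_i = K.log(y_pred) - K.log(y_true)
            loss1 = K.mean(K.square(d_i))
            loss2 = K.square(K.sum(K.flatten(d_i), axis=-1)) / K.cast_to_floatx(2 * normalization_constant ** 2)
            return loss1 - loss2
        return log_rmse

If the nan disappears with this variant, non-positive predictions are the culprit, and enforcing positivity at the source (for example a 'softplus' activation on the last layer) would be the cleaner fix.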
Update 3:
A sample y_true is shown below:
Answer: