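Loss is nan when fitting training data in a TensorFlow Keras regression network

I am trying to reproduce a regression network from a book, but no matter what I try, the loss is always nan during fitting. From what I have read, this can be caused by the following, and I have checked each of them: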

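  • Bad input data: my data is clean
  • Unscaled input data: I tried both StandardScaler and MinMaxScaler, with no effect
  • Unscaled output data: I also tried scaling the targets to the range 0 to 1 using the training set, but new instances fall outside that range
  • Exploding gradients: this could be the cause, but the problem persists even with regularization
  • Learning rate too high: setting it to a much lower value does not solve the problem
  • Unbounded steps: gradient clipping does not solve the problem either
  • Error metric: switching from mean squared error to mean absolute error does not help
  • Batch too large: reducing the training data to the first 200 entries does not help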
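To see why a too-large learning rate or an exploding gradient shows up as a `nan` loss rather than as an error, here is a minimal NumPy sketch. It is plain full-batch gradient descent on a linear model with synthetic, already-standardized data, not the Keras network from the question; the function name `gd_linear_regression` and both learning rates are illustrative assumptions.

```python
import numpy as np


def gd_linear_regression(X, y, lr, epochs=1000):
    """Full-batch gradient descent on the MSE of a linear model y ~ X @ w + b.

    Returns the final training MSE (inf or nan if the updates diverge).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    # Silence overflow/invalid warnings so a diverging run simply yields inf/nan
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(epochs):
            err = X @ w + b - y                 # residuals
            grad_w = 2.0 / n * (X.T @ err)      # dMSE/dw
            grad_b = 2.0 / n * err.sum()        # dMSE/db
            w -= lr * grad_w
            b -= lr * grad_b
        return float(np.mean((X @ w + b - y) ** 2))


# Synthetic standardized features and a noisy linear target (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)

print("lr=0.01:", gd_linear_regression(X, y, lr=0.01))  # converges to a small, finite loss
print("lr=5.0 :", gd_linear_regression(X, y, lr=5.0))   # weights overflow: loss ends up inf or nan
```

Once a single weight overflows to inf, operations such as inf - inf produce nan, which then propagates to every later value. This is why a smaller learning rate and gradient clipping are the usual remedies listed above.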

What other causes of a nan loss are there?

Edit: this also happens with every example model I have found online.

I have really run out of ideas.

The data looks like this:

```
X_train[:5]
Out[4]:
array([[-3.89243447e-01, -6.10268198e-01,  7.23982383e+00,
         7.68512713e+00, -9.15360303e-01, -4.34319791e-02,
         1.69375104e+00, -2.66593858e-01],
       [-1.00512751e+00, -6.10268198e-01,  5.90241386e-02,
         6.22319189e-01, -7.82304360e-01, -6.23993472e-02,
        -8.17899555e-01,  1.52950349e+00],
       [ 5.45617265e-01,  5.78632450e-01, -1.56942033e-01,
        -2.49063893e-01, -5.28447626e-01, -3.67342889e-02,
        -8.31983577e-01,  7.11281365e-01],
       [-1.53276576e-01,  1.84679314e+00, -9.75702024e-02,
         3.03921163e-01, -5.96726334e-01, -6.73883756e-02,
        -7.14616727e-01,  6.56400612e-01],
       [ 1.97163670e+00, -1.56138872e+00,  9.87949430e-01,
        -3.36887553e-01, -3.42869600e-01,  5.08919289e-03,
        -6.86448683e-01,  3.12148621e-01]])

X_valid[:5]
Out[5]:
array([[ 2.06309546e-01,  1.21271280e+00, -7.86614121e-01,
         1.36422365e-01, -6.81637034e-01, -1.12999850e-01,
        -8.78930317e-01,  7.21259683e-01],
       [ 7.12374210e-01,  1.82332234e-01,  2.24876920e-01,
        -2.22866905e-02,  1.51713346e-01, -2.62325989e-02,
         8.01762978e-01, -1.20954497e+00],
       [ 5.86851369e+00,  2.61592277e-01,  1.86656568e+00,
        -9.86220816e-02,  7.11794858e-02, -1.50302387e-02,
         9.05045806e-01, -1.38915470e+00],
       [-1.81402984e-01, -5.54478959e-02, -6.23050382e-02,
         3.15382948e-02, -2.41326907e-01, -4.58773896e-02,
        -8.74235643e-01,  7.86118754e-01],
       [ 5.02584914e-01, -6.10268198e-01,  8.08807908e-01,
         1.22787966e-01, -3.13107087e-01,  4.73927994e-03,
         1.14447418e+00, -8.00433903e-01]])

y_train[:5]
Out[6]:
array([[-0.4648844 ],
       [-1.26625476],
       [-0.11064919],
       [ 0.55441007],
       [ 1.19863195]])

y_valid[:5]
Out[7]:
array([[ 2.018235  ],
       [ 1.25593471],
       [ 2.54525539],
       [ 0.04215816],
       [-0.39716296]])
```

Code (keras.__version__ 2.4.0):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from tensorflow import keras
import numpy as np

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)

# Fit the feature scaler on the training set only, then apply it to every split
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

print(f'X_train:{X_train.shape}, X_valid: {X_valid.shape}, y_train: {y_train.shape}, y_valid:{y_valid.shape}')
print(f'X_test: {X_test.shape}, y_test: {y_test.shape}')

# `np.nan in X_train` is always False (NaN never equals anything), so use np.isnan
assert not np.isnan(X_train).any()
assert not np.isnan(X_valid).any()

# Scale the targets too; StandardScaler expects a 2-D column vector
scalery = StandardScaler()
y_train = scalery.fit_transform(y_train.reshape(-1, 1))
y_valid = scalery.transform(y_valid.reshape(-1, 1))
y_test = scalery.transform(y_test.reshape(-1, 1))

# initializers: relu -> he_uniform, tanh -> glorot
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:],
                       kernel_initializer="he_uniform",
                       kernel_regularizer="l1"),
    keras.layers.Dense(1),
])

# `learning_rate` replaces the deprecated `lr` argument
optimizer = keras.optimizers.SGD(learning_rate=0.0001, clipvalue=1)
model.compile(loss=keras.losses.MeanSquaredError(), optimizer=optimizer)

history = model.fit(X_train[0:200], y_train[0:200],
                    epochs=5,
                    validation_data=(X_valid[0:20], y_valid[0:20]))
```
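A membership test such as `np.nan in X_train` cannot detect missing values: `in` on a NumPy array compares elements with `==`, and NaN is unequal to everything, including itself. A stricter check counts NaN and infinite entries explicitly. The helper `report_non_finite` below is a hypothetical name for illustration; it could be called on `X_train`, `X_valid`, `y_train` and `y_valid` before fitting.

```python
import numpy as np


def report_non_finite(name, arr):
    """Print how many NaN and +/-inf entries an array has; return True if there are none."""
    arr = np.asarray(arr, dtype=float)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    print(f"{name}: {n_nan} NaN, {n_inf} inf out of {arr.size} values")
    return n_nan == 0 and n_inf == 0


a = np.array([1.0, np.nan, 3.0])
print(np.nan in a)                 # False, even though `a` contains a NaN
print(report_non_finite("a", a))   # reports 1 NaN and returns False
```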

Output:

```
X_train:(11610, 8), X_valid: (3870, 8), y_train: (11610,), y_valid:(3870,)
X_test: (5160, 8), y_test: (5160,)
Epoch 1/5
7/7 [==============================] - 0s 24ms/step - loss: nan - val_loss: nan
Epoch 2/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 3/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 4/5
7/7 [==============================] - 0s 5ms/step - loss: nan - val_loss: nan
Epoch 5/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
```
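The optimizer settings in the question already rule out ordinary divergence within the first epoch. With plain SGD (Keras' default `momentum=0`), `clipvalue=1` limits every gradient element to [-1, 1], so each step moves each weight by at most learning rate × clipvalue = 1e-4; 200 samples at the default batch size of 32 give the 7 steps per epoch seen in the log. The NumPy sketch below imitates those 7 steps with deliberately enormous random gradients; the random gradients and the 30-element weight vector are illustrative assumptions, not values taken from the model.

```python
import numpy as np

# Settings from the question: SGD, learning rate 1e-4, clipvalue=1, 7 batches per epoch
lr, clipvalue, steps = 1e-4, 1.0, 7

rng = np.random.default_rng(42)
w0 = rng.standard_normal(30)     # illustrative weight vector (30 hidden units)
w = w0.copy()
for _ in range(steps):
    grad = rng.standard_normal(30) * 1e6          # deliberately enormous gradients
    grad = np.clip(grad, -clipvalue, clipvalue)   # element-wise clipping, like clipvalue
    w -= lr * grad                                # plain SGD step (no momentum)

max_change = float(np.max(np.abs(w - w0)))
print(f"largest weight change after one epoch: {max_change:.1e}")
```

Because the weights cannot move far in one epoch, a nan loss already in epoch 1 suggests the problem lies in the inputs or in the numerical environment rather than in the optimizer settings.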

Interesting reading (but it didn't help):


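Answer:

I found the answer to my own question:

It turns out that TensorFlow currently does not work properly under Python 3.10. After downgrading Python to 3.8, everything started working.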
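To avoid silently running on an interpreter that the installed TensorFlow build does not support, a script can check the Python version up front. This is a minimal standard-library sketch; `check_python_version` and the `(3, 8)` bound are illustrative assumptions based on this answer, not an official TensorFlow API or support matrix, so check the release notes of your TensorFlow version for the actual range.

```python
import sys
import warnings

# Assumed bound: the answer reports 3.8 working and 3.10 failing with the
# TensorFlow build in use. Adjust to the range your TensorFlow release supports.
MAX_TESTED = (3, 8)


def check_python_version(max_tested=MAX_TESTED, version_info=sys.version_info):
    """Warn and return False if the interpreter is newer than the tested version."""
    current = tuple(version_info[:2])
    if current > max_tested:
        warnings.warn(
            f"Python {current[0]}.{current[1]} is newer than the tested "
            f"{max_tested[0]}.{max_tested[1]}; TensorFlow may misbehave (e.g. nan losses)."
        )
        return False
    return True


print(check_python_version())
```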
