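Unable to get reasonable results from DenseVariational

I am trying to solve a regression problem on the following sinusoidal dataset of size 500.

First, I tried two dense layers with 10 units each: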

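import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Two hidden tanh layers; the final layer outputs the mean of a unit-variance Normal.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='tanh'),
    tf.keras.layers.Dense(10, activation='tanh'),
    tf.keras.layers.Dense(1),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.))
])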

I trained it with a negative log-likelihood loss, as follows:

model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=neg_log_likelihood)
model.fit(x, y, epochs=50)
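The definition of `neg_log_likelihood` is not shown in the question. A common form in TFP examples is `lambda y, rv_y: -rv_y.log_prob(y)`, and the sketch below assumes that form. It uses plain NumPy (the helper `normal_nll` is a hypothetical name, not part of any library) to illustrate that, with the scale fixed at 1, this loss is just half the squared error plus a constant, so the Dense model above is effectively doing least-squares regression.

```python
import numpy as np

def normal_nll(y, loc, scale=1.0):
    """Negative log-density of y under Normal(loc, scale), elementwise."""
    return 0.5 * np.log(2.0 * np.pi * scale**2) + 0.5 * ((y - loc) / scale) ** 2

# Toy targets and predicted means (illustrative values only).
y = np.array([0.0, 0.5, -1.0])
loc = np.array([0.1, 0.4, -0.7])

nll = normal_nll(y, loc)

# With scale = 1 the NLL splits into a squared-error term and a constant.
half_squared_error = 0.5 * (y - loc) ** 2
constant = 0.5 * np.log(2.0 * np.pi)

print(np.allclose(nll, half_squared_error + constant))
```

Because the constant does not depend on the weights, minimizing this loss gives the same fit as minimizing mean squared error.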

The resulting plot:

Next, I tried a similar setup using DenseVariational:

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(
        10, activation='tanh', make_posterior_fn=posterior,
        make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
    tfp.layers.DenseVariational(
        10, activation='tanh', make_posterior_fn=posterior,
        make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
    tfp.layers.DenseVariational(
        1, activation='tanh', make_posterior_fn=posterior,
        make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.))
])

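Since the number of parameters roughly doubles, I tried increasing the dataset size and/or the number of epochs by up to 100x, without success. The results usually look like this.

My question is: how can I get results with DenseVariational that are comparable to those with Dense layers? I have also read that it can be sensitive to initial values. Here is a link to the full code. Any suggestions are welcome.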
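On the remark that the parameter count roughly doubles, here is a rough count as a sketch. It assumes a scalar input (one feature per sample) and a mean-field normal posterior that stores one location and one scale per weight; the question's `posterior` function is not shown, so both are assumptions, and any trainable prior parameters are not counted. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_sizes):
    """Count kernel and bias parameters of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # kernel entries plus biases
    return total

# Assumption: 1 input feature, two hidden layers of 10 units, 1 output.
layer_sizes = [1, 10, 10, 1]

n_dense = dense_param_count(layer_sizes)

# A mean-field normal posterior keeps a loc and a scale for every weight.
n_mean_field = 2 * n_dense

print(n_dense, n_mean_field)
```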

Answer:

You need to define a different surrogate posterior. In TensorFlow's Bayesian linear regression example (https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Probabilistic_Layers_Regression.ipynb#scrollTo=VwzbWw3_CQ2z) you have the following mean-field posterior:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Specify the surrogate posterior over `keras.layers.Dense` `kernel` and `bias`.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
  n = kernel_size + bias_size
  c = np.log(np.expm1(1.))
  return tf.keras.Sequential([
      # First n entries are the locations, last n are the raw (pre-softplus) scales.
      tfp.layers.VariableLayer(2 * n, dtype=dtype),
      tfp.layers.DistributionLambda(lambda t: tfd.Independent(
          tfd.Normal(loc=t[..., :n],
                     scale=1e-5 + 0.01*tf.nn.softplus(c + t[..., n:])),
          reinterpreted_batch_ndims=1)),
  ])

Note, however, that I put a factor of 0.01 in front of the Softplus, which reduces the size of the standard deviation. Give that a try.
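To see how much the 0.01 factor changes the starting point, the sketch below evaluates the scale expression at initialization with plain NumPy. It assumes the `VariableLayer` starts at zeros (its default initializer, as far as I know) and that the unscaled version of the formula is `1e-5 + softplus(c + t)`; the `softplus` helper is defined here only for illustration.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

# Same offset as in posterior_mean_field: chosen so that softplus(c) == 1.
c = np.log(np.expm1(1.0))

# Assumption: the raw scale variable is initialized to zero.
t0 = 0.0

scale_original = 1e-5 + softplus(c + t0)        # without the 0.01 factor
scale_damped = 1e-5 + 0.01 * softplus(c + t0)   # with the 0.01 factor

print(scale_original, scale_damped)
```

Under these assumptions every weight starts with a standard deviation of about 1 in the unscaled version and about 0.01 in the scaled one, so the sampled weights are far less noisy at the start of training.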

An even better approach is to use a sampled initialization similar to the one DenseFlipout uses by default: https://www.tensorflow.org/probability/api_docs/python/tfp/layers/DenseFlipout?version=nightly

Here is the same initializer, adapted for DenseVariational:

def random_gaussian_initializer(shape, dtype):
    # The variable holds n locations followed by n raw (pre-softplus) scales.
    n = int(shape / 2)
    loc_norm = tf.random_normal_initializer(mean=0., stddev=0.1)
    loc = tf.Variable(
        initial_value=loc_norm(shape=(n,), dtype=dtype)
    )
    scale_norm = tf.random_normal_initializer(mean=-3., stddev=0.1)
    scale = tf.Variable(
        initial_value=scale_norm(shape=(n,), dtype=dtype)
    )
    return tf.concat([loc, scale], 0)

Now you can simply change the VariableLayer in the mean-field posterior to

tfp.layers.VariableLayer(
    2 * n, dtype=dtype,
    initializer=lambda shape, dtype: random_gaussian_initializer(shape, dtype),
    trainable=True)

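You are now sampling from a normal distribution with mean -3 and standard deviation 0.1 to feed your Softplus. Taking the mean of that distribution, we have scale = Softplus(-3) ≈ 0.048587352, so it is very small. By sampling, we initialize all the scales to different values centered around that mean.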
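The arithmetic above can be checked with a few lines of NumPy. This is only a sketch of the plain Softplus case described in the text: if the posterior still adds the offset `c` or the 0.01 factor before and after the Softplus, the actual initial scales will differ. The sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

# Scale obtained when the raw value equals the initializer's mean.
mean_scale = softplus(-3.0)

# Draw raw values as the initializer does (mean -3, stddev 0.1) and map them
# through softplus to get one initial scale per weight.
rng = np.random.default_rng(0)
raw = rng.normal(loc=-3.0, scale=0.1, size=1000)
scales = softplus(raw)

print(mean_scale)
print(scales.min(), scales.mean(), scales.max())
```

All sampled scales are positive and cluster tightly around `mean_scale`, which is the "different but close to that mean" behavior the answer describes.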
