Problem description

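My input is x, which consists of indicator variables, and my output is y, where each row is a random one-hot vector that depends on the values of x (a data sample is shown below).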

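I want to train a model that essentially learns the probabilistic relationship between x and y, in the form of one weight per column. The model must "choose" one, and only one, indicator variable as its output. My current approach is to sample a categorical random variable and turn it into a one-hot vector as the prediction.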

The problem is that when I try to train my Keras model I get the error ValueError: An operation has `None` for gradient.

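I find this error odd, because I have previously trained mixture networks with Keras and TensorFlow that used tf.contrib.distributions.Categorical, and I never ran into any gradient-related problems.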

Code

Experiment

import tensorflow as tf
import tensorflow.contrib.distributions as tfd
import numpy as np
from keras import backend as K
from keras.layers import Layer
from keras.models import Sequential
from keras.utils import to_categorical


def make_xy_prob(rng, size=10000):
    rng = np.random.RandomState(rng) if isinstance(rng, int) else rng
    cols = 3
    weights = np.array([[1, 2, 3]])

    # Generate the data and temporarily remove the all-zero rows
    x = rng.choice(2, (size, cols))
    is_zeros = x.sum(axis=1) == 0
    x = x[~is_zeros]

    # Use the weights to build the probabilities that determine y
    weighted_x = x * weights
    prob_x = weighted_x / weighted_x.sum(axis=1, keepdims=True)
    y = np.row_stack([to_categorical(rng.choice(cols, p=p), cols) for p in prob_x])

    # Add the all-zero rows back and shuffle
    zeros = np.zeros(((size - len(x), cols)))
    x = np.row_stack([x, zeros])
    y = np.row_stack([y, zeros])
    shuffle_idx = rng.permutation(size)
    x = x[shuffle_idx]
    y = y[shuffle_idx]
    return x, y


class OneHotGate(Layer):

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', shape=(1, input_shape[1]), initializer='ones')

    def call(self, x):
        zero_cond = x < 1
        x_shape = tf.shape(x)

        # Weight the indicator variables so that more probable columns get more probability
        weighted_x = x * self.kernel

        # Fill the zeros with -inf so that those columns get zero probability
        ninf_fill = tf.fill(x_shape, -np.inf)
        masked_x = tf.where(zero_cond, ninf_fill, weighted_x)
        onehot_gate = tf.squeeze(tfd.OneHotCategorical(logits=masked_x, dtype=x.dtype).sample(1))

        # Fill the gate with zeros wherever the input was originally zero
        zeros_fill = tf.fill(x_shape, 0.0)
        masked_gate = tf.where(zero_cond, zeros_fill, onehot_gate)
        return masked_gate


def experiment(epochs=10):
    K.clear_session()
    rng = np.random.RandomState(2)
    X, y = make_xy_prob(rng)
    input_shape = (X.shape[1], )
    model = Sequential()
    gate_layer = OneHotGate(input_shape=input_shape)
    model.add(gate_layer)
    model.compile('adam', 'categorical_crossentropy')
    model.fit(X, y, 64, epochs, verbose=1)
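Two pieces of the code above can be checked in isolation with plain NumPy. make_xy_prob draws y with P(y = j | x) proportional to x_j * w_j, while OneHotGate uses x_j * kernel_j as a logit, so over the active columns its probabilities are proportional to exp(kernel_j). The two agree when kernel equals log(weights) up to an additive constant, so log([1, 2, 3]) is, in principle, one set of weights the gate could learn. A minimal sketch (masked_softmax is a hypothetical helper, not part of the original code):

```python
import numpy as np

weights = np.array([[1, 2, 3]])  # same weights as in make_xy_prob
x = np.array([[1, 0, 1],
              [1, 1, 1]])

# As in make_xy_prob: weight the active indicators, then normalise each row.
weighted_x = x * weights
prob_x = weighted_x / weighted_x.sum(axis=1, keepdims=True)


def masked_softmax(logits, active):
    # As in OneHotGate: inactive columns get a -inf logit, hence probability 0.
    # Assumes every row has at least one active column (an all -inf row gives NaN).
    masked = np.where(active, logits, -np.inf)
    shifted = masked - masked.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


# With kernel = log(weights), the gate's softmax reproduces the data probabilities.
gate_prob = masked_softmax(np.log(weights), x == 1)

print(prob_x)     # rows [0.25, 0, 0.75] and [1/6, 1/3, 1/2]
print(gate_prob)  # same values as prob_x
```

Adding any constant to the kernel leaves gate_prob unchanged, because softmax is invariant to a shared shift of the logits.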

Data sample

>>> x
array([[1., 1., 1.],
       [0., 1., 0.],
       [1., 0., 1.],
       ...,
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 0.]])
>>> y
array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       ...,
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.]])
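Because y is random given x, identical x rows can carry different targets: in the sample above, [1., 1., 1.] rows appear with both [0., 0., 1.] and [1., 0., 0.]. What can be learned is the frequency of each column. A NumPy-only sketch of that generative rule, assuming to_categorical is replaced by np.eye so Keras is not needed (sample_y is a hypothetical name):

```python
import numpy as np


def sample_y(x, weights, rng):
    # Same rule as make_xy_prob for rows with at least one active indicator:
    # P(y = j | x) is proportional to x_j * w_j.
    weighted = x * weights
    prob = weighted / weighted.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(x.shape[1], p=p) for p in prob])
    return np.eye(x.shape[1])[idx]  # one-hot rows, like to_categorical


rng = np.random.RandomState(2)
weights = np.array([[1, 2, 3]])
x = np.ones((6000, 3))  # every indicator active
y = sample_y(x, weights, rng)

print(y[:5])
print(y.mean(axis=0))  # close to [1/6, 1/3, 1/2], not a single fixed one-hot row
```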

Error

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
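The error lists ops that have no gradient; drawing a categorical sample belongs to the same class. A minimal NumPy sketch (no TensorFlow; all function names are hypothetical) contrasting the log-probability of a fixed one-hot outcome, which is a smooth function of the logits, with a sampled one-hot vector, which is piecewise constant in the logits:

```python
import numpy as np


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def categorical_log_prob(logits, onehot):
    # Log-probability of a fixed one-hot outcome under a categorical distribution.
    return float(np.dot(onehot, log_softmax(logits)))


def analytic_grad(logits, onehot):
    # Closed form: d log p(onehot) / d logits = onehot - softmax(logits).
    return onehot - np.exp(log_softmax(logits))


def numeric_grad(logits, onehot, eps=1e-6):
    # Central finite differences, one logit at a time.
    grad = np.zeros_like(logits)
    for i in range(len(logits)):
        step = np.zeros_like(logits)
        step[i] = eps
        grad[i] = (categorical_log_prob(logits + step, onehot)
                   - categorical_log_prob(logits - step, onehot)) / (2 * eps)
    return grad


def sample_onehot(logits, u):
    # Inverse-CDF sampling with an explicit uniform draw u in [0, 1).
    cdf = np.cumsum(np.exp(log_softmax(logits)))
    idx = min(int(np.searchsorted(cdf, u, side='right')), len(logits) - 1)
    return np.eye(len(logits))[idx]


logits = np.array([1.0, 2.0, 3.0])
target = np.array([0.0, 0.0, 1.0])

# The log-probability has a well-defined, non-zero gradient ...
print(np.allclose(analytic_grad(logits, target), numeric_grad(logits, target), atol=1e-6))

# ... but the sample does not move when a logit is nudged: it is piecewise constant.
nudged = logits + np.array([0.0, 0.0, 1e-3])
print(sample_onehot(logits, 0.5), sample_onehot(nudged, 0.5))
```

The sampled vector has zero derivative almost everywhere and jumps at the boundaries, so there is no useful gradient to propagate back to the kernel.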

Answer:

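The problem is that OneHotCategorical draws a discrete sample: the resulting one-hot vector is a piecewise-constant function of the logits, and TensorFlow defines no gradient for the sampling op. Since the layer's only trainable weight, kernel, reaches the output only through that sample, its gradient is None. (In a mixture network the Categorical distribution is typically used only through its log-probability in the loss, which is differentiable with respect to the logits; that is likely why no error appeared there.) To replace the discrete sample with a continuous (relaxed) one, try RelaxedOneHotCategorical, which is based on the Gumbel-Softmax trick.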
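A minimal NumPy sketch of the Gumbel-Softmax relaxation, assuming standard Gumbel(0, 1) noise and a temperature you choose; sample_gumbel and relaxed_onehot are hypothetical names, not TensorFlow functions. The argmax of the Gumbel-perturbed logits is an exact categorical sample (the Gumbel-max trick); replacing that argmax with a temperature-scaled softmax gives a relaxed sample that, for fixed noise, is a smooth function of the logits, so gradients can flow through it.

```python
import numpy as np


def sample_gumbel(shape, rng):
    # Standard Gumbel(0, 1) noise by inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1).
    # The small positive lower bound avoids log(0).
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=shape)
    return -np.log(-np.log(u))


def relaxed_onehot(logits, gumbel_noise, temperature):
    # Softmax of the Gumbel-perturbed logits divided by the temperature.
    # For fixed noise this is a smooth function of the logits.
    z = (logits + gumbel_noise) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.RandomState(0)
logits = np.log(np.array([1.0, 2.0, 3.0]))  # class probabilities 1/6, 1/3, 1/2
noise = sample_gumbel((20000, 3), rng)

# Gumbel-max trick: the argmax of the perturbed logits is an exact categorical sample.
hard_idx = np.argmax(logits + noise, axis=-1)
freq = np.bincount(hard_idx, minlength=3) / len(hard_idx)

# Relaxed samples: low temperature gives nearly one-hot vectors, high temperature smoother ones.
soft_cold = relaxed_onehot(logits, noise, temperature=0.1)
soft_warm = relaxed_onehot(logits, noise, temperature=5.0)

# Nudging a logit changes the relaxed sample slightly (non-zero derivative),
# while the hard argmax for the same noise stays put.
nudged = logits + np.array([0.0, 0.0, 1e-4])
delta = relaxed_onehot(nudged, noise[:1], 1.0) - relaxed_onehot(logits, noise[:1], 1.0)

print(freq)  # close to [1/6, 1/3, 1/2]
print(soft_cold.max(axis=-1).mean(), soft_warm.max(axis=-1).mean())
print(np.abs(delta).max())  # small but non-zero
```

In the question's layer, the analogous change would be to build tfd.RelaxedOneHotCategorical(temperature, logits=masked_x) inside OneHotGate.call and use its sample() in place of the OneHotCategorical sample; the temperature value is left to you. Lower temperatures give outputs closer to one-hot but higher-variance gradients. If an exactly one-hot forward output is required, the straight-through variant uses the argmax in the forward pass and the relaxed sample's gradient in the backward pass.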

