### How does the Keras Embedding layer behave when an input value is greater than input_dim?

How does the Embedding layer behave when an input value is greater than input_dim?

Why doesn't Keras raise an exception?

    from keras.models import Sequential
    from keras.layers import Embedding

    # A single embedding row (input_dim=1) with 2-dimensional vectors.
    model = Sequential()
    model.add(Embedding(1, 2, trainable=True, mask_zero=False))
    input_array = [5]  # index 5 is outside the valid range for input_dim=1
    model.compile("rmsprop", "mse")
    output_array = model.predict(input_array)
    output_array
    # array([[[0., 0.]]], dtype=float32)

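Input value = 5, input_dim = 1.

The documentation says the input value (5) must be less than input_dim (1). My example violates that, yet the code still raises no exception.

Thank you!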

Answer:

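The Embedding layer uses a lookup matrix of shape (input_dim, output_dim), where input_dim is the number of embedding vectors to learn. When indices are passed in, the layer picks the vectors at those indices from the embedding matrix.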
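The lookup described above can be sketched without any framework. This is only an illustration in plain Python: the names `embedding_matrix` and `lookup` are hypothetical, not part of the Keras API, and the values are made up.

```python
# Illustrative only: a plain-Python stand-in for an embedding lookup.
# `embedding_matrix` and `lookup` are hypothetical names, not Keras APIs.

# A table with input_dim = 3 rows and output_dim = 2 columns (made-up values).
embedding_matrix = [
    [0.1, 0.2],  # vector for index 0
    [0.3, 0.4],  # vector for index 1
    [0.5, 0.6],  # vector for index 2
]


def lookup(matrix, indices):
    """Return the row of `matrix` for each index, like a gather operation."""
    return [matrix[i] for i in indices]


vectors = lookup(embedding_matrix, [2, 0, 2])
print(vectors)  # [[0.5, 0.6], [0.1, 0.2], [0.5, 0.6]]
```

In this sketch the valid indices are 0 to input_dim - 1, so an index of 3 or more raises `IndexError`. What a real backend does with such an index is exactly what the question is about.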

Thanks for pointing out that I had earlier confused input_length with input_dim.

First, if you use tensorflow.keras, you do get an error.

tensorflow

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Embedding, Input
    import numpy as np

    # Embedding with a single row (input_dim=1) and 2-dimensional vectors.
    ip = Input(shape=(3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    model = Model(ip, emb)

    # Every index here is >= input_dim, i.e. out of range.
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    model.compile("rmsprop", "mse")

    output_array = model.predict(input_array)
    print(output_array)
    print(output_array.shape)
    model.summary()

But with keras 2.3.1, I get no error at all.

keras 2.3.1

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np

    # Same model as above, built with standalone Keras.
    ip = Input(shape=(3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    model = Model(ip, emb)

    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    model.compile("rmsprop", "mse")

    output_array = model.predict(input_array)
    print(output_array)
    print(output_array.shape)
    model.summary()

So, is Keras broken? The first thing to note is that Keras and tensorflow.keras implement the embedding layer differently. To verify this, let's look at the Keras embedding layer.

https://github.com/keras-team/keras/blob/master/keras/layers/embeddings.py#L16

For now, let's look only at the call function.

    def call(self, inputs):
        if K.dtype(inputs) != 'int32':
            inputs = K.cast(inputs, 'int32')
        out = K.gather(self.embeddings, inputs)
        return out

Note: if you want the exact source code for Keras 2.3.1, download it from the releases page: https://github.com/keras-team/keras/releases

The tensorflow implementation, however, is different.

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/embedding_ops.py

Just to verify, here the call function is written differently.

    def call(self, inputs):
      dtype = K.dtype(inputs)
      if dtype != 'int32' and dtype != 'int64':
        inputs = math_ops.cast(inputs, 'int32')
      out = embedding_ops.embedding_lookup(self.embeddings, inputs)
      return out

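We could now dig deeper to find exactly where the behavior diverges and why Keras raises no error while tensorflow.keras does, but let's settle a simpler point first: is the Keras embedding layer doing something wrong?

Let's build the same simple network as before and inspect the weight matrix.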

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np

    ip = Input(shape=(3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    model = Model(ip, emb)

    # All indices are out of range for input_dim=1.
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    model.compile("rmsprop", "mse")

    output_array = model.predict(input_array)
    print(output_array)
    print(output_array.shape)
    model.summary()

The model produces the following output.

    [[[0. 0.]
      [0. 0.]
      [0. 0.]]
     [[0. 0.]
      [0. 0.]
      [0. 0.]]]
    (2, 3, 2)
    Model: "model_18"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_21 (InputLayer)        (None, 3)                 0
    _________________________________________________________________
    embedding_33 (Embedding)     (None, 3, 2)              2
    =================================================================
    Total params: 2
    Trainable params: 2
    Non-trainable params: 0

OK, we got a bunch of zeros, but the layer's default embeddings_initializer is not zeros!

So let's look at the weight matrix now.

    import keras.backend as K

    # layers[0] is the Input layer; layers[1] is the Embedding layer.
    w = model.layers[1].get_weights()
    print(w)

[array([[ 0.03680499, -0.04904002]], dtype=float32)]

Indeed, it is not all zeros.

So why did we get zeros?

Let's change the model's input.

With input_dim = 1, the only in-vocabulary index is 0, so let's pass 0 as one of the inputs.

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np

    ip = Input(shape=(3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    model = Model(ip, emb)

    # Index 0 is the only in-range index for input_dim=1.
    input_array = np.array([[5, 0, 1], [1, 2, 0]])
    model.compile("rmsprop", "mse")

    output_array = model.predict(input_array)
    print(output_array)
    print(output_array.shape)
    model.summary()

Now we get non-zero vectors at the positions where we passed 0.

    [[[ 0.          0.        ]
      [-0.04339869 -0.04900574]
      [ 0.          0.        ]]
     [[ 0.          0.        ]
      [ 0.          0.        ]
      [-0.04339869 -0.04900574]]]
    (2, 3, 2)
    Model: "model_19"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_22 (InputLayer)        (None, 3)                 0
    _________________________________________________________________
    embedding_34 (Embedding)     (None, 3, 2)              2
    =================================================================
    Total params: 2
    Trainable params: 2
    Non-trainable params: 0

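In short, Keras maps any out-of-vocabulary index to a zero vector. That is reasonable in the sense that, for those positions, the forward pass guarantees that all contributions are zero (although a bias term may still have some effect). It is nonetheless a bit counterintuitive: passing out-of-vocabulary indices to the model, rather than removing them in a preprocessing step, looks like pure overhead and is bad practice.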
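The two behaviors seen in this answer can be mimicked in plain Python to make the difference concrete. This is a sketch under assumptions, not the actual Keras or TensorFlow code path: `gather_zero_fill` and `gather_strict` are hypothetical helpers, and the weight values are copied from the run above.

```python
# Illustrative sketch only; these helpers are hypothetical and do not
# reproduce the real Keras or TensorFlow implementations.

def gather_zero_fill(matrix, indices):
    """Return a zero vector for any index outside [0, len(matrix))."""
    output_dim = len(matrix[0])
    return [
        matrix[i] if 0 <= i < len(matrix) else [0.0] * output_dim
        for i in indices
    ]


def gather_strict(matrix, indices):
    """Raise an error as soon as an index falls outside [0, len(matrix))."""
    for i in indices:
        if not 0 <= i < len(matrix):
            raise IndexError(
                f"index {i} is out of range for input_dim={len(matrix)}"
            )
    return [matrix[i] for i in indices]


# One embedding row (input_dim = 1, output_dim = 2), values from the run above.
weights = [[-0.04339869, -0.04900574]]

for row in [[5, 0, 1], [1, 2, 0]]:
    print(gather_zero_fill(weights, row))

try:
    gather_strict(weights, [5, 0, 1])
except IndexError as err:
    print("strict lookup failed:", err)
```

With the inputs from the example, the zero-fill variant returns a non-zero vector only where the index is 0, which matches the pattern in the Keras 2.3.1 output. The strict variant fails on the first out-of-range index, which is the kind of behavior reported for tensorflow.keras.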

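The lesson is to avoid standalone Keras altogether and switch to tensorflow.keras, since the Keras maintainers clearly state that after version 2.2 there will be reduced support and only minor bug fixes.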
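Whichever library is used, out-of-range indices can be caught before they reach the model. The following is a minimal sketch, assuming the inputs are nested Python lists of integers; `find_out_of_range` is a hypothetical helper, not a Keras or TensorFlow function.

```python
# Hypothetical preprocessing check; not part of Keras or TensorFlow.

def find_out_of_range(batch, input_dim):
    """Return (row, column, value) for every index not in [0, input_dim)."""
    return [
        (r, c, value)
        for r, row in enumerate(batch)
        for c, value in enumerate(row)
        if not 0 <= value < input_dim
    ]


batch = [[5, 0, 1], [1, 2, 0]]
bad = find_out_of_range(batch, input_dim=1)
if bad:
    row, col, value = bad[0]
    print(f"{len(bad)} out-of-range indices; first is {value} at row {row}, column {col}")
```

Running a check like this in the preprocessing step turns a silent zero vector into an explicit report, so the vocabulary size or the tokenization can be fixed before training.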

Related issue in the Keras GitHub repository: https://github.com/keras-team/keras/issues/13989
