How to build an embedding layer in Keras

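I am following a tutorial from Francois Chollet's book to build a text classification model in TensorFlow. I started by trying to create an embedding layer, but I keep getting an error at this stage.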

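My logic is as follows:

  • Start with a list of text strings X and a list of integers y.

  • Tokenize and vectorize the text data, then pad it to the length of the longest sequence.

  • Convert each integer label into a one-hot encoded array.

  • Feed the data into an embedding layer with these arguments:
    • input_dim = total number of unique tokens/words (1499 in my case)
    • output_dim = size of the embedding vectors (starting with 32)
    • input_length = maximum sequence length, the same dimension the sequences are padded to (295 in my case)
  • Pass the embedding output to a dense layer with 32 hidden units and relu activation
  • Then pass that to a dense layer with 3 units and softmax activation to predict the 3 classes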

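Can someone explain what I am doing wrong? I thought I understood how to instantiate an embedding layer, but is my understanding incorrect?

Here is my code: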

    # read in raw data
    df = pd.read_csv('text_dataset.csv')
    samples = df.data.tolist()  # list of strings of text
    labels = df.sentiment.to_list()  # list of integers

    # tokenize and vectorize text data to prepare for embedding
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(samples)
    sequences = tokenizer.texts_to_sequences(samples)
    word_index = tokenizer.word_index
    print(f'Found {len(word_index)} unique tokens.')

    # setting variables
    vocab_size = len(word_index)  # 1499
    # Input_dim: This is the size of the vocabulary in the text data.
    input_dim = vocab_size  # 1499
    # This is the size of the vector space in which words will be embedded.
    output_dim = 32  # recommended by tf
    # This is the length of input sequences
    max_sequence_length = len(max(sequences, key=len))  # 295
    # train/test index splice variable
    training_samples = round(len(samples)*.8)

    # data = pad_sequences(sequences, maxlen=max_sequence_length) # shape (499, 295)
    # keras automatically pads to maxlen if left without input
    data = pad_sequences(sequences)

    # preprocess labels into one hot encoded array of 3 classes ([1., 0., 0.])
    labels = to_categorical(labels, num_classes=3, dtype='float32')  # shape (499, 3)

    # Create test/train data (80% train, 20% test)
    x_train = data[:training_samples]
    y_train = labels[:training_samples]
    x_test = data[training_samples:]
    y_test = labels[training_samples:]

    model = Sequential()
    model.add(Embedding(input_dim, output_dim, input_length=max_sequence_length))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.summary()

    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train,
              y_train,
              epochs=10,
              batch_size=32,
              validation_data=(x_test, y_test))

When I run this code, I get the following error:

    Found 1499 unique tokens.
    Model: "sequential_23"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding_21 (Embedding)     (None, 295, 32)           47968
    _________________________________________________________________
    dense_6 (Dense)              (None, 295, 32)           1056
    _________________________________________________________________
    dense_7 (Dense)              (None, 295, 3)            99
    =================================================================
    Total params: 49,123
    Trainable params: 49,123
    Non-trainable params: 0
    _________________________________________________________________
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-144-f29ef892e38d> in <module>()
         51           epochs=10,
         52           batch_size=32,
    ---> 53           validation_data=(x_test, y_test))

    2 frames
    /usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
        129                         ': expected ' + names[i] + ' to have ' +
        130                         str(len(shape)) + ' dimensions, but got array '
    --> 131                         'with shape ' + str(data_shape))
        132                 if not check_batch_axis:
        133                     data_shape = data_shape[1:]

    ValueError: Error when checking target: expected dense_7 to have 3 dimensions, but got array with shape (399, 3)

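To troubleshoot, I have been commenting out layers to see what is going on. The problem persists all the way down to the first layer, which makes me think I do not understand the embedding layer well enough. See below: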

    model = Sequential()
    model.add(Embedding(input_dim, output_dim, input_length=max_sequence_length))
    # model.add(Dense(32, activation='relu'))
    # model.add(Dense(3, activation='softmax'))
    model.summary()

This produces the following result:

    Found 1499 unique tokens.
    Model: "sequential_24"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding_22 (Embedding)     (None, 295, 32)           47968
    =================================================================
    Total params: 47,968
    Trainable params: 47,968
    Non-trainable params: 0
    _________________________________________________________________
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-150-63d1b96db467> in <module>()
         51           epochs=10,
         52           batch_size=32,
    ---> 53           validation_data=(x_test, y_test))

    2 frames
    /usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
        129                         ': expected ' + names[i] + ' to have ' +
        130                         str(len(shape)) + ' dimensions, but got array '
    --> 131                         'with shape ' + str(data_shape))
        132                 if not check_batch_axis:
        133                     data_shape = data_shape[1:]

    ValueError: Error when checking target: expected embedding_22 to have 3 dimensions, but got array with shape (399, 3)


Answer:

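The embedding layer's output for a batch of sentences has 3 dimensions: [BATCH_SIZE, SEQ_LENGTH, EMBEDDING_SIZE]. A Dense layer applied to a 3D input only transforms the last axis, so the model's output stays 3D, which is the (None, 295, 3) shown for dense_7 in the summary, while the one-hot labels are 2D with shape (399, 3). To get one prediction per sample, the classifier layers need a flat 2D input of the form [BATCH_SIZE, N].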
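To make the shapes concrete, here is a minimal NumPy sketch of what an embedding lookup followed by a dense multiplication does to the dimensions. It is only an illustration, not how Keras implements these layers: the batch size of 4, the random weights, and the vocabulary size of 1500 (1499 tokens plus one padding index) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, seq_len = 4, 295       # 295 is the padded length from the question
vocab_size, embed_dim = 1500, 32   # assumed: 1499 tokens + 1 for padding index 0
num_classes = 3

# Integer token ids, shaped like the output of pad_sequences: (batch, seq_len)
token_ids = rng.integers(0, vocab_size, size=(batch_size, seq_len))

# An embedding is a lookup table with one row of size embed_dim per token id
embedding_table = rng.normal(size=(vocab_size, embed_dim))
embedded = embedding_table[token_ids]           # shape (4, 295, 32)

# A dense layer multiplies along the last axis only, so the sequence axis survives
dense_weights = rng.normal(size=(embed_dim, num_classes))
per_token_scores = embedded @ dense_weights     # shape (4, 295, 3)

# Flattening first merges the sequence and embedding axes: one row per sample
flattened = embedded.reshape(batch_size, -1)    # shape (4, 9440)
flat_weights = rng.normal(size=(seq_len * embed_dim, num_classes))
per_sample_scores = flattened @ flat_weights    # shape (4, 3)

print(embedded.shape, per_token_scores.shape, per_sample_scores.shape)
```

Only the last result has one row of 3 class scores per sample, which is the shape the one-hot labels have.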

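There are two ways to solve this:

  1. Flatten the embedding layer's output before the first Dense layer: model.add(Flatten())
  2. Use a convolutional layer (the recommended approach): model.add(Conv1D(filters=32, kernel_size=8, activation='relu')). The output of Conv1D is still 3D, so follow it with a pooling layer such as GlobalMaxPooling1D() or with Flatten() before the Dense layers.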
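Both options can be sketched roughly as follows. This is a sketch under assumptions rather than the asker's exact setup: it imports from `tensorflow.keras`, declares the sequence length with an `Input` layer instead of the `input_length` argument, adds `GlobalMaxPooling1D` after the convolution (not part of the original answer) to remove the sequence axis, and uses `1499 + 1` as `input_dim` because the Keras `Tokenizer` assigns word indices starting at 1, leaving 0 for padding.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import (
    Conv1D, Dense, Embedding, Flatten, GlobalMaxPooling1D
)

vocab_size = 1499 + 1  # assumption: +1 because Tokenizer ids start at 1, 0 is padding
embed_dim = 32
max_len = 295          # padded sequence length from the question

# Option 1: flatten (max_len, embed_dim) into a single vector per sample
flat_model = Sequential([
    Input(shape=(max_len,)),
    Embedding(vocab_size, embed_dim),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'),
])

# Option 2: convolve over the sequence, then pool away the sequence axis
conv_model = Sequential([
    Input(shape=(max_len,)),
    Embedding(vocab_size, embed_dim),
    Conv1D(filters=32, kernel_size=8, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'),
])

for model in (flat_model, conv_model):
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    print(model.output_shape)  # the last axis should match the 3 one-hot classes
```

Either model can then be trained with the same `model.fit(x_train, y_train, ...)` call as in the question, since the output now has one row of 3 class probabilities per sample.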
