After training my model, I tried to run a prediction, but I got an error and I don't know how to fix it.
The model is built on ELECTRA.
This is my model:
electra = TFElectraModel.from_pretrained("monologg/koelectra-base-v3-discriminator", from_pt=True)

input_ids = tf.keras.Input(shape=(MAX_LEN,), name='input_ids', dtype=tf.int32)
mask = tf.keras.Input(shape=(MAX_LEN,), name='attention_mask', dtype=tf.int32)
token = tf.keras.Input(shape=(MAX_LEN,), name='token_type_ids', dtype=tf.int32)

embeddings = electra(input_ids, attention_mask=mask, token_type_ids=token)[0]
X = tf.keras.layers.GlobalMaxPool1D()(embeddings)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.1)(X)
y = tf.keras.layers.Dense(3, activation='softmax', name='outputs')(X)

model = tf.keras.Model(inputs=[input_ids, mask, token], outputs=y)
model.layers[2].trainable = False
model.summary()
This is the summary:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_ids (InputLayer)          [(None, 25)]         0
__________________________________________________________________________________________________
attention_mask (InputLayer)     [(None, 25)]         0
__________________________________________________________________________________________________
token_type_ids (InputLayer)     [(None, 25)]         0
__________________________________________________________________________________________________
tf_electra_model_4 (TFElectraMo TFBaseModelOutput(la 112330752   input_ids[0][0]
                                                                 attention_mask[0][0]
                                                                 token_type_ids[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_6 (GlobalM (None, 768)          0           tf_electra_model_4[3][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 768)          3072        global_max_pooling1d_6[0][0]
__________________________________________________________________________________________________
dense_18 (Dense)                (None, 128)          98432       batch_normalization_7[0][0]
__________________________________________________________________________________________________
dropout_390 (Dropout)           (None, 128)          0           dense_18[0][0]
__________________________________________________________________________________________________
outputs (Dense)                 (None, 3)            387         dropout_390[0][0]
==================================================================================================
Total params: 112,432,643
Trainable params: 112,431,107
Non-trainable params: 1,536
__________________________________________________________________________________________________
This is the code that builds the training dataset:
input_ids = []
attention_masks = []
token_type_ids = []
train_data_labels = []

for train_sent, train_label in tqdm(zip(train_data["content"], train_data["label"]), total=len(train_data)):
    try:
        input_id, attention_mask, token_type_id = Electra_tokenizer(train_sent, MAX_LEN)
        input_ids.append(input_id)
        attention_masks.append(attention_mask)
        token_type_ids.append(token_type_id)
        train_data_labels.append(train_label)
    except Exception as e:
        print(e)
        print(train_sent)
        pass

train_input_ids = np.array(input_ids, dtype=int)
train_attention_masks = np.array(attention_masks, dtype=int)
train_type_ids = np.array(token_type_ids, dtype=int)
intent_train_inputs = (train_input_ids, train_attention_masks, train_type_ids)
intent_train_data_labels = np.asarray(train_data_labels, dtype=np.int32)
This is the shape of the training dataset:
tf.Tensor([ 3 75 25], shape=(3,), dtype=int32)
Training with this data works fine, but the following prediction code raises an error:
sample_text = 'this is sample text'
input_id, attention_mask, token_type_id = Electra_tokenizer(sample_text, MAX_LEN)
sample_text = (input_id, attention_mask, token_type_id)
model(sample_text)  # or model.predict(sample_text)
This is the error message:
Layer model_15 expects 3 input(s), but it received 75 input tensors. Inputs received: [<tf.Tensor: shape=(), dtype=int32, numpy=2>, <tf.Tensor: ....
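As far as I can tell, the input has the same shape as during training, so why does the error occur? I would appreciate help fixing it.
I hope you have a great year ahead. Happy New Year.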
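Answer:
This is a tensor-dimension problem. The model expects every input to have a leading batch dimension, i.e. shape (1, MAX_LEN) for a single sample, so add that axis before calling the model: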
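import numpy as np

# test_input_ids, test_attention_mask and test_token_type_id are the three
# tokenizer outputs for a single sentence, each of length MAX_LEN.
test_input_ids = np.array(test_input_ids, dtype=np.int32)
test_attention_mask = np.array(test_attention_mask, dtype=np.int32)
test_token_type_id = np.array(test_token_type_id, dtype=np.int32)

# Add a batch axis: (MAX_LEN,) -> (1, MAX_LEN)
ids = np.expand_dims(test_input_ids, axis=0)
atm = np.expand_dims(test_attention_mask, axis=0)
tok = np.expand_dims(test_token_type_id, axis=0)

# Pass the three inputs as one list, in the order the model was built with
model([ids, atm, tok])

With the batch axis added, the call works fine.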
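To see where the number 75 in the error message may come from, here is a small NumPy-only sketch. It assumes that `Electra_tokenizer` returns three flat Python lists of `MAX_LEN` integers per sentence (the tokenizer is not shown in the question, so the sample values below are made up), and that `MAX_LEN` is 25 as in the summary. The helper `count_leaves` is hypothetical and only mimics the way nested lists and tuples get flattened into individual inputs; it is not a Keras function.

```python
import numpy as np

MAX_LEN = 25  # sequence length, taken from the model summary in the question

# Hypothetical stand-ins for one sentence's tokenizer output (assumed to be
# flat Python lists; the real Electra_tokenizer is not shown in the question).
input_id = [2] + [0] * (MAX_LEN - 1)
attention_mask = [1] + [0] * (MAX_LEN - 1)
token_type_id = [0] * MAX_LEN


def count_leaves(structure):
    """Count scalar leaves in nested lists/tuples (illustrative helper only)."""
    if isinstance(structure, (list, tuple)):
        return sum(count_leaves(item) for item in structure)
    return 1


# Without a batch axis, the nested structure holds 3 * 25 scalars,
# which matches the "75 input tensors" in the error message.
unbatched = (input_id, attention_mask, token_type_id)
n_leaves = count_leaves(unbatched)

# Convert each feature to an int32 array and add a leading batch axis,
# so that each one has shape (1, MAX_LEN).
batched = [np.expand_dims(np.array(f, dtype=np.int32), axis=0) for f in unbatched]
shapes = [b.shape for b in batched]

print(n_leaves)  # 75
print(shapes)    # [(1, 25), (1, 25), (1, 25)]
```

The `batched` list has the same layout as the training inputs, three arrays of shape (batch, MAX_LEN), so it could be passed as `model(batched)` or `model.predict(batched)`.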