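How do I use a pre-trained BERT model for next-sentence labeling?

I am new to AI and NLP, and I want to understand how BERT works. I am using the pre-trained BERT models from https://github.com/google-research/bert

I ran the extract_features.py example as described in the feature-extraction section of readme.md, and I got vectors as output.

How can I convert the output of extract_features.py into next-sentence / not-next-sentence labels?

I want to run BERT on two sentences, check whether they are related, and look at the result.

Thanks!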

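Answer:

The answer is to use the weights that were trained for the next-sentence task, together with the logits they produce. So, to use BERT for next-sentence prediction, feed in the two sentences in the same format that was used during training: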

# From run_classifier.py in the BERT repository; it relies on tf, tokenization,
# InputFeatures and _truncate_seq_pair as defined in that file.
def convert_single_example(ex_index, example, label_list, max_seq_length,
                           tokenizer):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    label_map = {}
    for (i, label) in enumerate(label_list):
        label_map[label] = i

    tokens_a = tokenizer.tokenize(example.text_a)
    tokens_b = None
    if example.text_b:
        tokens_b = tokenizer.tokenize(example.text_b)

    if tokens_b:
        # Modifies `tokens_a` and `tokens_b` in place so that the total
        # length is less than the specified length.
        # Account for [CLS], [SEP], [SEP] with "- 3"
        _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
    else:
        # Account for [CLS] and [SEP] with "- 2"
        if len(tokens_a) > max_seq_length - 2:
            tokens_a = tokens_a[0:(max_seq_length - 2)]

    # The convention in BERT is:
    # (a) For sequence pairs:
    #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
    #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
    # (b) For single sequences:
    #  tokens:   [CLS] the dog is hairy . [SEP]
    #  type_ids: 0     0   0   0  0     0 0
    #
    # Where "type_ids" are used to indicate whether this is the first
    # sequence or the second sequence. The embedding vectors for `type=0` and
    # `type=1` were learned during pre-training and are added to the wordpiece
    # embedding vector (and position vector). This is not *strictly* necessary
    # since the [SEP] token unambiguously separates the sequences, but it
    # makes it easier for the model to learn the concept of sequences.
    #
    # For classification tasks, the first vector (corresponding to [CLS]) is
    # used as the "sentence vector". Note that this only makes sense because
    # the entire model is fine-tuned.
    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)

    if tokens_b:
        for token in tokens_b:
            tokens.append(token)
            segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    label_id = label_map[example.label]
    if ex_index < 5:
        tf.logging.info("*** Example ***")
        tf.logging.info("guid: %s" % (example.guid))
        tf.logging.info("tokens: %s" % " ".join(
            [tokenization.printable_text(x) for x in tokens]))
        tf.logging.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
        tf.logging.info("input_mask: %s" % " ".join([str(x) for x in input_mask]))
        tf.logging.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids]))
        tf.logging.info("label: %s (id = %d)" % (example.label, label_id))

    feature = InputFeatures(
        input_ids=input_ids,
        input_mask=input_mask,
        segment_ids=segment_ids,
        label_id=label_id)
    return feature
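To see the layout that convert_single_example produces without installing BERT, here is a minimal sketch in plain Python. The helper name pack_sentence_pair is hypothetical and not part of the BERT repository. It assumes the sentences are already split into wordpieces and that the pair fits into max_seq_length (the real code truncates first), and it skips the conversion of tokens to vocabulary ids.

```python
def pack_sentence_pair(tokens_a, tokens_b, max_seq_length):
    """Lay out two tokenized sentences as [CLS] A [SEP] B [SEP], then pad.

    Hypothetical helper for illustration only; it mirrors the layout used by
    convert_single_example but does no truncation and no id lookup.
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_seq_length:
        raise ValueError("sentence pair is longer than max_seq_length")

    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)

    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(tokens)

    # Zero-pad both lists up to the fixed sequence length.
    padding = max_seq_length - len(tokens)
    segment_ids += [0] * padding
    input_mask += [0] * padding
    return tokens, segment_ids, input_mask


tokens, segment_ids, input_mask = pack_sentence_pair(
    "is this jack ##son ##ville ?".split(),
    "no it is not .".split(),
    max_seq_length=16,
)
print(tokens)
print(segment_ids)
print(input_mask)
```

The pair here is the one used in the comment inside convert_single_example, so the printed segment ids can be compared with the type_ids row shown there.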

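Then extend the BERT model with the following code. Note that the output layer is created under the cls/seq_relationship variable scope, which is the scope the pre-training code uses for the next-sentence head, so the pre-trained weights can be reused: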

# Adapted from run_classifier.py in the BERT repository; it relies on tf and
# modeling as imported in that file.
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

    # In the demo, we are doing a simple classification task on the entire
    # segment.
    #
    # If you want to use the token-level output, use model.get_sequence_output()
    # instead.
    output_layer = model.get_pooled_output()

    hidden_size = output_layer.shape[-1].value

    # Same scope and variable names as the next-sentence head used during
    # pre-training, so num_labels should be 2 here.
    with tf.variable_scope("cls/seq_relationship"):
        output_weights = tf.get_variable(
            "output_weights", [num_labels, hidden_size])
        output_bias = tf.get_variable(
            "output_bias", [num_labels])

    with tf.variable_scope("loss"):
        if is_training:
            # I.e., 0.1 dropout
            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        probabilities = tf.nn.softmax(logits, axis=-1)
        log_probs = tf.nn.log_softmax(logits, axis=-1)

        one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

        per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
        loss = tf.reduce_mean(per_example_loss)

        return (loss, per_example_loss, logits, probabilities)

probabilities: this is what you need. It holds the next-sentence prediction for each input pair.
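As a rough illustration of how the two-way output could be turned into a label, the sketch below applies a softmax to one pair of logits and thresholds the first entry. The function names and the example logits are made up for illustration. It assumes that index 0 means "is the next sentence" and index 1 means "is not", which is the labeling used by the repository's pre-training data script; it is worth checking this against the checkpoint you load.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_next_sentence(logits, threshold=0.5):
    """Return (decision, probabilities) for one sentence pair.

    Assumption: index 0 is "is next", index 1 is "not next".
    """
    probs = softmax(logits)
    return probs[0] >= threshold, probs


# Made-up logits for a single pair, for illustration only.
example_logits = [2.0, -1.0]
decision, probs = is_next_sentence(example_logits)
print("is next sentence:", decision)
print("probabilities:", probs)
```

The threshold of 0.5 is an arbitrary default; with real outputs you would read the rows of probabilities returned by create_model instead of recomputing the softmax.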
