The TensorFlow implementation of BERT open-sourced by Google is available at https://github.com/google-research/bert. This post walks through some key parts of that official code.
In the previous post we went through the code for the BERT network architecture; here we look at how the BERT model is applied to downstream tasks, i.e. the fine-tuning process.
Downstream tasks
As shown in the figure, the BERT model can be fine-tuned on a variety of downstream tasks.
Model output
The output of BERT's final layer can be obtained with the following functions:
# for classification task
# [batch_size, hidden_size]
output_layer = model.get_pooled_output()
# for seq2seq or NER tasks, get the token-level output
# [batch_size, seq_length, hidden_size]
output_layer = model.get_sequence_output()
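For reference, the model object used above is an instance of modeling.BertModel from the official repo. A minimal sketch of how it is constructed (the input tensors input_ids, input_mask and segment_ids are assumed to come from your own input pipeline):

import modeling

bert_config = modeling.BertConfig.from_json_file("bert_config.json")

model = modeling.BertModel(
    config=bert_config,
    is_training=is_training,
    input_ids=input_ids,          # [batch_size, seq_length]
    input_mask=input_mask,        # [batch_size, seq_length]
    token_type_ids=segment_ids,   # [batch_size, seq_length]
    use_one_hot_embeddings=False)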
Simple classification task
For a simple classification task, once the sentence-level output of BERT's final layer has been obtained, a single fully connected layer on top is all that is needed, followed by the loss computation. See the official run_classifier.py:
# In the demo, we are doing a simple classification task on the entire segment.
output_layer = model.get_pooled_output()

hidden_size = output_layer.shape[-1].value

output_weights = tf.get_variable(
    "output_weights", [num_labels, hidden_size],
    initializer=tf.truncated_normal_initializer(stddev=0.02))

output_bias = tf.get_variable(
    "output_bias", [num_labels], initializer=tf.zeros_initializer())

with tf.variable_scope("loss"):
  if is_training:
    # I.e., 0.1 dropout
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

  logits = tf.matmul(output_layer, output_weights, transpose_b=True)
  logits = tf.nn.bias_add(logits, output_bias)
  probabilities = tf.nn.softmax(logits, axis=-1)
  log_probs = tf.nn.log_softmax(logits, axis=-1)

  one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

  per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
  loss = tf.reduce_mean(per_example_loss)
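With the loss defined, run_classifier.py builds the training op through the optimization module of the official repo, roughly as follows (learning_rate, num_train_steps and num_warmup_steps come from the script's flags):

import optimization

# Adam with weight decay, learning-rate warmup and linear decay,
# as implemented in optimization.py of the BERT repo.
train_op = optimization.create_optimizer(
    loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)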
Sequence labeling task
Taking NER as an example: after obtaining BERT's token-level output from the final layer, again only a fully connected layer needs to be added on top, followed by the loss computation. Here, however, the loss on padded tokens has to be masked out.
# If you want to use the token-level output, use model.get_sequence_output()
output_layer = model.get_sequence_output()
hidden_size = output_layer.shape[-1].value

output_weight = tf.get_variable(
    "output_weights", [num_labels, hidden_size],
    initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable(
    "output_bias", [num_labels], initializer=tf.zeros_initializer())

# The loss and predictions have to be defined by ourselves.
with tf.variable_scope("loss"):
  if is_training:
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

  # [batch_size * seq_length, hidden_size]
  output_layer = tf.reshape(output_layer, [-1, hidden_size])
  logits = tf.matmul(output_layer, output_weight, transpose_b=True)
  logits = tf.nn.bias_add(logits, output_bias)
  # back to [batch_size, seq_length, num_labels]
  logits = tf.reshape(logits, [-1, FLAGS.max_seq_length, num_labels])

  log_probs = tf.nn.log_softmax(logits, axis=-1)
  one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
  per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)  # [bs, seq_length]

  # Mask out the loss on padded positions before averaging.
  mask = tf.cast(input_mask, dtype=tf.float32)
  per_example_loss = tf.reduce_sum(per_example_loss * mask, axis=-1)  # [bs]
  total_size = tf.reduce_sum(mask, axis=-1) + 1e-12
  per_example_loss /= total_size  # [bs]
  loss = tf.reduce_mean(per_example_loss)

  probabilities = tf.nn.softmax(logits, axis=-1)
  predict = tf.argmax(probabilities, axis=-1)
Token-level evaluation metrics can be defined as follows (tf_metrics here is a third-party package for multi-class precision, recall and F1):
# Suppose there are 8 label classes in total:
# ["PAD", "B", "I", "E", "S", "O", "[CLS]", "[SEP]"]
def metric_fn(per_example_loss, label_ids, logits):
  # [bs, max_seq_length]
  predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)
  # Indices 1-4 (B, I, E, S) are treated as the positive classes.
  precision = tf_metrics.precision(label_ids, predictions, 8, [1, 2, 3, 4], average="macro")
  recall = tf_metrics.recall(label_ids, predictions, 8, [1, 2, 3, 4], average="macro")
  f = tf_metrics.f1(label_ids, predictions, 8, [1, 2, 3, 4], average="macro")
  eval_loss = tf.metrics.mean(values=per_example_loss)
  # eval_accuracy = tf.metrics.accuracy(labels=label_ids, predictions=predictions)
  return {
      "eval_precision": precision,
      "eval_recall": recall,
      "eval_f": f,
      "eval_loss": eval_loss,
      # "eval_acc": eval_accuracy
  }
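During evaluation, this metric_fn is wired into the Estimator spec. A rough sketch following the TPUEstimator pattern of run_classifier.py (mode and scaffold_fn are assumed to be defined in the surrounding model_fn):

# Pass metric_fn together with the tensors it expects as arguments.
eval_metrics = (metric_fn, [per_example_loss, label_ids, logits])
output_spec = tf.contrib.tpu.TPUEstimatorSpec(
    mode=mode,
    loss=loss,
    eval_metrics=eval_metrics,
    scaffold_fn=scaffold_fn)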
Strictly speaking, NER should be evaluated with entity-level metrics; see https://github.com/kyzhouhzau/BERT-NER for an example.
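To make the difference concrete, here is a minimal, self-contained sketch (not from the BERT repo) of entity-level precision, recall and F1: entity spans are first extracted from BIES-style tag sequences and then compared as sets. The helper names and the single-type tag scheme are assumptions for illustration.

def extract_entities(tags):
  """Return the set of (start, end) spans marked by B...E or S tags."""
  spans, start = set(), None
  for i, tag in enumerate(tags):
    if tag == "S":
      spans.add((i, i))
      start = None
    elif tag == "B":
      start = i
    elif tag == "E" and start is not None:
      spans.add((start, i))
      start = None
    elif tag != "I":
      # "O", "PAD", "[CLS]", "[SEP]" break any open span.
      start = None
  return spans

def entity_f1(gold_seqs, pred_seqs):
  """Micro-averaged entity-level F1 over a corpus of tag sequences."""
  tp = fp = fn = 0
  for gold, pred in zip(gold_seqs, pred_seqs):
    g, p = extract_entities(gold), extract_entities(pred)
    tp += len(g & p)
    fp += len(p - g)
    fn += len(g - p)
  precision = tp / (tp + fp + 1e-12)
  recall = tp / (tp + fn + 1e-12)
  return 2 * precision * recall / (precision + recall + 1e-12)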
BiLSTM + CRF
For sequence labeling tasks you sometimes also want to add a BiLSTM and a CRF layer on top of BERT; see:
https://github.com/xuanzebi/BERT-CH-NER
To understand how a CRF layer works, the blog post CRF Layer on the Top of BiLSTM is highly recommended; a rough sketch of such a head is shown below.
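A minimal sketch of a BiLSTM + CRF head on top of get_sequence_output(), assuming TensorFlow 1.x and the variable names from the snippets above; lstm_size is a hypothetical hyperparameter:

# BiLSTM + CRF head on BERT's token-level output (sketch, TF 1.x APIs).
output_layer = model.get_sequence_output()        # [bs, seq_length, hidden_size]
seq_lengths = tf.reduce_sum(input_mask, axis=-1)  # true (unpadded) lengths

with tf.variable_scope("bilstm"):
  cell_fw = tf.nn.rnn_cell.LSTMCell(lstm_size)
  cell_bw = tf.nn.rnn_cell.LSTMCell(lstm_size)
  (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
      cell_fw, cell_bw, output_layer,
      sequence_length=seq_lengths, dtype=tf.float32)
  lstm_output = tf.concat([out_fw, out_bw], axis=-1)  # [bs, seq_length, 2*lstm_size]

with tf.variable_scope("projection"):
  logits = tf.layers.dense(lstm_output, num_labels)  # [bs, seq_length, num_labels]

with tf.variable_scope("crf"):
  # The CRF learns a [num_labels, num_labels] transition matrix and replaces
  # the per-token softmax cross-entropy with a sequence-level loss.
  log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
      logits, labels, seq_lengths)
  loss = tf.reduce_mean(-log_likelihood)
  predict, _ = tf.contrib.crf.crf_decode(logits, transition_params, seq_lengths)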
References
https://github.com/kyzhouhzau/BERT-NER (Named Entity Recognition Tagging)
Multi-class precision (P), recall (R) and F1 computation
https://github.com/xuanzebi/BERT-CH-NER
CRF Layer on the Top of BiLSTM