Kumbong, H. MT-BERT: Fine-tuning BERT for Downstream Tasks Using Multi-Task Learning. Stanford CS224N Default Project.
Abstract
The goal of this project is to fine-tune the contextualized embeddings from BERT to perform well simultaneously on three downstream tasks: Sentiment Analysis (SST), Paraphrase Detection (PD), and Semantic Textual Similarity (STS). Our work uses four main techniques to improve the baseline BERT implementation: additional pretraining on the target-domain data using Masked Language Modelling, multi-task fine-tuning using gradient surgery, single-task fine-tuning, and feature augmentation. We achieve accuracies of 0.539 and 0.877 on SST and PD respectively, and a Pearson correlation of 0.863 on the STS test set, using a single model without ensembling.
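As a rough illustration of the gradient-surgery step mentioned in the abstract, the sketch below shows a PCGrad-style projection that removes the conflicting component between per-task gradients before summing them. The function name, arguments, and details are assumptions for illustration, not the authors' exact implementation.

```python
import torch

def pcgrad_project(task_grads):
    """PCGrad-style gradient surgery (illustrative sketch).

    task_grads: list of flattened gradient tensors, one per task
    (e.g. SST, PD, STS). Returns a single combined gradient in which
    each task gradient has had its component conflicting with the
    other tasks' gradients projected out.
    """
    projected = [g.clone() for g in task_grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = torch.dot(g_i, g_j)
            if dot < 0:  # gradients conflict: remove the conflicting component
                g_i -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
    return torch.stack(projected).sum(dim=0)

# Hypothetical usage: g_sst, g_pd, g_sts are flattened gradients of the
# three task losses w.r.t. the shared BERT parameters.
# combined = pcgrad_project([g_sst, g_pd, g_sts])
```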
|
Notes: [Online; accessed 1 Jun. 2024]