Wang, Z., Wu, Y., & Nguyen, A. H. BERT++: Trustworthy MultiTask Learning with BERT.
Abstract
Bidirectional Encoder Representations from Transformers (BERT) models have demonstrated state-of-the-art performance on a variety of NLP tasks. However, there is still room for improvement in finetuning pre-trained language models for downstream tasks such as sentiment analysis, paraphrase detection, and semantic textual similarity analysis. In this paper, we aim to establish robust and generalizable sentence embeddings that improve performance on these three downstream tasks through experiments including additional pretraining on domain-specific corpora, cycled multitask finetuning with task-specific loss functions, gradient surgery to mitigate the vanishing gradient problem, and extensive exploration of the prediction layers. We conduct extensive experiments on several benchmark datasets and show that our improved BERT models outperform the original BERT and other baselines by significant margins. We also provide ablation studies and error analysis to understand the effect of each improvement and refine our model accordingly.
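The abstract names gradient surgery during cycled multitask finetuning but does not spell out the variant used. As a hedged illustration only, the sketch below shows a PCGrad-style projection step (projecting away conflicting task gradients) in PyTorch; the toy model, the loss functions, and the helper name `pcgrad_step` are assumptions for demonstration, not the authors' implementation.

```python
# Minimal sketch of PCGrad-style gradient surgery for multitask finetuning.
# Illustrative only: the model, losses, and helper name are assumptions,
# not the paper's code.
import torch
import torch.nn as nn


def pcgrad_step(model, losses, optimizer):
    """Project each task gradient away from conflicting task gradients,
    then apply the merged update."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Compute a flattened gradient per task.
    task_grads = []
    for loss in losses:
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        task_grads.append(torch.cat([
            (p.grad if p.grad is not None else torch.zeros_like(p)).flatten()
            for p in params
        ]))

    # Gradient surgery: if g_i conflicts with g_j (negative dot product),
    # remove the component of g_i along g_j.
    projected = []
    for i, g_i in enumerate(task_grads):
        g = g_i.clone()
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = torch.dot(g, g_j)
            if dot < 0:
                g -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
        projected.append(g)
    merged = torch.stack(projected).sum(dim=0)

    # Write the merged gradient back into the parameters and step.
    optimizer.zero_grad()
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = merged[offset:offset + n].view_as(p)
        offset += n
    optimizer.step()


if __name__ == "__main__":
    model = nn.Linear(8, 2)                 # stand-in for a BERT encoder with two heads
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    x = torch.randn(4, 8)
    out = model(x)
    losses = [out[:, 0].pow(2).mean(),      # e.g. a sentiment-style loss
              out[:, 1].abs().mean()]       # e.g. a similarity-style loss
    pcgrad_step(model, losses, optimizer)
```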
Notes
[Online; accessed 1. Jun. 2024]