Kehoe, N., & Ravella, P. Finetuning minBERT for Downstream Tasks with Multitasking.
Abstract
Fine-tuned versions of pre-trained transformer language models are extremely powerful tools that have transformed the field of machine learning. However, a model fine-tuned on a single downstream task does not generalize well to other tasks. This work extends the Bidirectional Encoder Representations from Transformers (minBERT) model to make high-quality predictions on multiple sentence-level tasks, namely sentiment analysis, paraphrase detection, and semantic textual similarity. The advantage of a single model with multitask capability is that it produces more robust and generalized sentence embeddings that perform well across a variety of tasks. We experiment with several methods for building generalized embeddings on top of the minBERT model while avoiding task interference as much as possible, such as Gradient Surgery (Yu et al., 2020), Gradient Vaccine (Wang et al., 2020), SMART regularization for fine-tuning (Jiang et al., 2019), model embedding optimizations, and stronger multi-head networks.
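To make the idea of avoiding task interference at the gradient level concrete, the following is a minimal sketch of the projection step described for Gradient Surgery (PCGrad) by Yu et al. (2020): when two task gradients conflict (negative dot product), one is projected onto the normal plane of the other before the gradients are summed. This is an illustration under stated assumptions, not the authors' implementation. Plain Python lists stand in for flattened parameter gradients, and the names `dot` and `pcgrad` are hypothetical.

```python
import random


def dot(u, v):
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcgrad(task_grads, rng=None):
    """Combine per-task gradients with a PCGrad-style projection.

    task_grads: list of flattened gradients, one list of floats per task
    (an assumption for illustration; real models hold per-parameter tensors).
    Returns the sum of the projected task gradients.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)  # work on a copy; the originals are used for projection
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # other tasks are visited in random order
        for j in others:
            g_j = task_grads[j]
            overlap = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            # Negative overlap means the two tasks pull in conflicting directions.
            if overlap < 0 and norm_sq > 0:
                scale = overlap / norm_sq
                g = [a - scale * b for a, b in zip(g, g_j)]
        projected.append(g)
    # Sum the de-conflicted gradients coordinate by coordinate.
    return [sum(column) for column in zip(*projected)]


if __name__ == "__main__":
    sentiment_grad = [1.0, 0.0]    # hypothetical gradient for one task
    paraphrase_grad = [-1.0, 1.0]  # hypothetical conflicting gradient
    print(pcgrad([sentiment_grad, paraphrase_grad]))
```

In this sketch, gradients that do not conflict pass through unchanged, so the result reduces to an ordinary sum of task gradients in that case.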
| Notes |
[Online; accessed 14. May 2024]