Fanelle, V., Martinez, R. P., Orney, I. H., & Liu, N. Three Headed Mastery: minBERT as a Jack of All Trades in Multi-Task NLP.
|
 |
|
Abstract
|
This work delves into minBERT, a “smaller” variant of the original BERT model, and combines various approaches that include parallel training, more sophisticated optimizers, and advanced architectures to enhance accuracy and efficiency in sentence-level tasks. The Bidirectional Encoder Representations from Transformers (BERT) model, introduced by Devlin et al. (2019), is a transformer-based model renowned for its remarkable performance across a variety of natural language processing (NLP) tasks, including sentiment analysis, paraphrase detection, and Semantic Textual Similarity. This groundbreaking model set new benchmarks in NLP upon its release in 2018. Building on this foundational work, we delve into minBERT, a “smaller” variant of the original BERT model. Our objective is to experiment with various approaches to enhance the performance of the minBERT model without being hindered by expensive computational demands. We combine various approaches that include parallel training, more sophisticated optimizers (e.g., PyTorch’s AdamW and Google Brain’s Lion), and advanced architectures to enhance accuracy and efficiency in sentence-level tasks. This strategy also involves fine-tuning critical hyperparameters and implementing methods like dropout and early stopping to prevent overfitting and ensure robust generalization. Lastly, we evaluate our minBERT model using three datasets: Stanford Sentiment Treebank, Quora Question Pairs, and Semantic Text Similarity. Our best-performing model obtains an overall score of 0.731 and 0.734 in the dev and test leaderboards, respectively.
|
| Notes |
[Online; accessed 25. May 2024]
|