Three Headed Mastery: minBERT as a Jack of All Trades in Multi-Task NLP

Fanelle, Valerie; Martinez, Rafael Perez; Orney, Ifdita Hasan; Liu, Nelson

Javascript is disabled or not supported in your browser. JavaScript must be enabled in order for you to use WIKINDX fully. Enable JavaScript through your browser options then try again, otherwise, try using a different browser.

WIKINDX

WIKINDX Resources

Fanelle, V., Martinez, R. P., Orney, I. H., & Liu, N. Three Headed Mastery: minBERT as a Jack of All Trades in Multi-Task NLP.

Resource type: Journal Article
BibTeX citation key: anon.61
View all bibliographic details

Categories: General
Creators: Fanelle, Liu, Martinez, Orney

Attachments

URLs https://www.semant ... tm_medium=33014503

Abstract

This work delves into minBERT, a “smaller” variant of the original BERT model, and combines various approaches that include parallel training, more sophisticated optimizers, and advanced architectures to enhance accuracy and efficiency in sentence-level tasks. The Bidirectional Encoder Representations from Transformers (BERT) model, introduced by Devlin et al. (2019), is a transformer-based model renowned for its remarkable performance across a variety of natural language processing (NLP) tasks, including sentiment analysis, paraphrase detection, and Semantic Textual Similarity. This groundbreaking model set new benchmarks in NLP upon its release in 2018. Building on this foundational work, we delve into minBERT, a “smaller” variant of the original BERT model. Our objective is to experiment with various approaches to enhance the performance of the minBERT model without being hindered by expensive computational demands. We combine various approaches that include parallel training, more sophisticated optimizers (e.g., PyTorch’s AdamW and Google Brain’s Lion), and advanced architectures to enhance accuracy and efficiency in sentence-level tasks. This strategy also involves fine-tuning critical hyperparameters and implementing methods like dropout and early stopping to prevent overfitting and ensure robust generalization. Lastly, we evaluate our minBERT model using three datasets: Stanford Sentiment Treebank, Quora Question Pairs, and Semantic Text Similarity. Our best-performing model obtains an overall score of 0.731 and 0.734 in the dev and test leaderboards, respectively.

Notes

[Online; accessed 25. May 2024]

WIKINDX 6.11.0 | Total resources: 209 | Username: -- | Bibliography: WIKINDX Master Bibliography | Style: American Psychological Association (APA)