Pabari, R. Sentence-BERT-inspired Improvements to minBERT. Stanford CS224N Default Project.
Abstract
Introduced in Devlin et al. (2018), Bidirectional Encoder Representations from Transformers (BERT) is a popular transformer-based model that generates contextual word embeddings which have empirically performed well on many downstream tasks, including sentiment analysis, paraphrase detection, and semantic textual similarity. However, the vanilla BERT model has shortcomings that have been explored and improved upon in the literature. This project investigates whether the Sentence-BERT extensions to the vanilla minBERT model yield the improvements demonstrated in the follow-on paper of Reimers and Gurevych (2019). Specifically, we implemented alternative pooling strategies and cosine-similarity fine-tuning to see whether we observe an improvement on the aforementioned downstream tasks. We conclude that different architecture choices are better suited to different tasks, and we surprisingly achieved better performance on some tasks by using approaches different from those proposed by Sentence-BERT. These results suggest that, while one-size-fits-all architectures such as Sentence-BERT may achieve decent performance as a baseline, feature engineering and hyperparameter tuning can supercharge this performance in practice.
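The two Sentence-BERT components named in the abstract are mean pooling over token embeddings and a cosine-similarity objective for fine-tuning on semantic textual similarity. The sketch below illustrates those ideas only; it is a minimal, hedged example assuming PyTorch and the Hugging Face transformers library, with bert-base-uncased standing in for minBERT. The function names and training setup are illustrative assumptions, not the report's actual implementation.

```python
# Minimal sketch (assumption: PyTorch + Hugging Face transformers) of the
# Sentence-BERT-style components mentioned in the abstract: mean pooling over
# BERT token embeddings and a cosine-similarity regression objective for STS.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # stand-in for the minBERT encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts


def embed(sentences: list[str]) -> torch.Tensor:
    """Encode sentences and mean-pool into fixed-size sentence embeddings."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = encoder(**batch)
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])


def sts_loss(sent_a: list[str], sent_b: list[str], gold_scores: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity fine-tuning objective for STS, in the spirit of
    Reimers and Gurevych (2019): regress cos(u, v) onto gold scores rescaled
    to [-1, 1]."""
    u, v = embed(sent_a), embed(sent_b)
    cos = nn.functional.cosine_similarity(u, v)
    return nn.functional.mse_loss(cos, gold_scores)
```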
Notes: [Online; accessed 25 May 2024]