Pabari, R. Sentence-BERT-inspired Improvements to minBERT. Stanford CS224N Default Project.
Abstract
Introduced in Devlin et al. (2018), Bidirectional Encoder Representations from Transformers (BERT) is a popular transformer-based model that generates contextual word embeddings which have empirically performed well on many downstream tasks, including sentiment analysis, paraphrase detection, and semantic textual similarity. However, the vanilla BERT model has shortcomings that have been explored and improved upon in the literature. This project investigates whether the Sentence-BERT extensions to the vanilla minBERT model yield the improvements demonstrated in the follow-on paper of Reimers and Gurevych (2019). Specifically, we implemented alternative pooling strategies and cosine-similarity fine-tuning to see whether we observe an improvement on the aforementioned downstream tasks. We conclude that different architecture choices are better suited to different tasks, and we surprisingly achieved better performance on some tasks by using approaches different from those proposed by Sentence-BERT. These results suggest that, while one-size-fits-all architectures such as Sentence-BERT may achieve decent performance as a baseline, feature engineering and hyperparameter tuning can supercharge this performance in practice.
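The two Sentence-BERT components named in the abstract are mean pooling over token embeddings and a cosine-similarity objective for fine-tuning on semantic textual similarity. The sketch below illustrates those ideas only; it is a minimal, hedged example assuming PyTorch and the Hugging Face transformers library, with bert-base-uncased standing in for minBERT. The function names and training setup are illustrative assumptions, not the report's actual implementation.

```python
# Minimal sketch (assumption: PyTorch + Hugging Face transformers) of the
# Sentence-BERT-style components mentioned in the abstract: mean pooling over
# BERT token embeddings and a cosine-similarity regression objective for STS.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # stand-in for the minBERT encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts


def embed(sentences: list[str]) -> torch.Tensor:
    """Encode sentences and mean-pool into fixed-size sentence embeddings."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = encoder(**batch)
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])


def sts_loss(sent_a: list[str], sent_b: list[str], gold_scores: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity fine-tuning objective for STS, in the spirit of
    Reimers and Gurevych (2019): regress cos(u, v) onto gold scores rescaled
    to [-1, 1]."""
    u, v = embed(sent_a), embed(sent_b)
    cos = nn.functional.cosine_similarity(u, v)
    return nn.functional.mse_loss(cos, gold_scores)
```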
Notes: [Online; accessed 25 May 2024]