WIKINDX Resources

Wan, A., Mentor, & Limonchik, S. Sentence part-enhanced minBERT: Incorporating sentence parts to improve BERT performance on downstream tasks.
Resource type: Journal Article
BibTeX citation key: anon.178
Categories: General
Creators: Limonchik, Mentor, Wan
URLs: https://www.semant ... e7c68a0ccbf24899a1
Abstract
BERT-based models typically represent sentence meaning with the embedding of the classification token, so they do not account for the differing importance of different sentence parts when applying BERT embeddings to downstream tasks. The goal of this project was to implement the default project’s multitask minBERT model and improve its performance on sentiment analysis, paraphrase detection, and semantic textual similarity by incorporating sentence parts into the minBERT model for the respective downstream tasks. Sentence parts were incorporated in the same manner as in the sentence part-enhanced BERT (SpeBERT) model. We also experimented with gradient surgery, cosine similarity fine-tuning, and SMART optimization to improve model results. Our findings support those of the SpeBERT authors: incorporating main sentence parts substantially improves performance on semantic textual similarity, and other sentence parts improve performance on sentiment classification. Like the SpeBERT authors, we also find that selecting the correct aggregation strategy is important for optimizing Spe-model performance. However, for paraphrase detection, incorporating main or other sentence parts does not appear to improve performance. A key finding is that Spe-based models performed better in the absence of multitask learning; it may be beneficial to avoid multitask learning if downstream tasks require different sentence parts, since this allows the model to focus on information from inherently different embeddings. Our best model was an ensemble of three Spe-based models trained for different tasks. By incorporating sentence parts for sentiment classification and semantic textual similarity, our ensemble model achieved a dev set score of 0.715 and a test set score of 0.705.
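
For readers skimming the entry, a minimal sketch of the kind of sentence-part aggregation the abstract describes may be useful. It assumes token-level embeddings from minBERT and a precomputed boolean mask over the tokens of one sentence part; the function name spe_pool, the mask layout, and the mean/max strategies with [CLS] concatenation are illustrative assumptions, not the authors' exact SpeBERT method.

    import torch

    def spe_pool(hidden_states: torch.Tensor,
                 cls_embedding: torch.Tensor,
                 part_mask: torch.Tensor,
                 strategy: str = "mean") -> torch.Tensor:
        # hidden_states: (batch, seq_len, dim) token embeddings from minBERT
        # cls_embedding: (batch, dim) embedding of the [CLS] token
        # part_mask:     (batch, seq_len) bool, True where a token belongs to
        #                the sentence part of interest (e.g., main parts from
        #                a dependency parse); mask source is an assumption here
        mask = part_mask.unsqueeze(-1).to(hidden_states.dtype)
        if strategy == "mean":
            # Mean-pool only over the masked (sentence-part) tokens.
            part_emb = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        elif strategy == "max":
            # Max-pool over the masked tokens; non-part tokens set to -inf.
            neg_inf = torch.finfo(hidden_states.dtype).min
            part_emb = hidden_states.masked_fill(
                ~part_mask.unsqueeze(-1), neg_inf).max(dim=1).values
        else:
            raise ValueError(f"unknown aggregation strategy: {strategy}")
        # One possible aggregation: concatenate the part embedding with [CLS]
        # before a task-specific classifier head.
        return torch.cat([cls_embedding, part_emb], dim=-1)

    # Toy usage with random embeddings and a mask marking three part tokens.
    if __name__ == "__main__":
        batch, seq_len, dim = 2, 8, 16
        hidden = torch.randn(batch, seq_len, dim)
        cls = hidden[:, 0, :]
        mask = torch.zeros(batch, seq_len, dtype=torch.bool)
        mask[:, 1:4] = True
        fused = spe_pool(hidden, cls, mask, strategy="mean")
        print(fused.shape)  # torch.Size([2, 32])

The choice of pooling strategy here mirrors the abstract's point that the aggregation strategy matters: mean pooling spreads the part signal across tokens, while max pooling keeps the strongest per-dimension activation from the part.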
  
Notes
[Online; accessed 1. Jun. 2024]
  