WIKINDX Resources

Wan, A., Mentor, & Limonchik, S. Sentence part-enhanced minBERT: Incorporating sentence parts to improve BERT performance on downstream tasks.
Resource type: Journal Article
BibTeX citation key: anon.178
Categories: General
Creators: Limonchik, Mentor, Wan
URLs: https://www.semant ... e7c68a0ccbf24899a1
Abstract
BERT-based models typically represent sentence meaning with the embedding of the classification token, so they do not account for the differing importance of different sentence parts when applying BERT embeddings to downstream tasks. The goal of this project was to implement the default project’s multitask minBERT model and improve its performance on sentiment analysis, paraphrase detection, and semantic textual similarity by incorporating sentence parts into the minBERT model for the respective downstream tasks. Sentence parts were incorporated in the same manner as in the sentence part-enhanced BERT (SpeBERT) model. We also experimented with gradient surgery, cosine similarity fine-tuning, and SMART optimization to improve model results. Our findings support those of the SpeBERT authors: incorporating main sentence parts substantially improves performance on semantic textual similarity, and other sentence parts improve performance on sentiment classification. Like the SpeBERT authors, we also find that selecting the correct aggregation strategy is important for optimizing Spe-model performance. However, for paraphrase detection, incorporating main or other sentence parts does not appear to improve performance. A key finding is that Spe-based models performed better in the absence of multitask learning; it may be beneficial to avoid multitask learning if downstream tasks require different sentence parts, since this allows the model to focus on information from inherently different embeddings. Our best model was an ensemble of three Spe-based models trained for different tasks. By incorporating sentence parts for sentiment classification and semantic textual similarity, our ensemble model achieved a dev set score of 0.715 and a test set score of 0.705.
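
For readers skimming the entry, a minimal sketch of the kind of sentence-part aggregation the abstract describes may be useful. It assumes token-level embeddings from minBERT and a precomputed boolean mask over the tokens of one sentence part; the function name spe_pool, the mask layout, and the mean/max strategies with [CLS] concatenation are illustrative assumptions, not the authors' exact SpeBERT method.

    import torch

    def spe_pool(hidden_states: torch.Tensor,
                 cls_embedding: torch.Tensor,
                 part_mask: torch.Tensor,
                 strategy: str = "mean") -> torch.Tensor:
        # hidden_states: (batch, seq_len, dim) token embeddings from minBERT
        # cls_embedding: (batch, dim) embedding of the [CLS] token
        # part_mask:     (batch, seq_len) bool, True where a token belongs to
        #                the sentence part of interest (e.g., main parts from
        #                a dependency parse); mask source is an assumption here
        mask = part_mask.unsqueeze(-1).to(hidden_states.dtype)
        if strategy == "mean":
            # Mean-pool only over the masked (sentence-part) tokens.
            part_emb = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        elif strategy == "max":
            # Max-pool over the masked tokens; non-part tokens set to -inf.
            neg_inf = torch.finfo(hidden_states.dtype).min
            part_emb = hidden_states.masked_fill(
                ~part_mask.unsqueeze(-1), neg_inf).max(dim=1).values
        else:
            raise ValueError(f"unknown aggregation strategy: {strategy}")
        # One possible aggregation: concatenate the part embedding with [CLS]
        # before a task-specific classifier head.
        return torch.cat([cls_embedding, part_emb], dim=-1)

    # Toy usage with random embeddings and a mask marking three part tokens.
    if __name__ == "__main__":
        batch, seq_len, dim = 2, 8, 16
        hidden = torch.randn(batch, seq_len, dim)
        cls = hidden[:, 0, :]
        mask = torch.zeros(batch, seq_len, dtype=torch.bool)
        mask[:, 1:4] = True
        fused = spe_pool(hidden, cls, mask, strategy="mean")
        print(fused.shape)  # torch.Size([2, 32])

The choice of pooling strategy here mirrors the abstract's point that the aggregation strategy matters: mean pooling spreads the part signal across tokens, while max pooling keeps the strongest per-dimension activation from the part.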
  
Notes
[Online; accessed 1. Jun. 2024]
  