WIKINDX Resources

Gu, C. Finetuning a multitask BERT for downstream tasks. Stanford CS224N Default Project (mentor: Khosla, S.).
Resource type: Journal Article
BibTeX citation key: anon.40
Categories: General
Creators: Gu, Khosla
URLs: https://www.semant ... 7eebe80e087b6f01a3
Abstract
Multitask models are of interest and importance in NLP because of their ability to handle a variety of different tasks. In particular, Transformer-based models such as BERT are able to generate useful sentence-level representations, leading to strong performance on many downstream tasks. Many methods have been proposed for using BERT and its embeddings on downstream tasks, but there has been little direct comparison between these methods while holding BERT's model architecture constant. To address this gap, we perform experiments to find effective methods for building and finetuning a multitask BERT that simultaneously performs well on multiple tasks. We find that a one-size-fits-all approach is not optimal: different tasks are best served by different methods. Among the methods we test, cosine similarity works best for semantic textual similarity, while sentence concatenation works best for paraphrase detection. We also find that gradient surgery and model ensembling do not deliver significant performance gains, suggesting that training a multitask BERT may be a “natural” multitask learning problem, with few conflicting gradients between tasks.
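
To make the abstract's comparison concrete, below is a minimal PyTorch sketch of the two task heads it contrasts: a cosine-similarity head for semantic textual similarity and a concatenation head for paraphrase detection. This is an illustration based only on the abstract, not the authors' implementation; the class name, hidden size, and pretrained checkpoint are assumptions.

import torch
import torch.nn as nn
from transformers import BertModel

class MultitaskBertSketch(nn.Module):
    """Hypothetical multitask BERT with one head per downstream task."""

    def __init__(self, hidden_size=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Paraphrase detection: linear classifier over the concatenated pair embedding.
        self.paraphrase_head = nn.Linear(2 * hidden_size, 1)

    def embed(self, input_ids, attention_mask):
        # Use the pooled [CLS] output as the sentence-level representation.
        output = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return output.pooler_output

    def predict_similarity(self, ids_a, mask_a, ids_b, mask_b):
        # Semantic textual similarity: cosine similarity of the two sentence embeddings.
        return torch.nn.functional.cosine_similarity(
            self.embed(ids_a, mask_a), self.embed(ids_b, mask_b)
        )

    def predict_paraphrase(self, ids_a, mask_a, ids_b, mask_b):
        # Paraphrase detection: concatenate the two embeddings, then classify.
        pair = torch.cat([self.embed(ids_a, mask_a), self.embed(ids_b, mask_b)], dim=-1)
        return self.paraphrase_head(pair)  # logit; apply sigmoid for a probability

Under this sketch, the cosine head yields a score in [-1, 1] that can be rescaled to the STS range, while the concatenation head yields a logit for binary paraphrase classification.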
  
Notes
[Online; accessed 1. Jun. 2024]
  