David, G., Mon, K., & Zhu, A. Simple Contrastive Learning for Multitask Finetuning.
Abstract
We explore extensions and methods to improve and fine-tune BERT, a model that uses Bidirectional Encoder Representations from Transformers to develop deep contextual word representations. Since its release, BERT has been shown to be the base for state-of-the-art models on a wide range of tasks. In this project, we employ the following extensions to the vanilla BERT model: additional pretraining using Simple Contrastive Learning, Gradient Surgery, Multitask Fine-Tuning, Hyperparameter Tuning, and Model Ensembling. Our goal is to improve performance on three downstream tasks: 1) Sentiment Analysis, 2) Paraphrase Detection, and 3) Semantic Textual Similarity. As a baseline, our model's scores using vanilla fine-tuning were 0.526, 0.0442, and -0.041, respectively. After implementing these extensions, our ensembled model yielded respective scores of 0.528, 0.625, and 0.446.
Notes
[Online; accessed 25 May 2024]