Chen, J., & Hanley, H. Fine Tuning Multi Downstream Tasks based on BERT with Gradient Surgery. Stanford CS224N Default Project.
Abstract
Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based model that generates contextual word representations. These representations can be used for multiple downstream tasks, including sentiment analysis, paraphrase detection, and semantic textual similarity analysis. To obtain robust, semantically rich sentence embeddings, the BERT weights must be fine-tuned efficiently on task-relevant data. In this project, we test different fine-tuning techniques for improving model performance on the three downstream tasks above. The experiments cover changes to the architecture (dense layer size, Siamese network), the optimization step (training tasks separately or jointly, with or without gradient surgery), and the training setup (whether to reshuffle the data after each epoch, batch size, regularization). Within the limits of an AWS g5.2xlarge instance, the best performance is achieved by (1) adding a Siamese network with cosine similarity for the semantic textual similarity task, (2) passing the two sentence embeddings together with their difference into the dense linear layer for the paraphrase task, (3) multi-task training with gradient surgery, (4) reshuffling the dataloader after each training epoch, (5) applying L2 regularization, and (6) training with batch size 2 for the sentiment and similarity tasks and batch size 32 for the paraphrase task. The model achieves 52.1% accuracy on the sentiment task, 82.8% accuracy on the paraphrase task, and 0.820 Pearson correlation on the semantic textual similarity task on the dev set, and 52.4%, 82.7%, and 0.792 on the test set.
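The gradient surgery referenced in the abstract is presumably a PCGrad-style projection of conflicting task gradients (Yu et al., 2020). The sketch below illustrates only that core projection on flattened per-task gradients; the function name `pcgrad`, the flattened-gradient representation, and the toy tensors are assumptions of this illustration, not details taken from the paper.

```python
import torch

def pcgrad(per_task_grads):
    """Merge per-task gradients with PCGrad-style gradient surgery.

    per_task_grads: list of 1-D tensors, one flattened gradient per task.
    Returns a single merged gradient of the same shape.
    """
    merged = []
    for i, g_i in enumerate(per_task_grads):
        g = g_i.clone()
        for j, g_j in enumerate(per_task_grads):
            if i == j:
                continue
            dot = torch.dot(g, g_j)
            # Gradients conflict when their dot product is negative;
            # project out the component of g that points against g_j.
            if dot < 0:
                g = g - (dot / (g_j.norm() ** 2 + 1e-12)) * g_j
        merged.append(g)
    # Average the projected gradients across tasks.
    return torch.stack(merged).mean(dim=0)

# Toy example with two conflicting 2-D gradients.
g_sentiment = torch.tensor([1.0, 0.0])
g_paraphrase = torch.tensor([-1.0, 1.0])
print(pcgrad([g_sentiment, g_paraphrase]))
```

In a multi-task fine-tuning loop, one would compute each task's flattened gradient, merge them as above, and write the result back to the model parameters before calling `optimizer.step()`.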
Notes
[Online; accessed 1. Jun. 2024]