Dai, T., Jiang, Y., Calcagno, R., & Jiang, S. Improving minBERT on Downstream Tasks Through Combinatorial Extensions. Stanford CS224N.
Abstract
We identify a combination of modifications to BERT that performs best across a range of NLP tasks: sentiment analysis (SA), paraphrase detection (PD), and semantic textual similarity (STS). As a baseline, we build minBERT, a simplified BERT model. We improve this model by implementing combinations of extensions and adjustments: 1) SBERT [1] to improve sentence embeddings and efficiency; 2) LoRA [2] to improve efficiency; 3) proportional batch sampling, following Section 4.1 of BERT and PALs [3], to reduce overfitting; and 4) additional fine-tuning data. We make hyperparameter adjustments within these extensions where applicable to maximize the performance of each individual extension. We find that performance is maximized by majority-vote ensembling the predictions of multiple model variants that combine SBERT with proportional batch sampling. LoRA increases efficiency by reducing the number of trainable parameters, but decreases performance. Additional fine-tuning data has mixed effects on performance.
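The proportional batch sampling mentioned above can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration under stated assumptions: the function name proportional_task_schedule and the dataset sizes are placeholders, not taken from the paper. It follows the idea of Section 4.1 of BERT and PALs [3], where each training batch is drawn from a single task chosen with probability proportional to that task's dataset size.

    import random

    # Placeholder dataset sizes for the three tasks (not the paper's actual counts).
    TASK_SIZES = {"sentiment": 8544, "paraphrase": 141506, "similarity": 6041}

    def proportional_task_schedule(task_sizes, num_batches, seed=0):
        """Return one task name per batch, sampled proportionally to dataset size."""
        rng = random.Random(seed)
        tasks = list(task_sizes)
        total = sum(task_sizes.values())
        weights = [task_sizes[t] / total for t in tasks]
        return [rng.choices(tasks, weights=weights, k=1)[0] for _ in range(num_batches)]

    if __name__ == "__main__":
        # Larger tasks (here, paraphrase detection) dominate the schedule,
        # which is the intended effect of proportional sampling.
        print(proportional_task_schedule(TASK_SIZES, num_batches=8))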
Notes
[Online; accessed 29. Aug. 2024]