Default, S. C., Shreya, P., D’Souza, Dulepet, R., Dass, M., & Tingke, A. Beyond Fine-tuning: Iterative Ensemble Strategies for Enhanced BERT Generalizability.
Abstract
The Bidirectional Encoder Representations from Transformers (BERT) model has revolutionized natural language processing (NLP) by providing a contextual understanding of human language. Leveraging BERT’s capabilities, we pre-train and fine-tune the model to specialize in generating sentence embeddings, enabling efficient application in various downstream tasks such as sentiment analysis, paraphrase detection, and semantic textual similarity. We experiment with different fine-tuning strategies and advanced techniques, including multi-task classification, gradient surgery, cosine similarity, and ensemble modeling. Our results demonstrate significant improvements in performance across all tasks, with ensembling emerging as a particularly effective technique. Notably, we introduce an iterative ensembling approach, stacking layers of ensemble models to achieve an overall test-set performance of 71.6%.
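The abstract names gradient surgery without defining it. As a hedged illustration only, assuming the PCGrad formulation (when two task gradients conflict, i.e. have a negative dot product, one is projected onto the normal plane of the other), the core projection for flattened per-task gradients can be sketched as below. The helpers `dot`, `pcgrad_project`, and `combine_pcgrad` are hypothetical names, and this is not necessarily how the authors implemented it in their multi-task BERT setup.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Inner product of two equal-length (flattened) gradient vectors.
    return sum(x * y for x, y in zip(a, b))


def pcgrad_project(g_i: Vector, g_j: Vector) -> Vector:
    # Hypothetical helper: if g_i conflicts with g_j (negative dot product),
    # subtract from g_i its component along g_j; otherwise leave g_i as-is.
    d = dot(g_i, g_j)
    norm_sq = dot(g_j, g_j)
    if d >= 0 or norm_sq == 0:
        return list(g_i)
    scale = d / norm_sq
    return [x - scale * y for x, y in zip(g_i, g_j)]


def combine_pcgrad(task_grads: List[Vector]) -> Vector:
    # Hypothetical helper: project each task gradient against every other
    # task's original gradient, then sum the results into one update.
    # PCGrad visits the other tasks in random order; a fixed order is used
    # here so the output is deterministic.
    combined = [0.0] * len(task_grads[0])
    for i, g in enumerate(task_grads):
        projected = list(g)
        for j, other in enumerate(task_grads):
            if i != j:
                projected = pcgrad_project(projected, other)
        combined = [c + p for c, p in zip(combined, projected)]
    return combined


# Two conflicting task gradients (their dot product is -1).
print(pcgrad_project([1.0, 1.0], [-1.0, 0.0]))    # [0.0, 1.0]
print(combine_pcgrad([[1.0, 1.0], [-1.0, 0.0]]))  # [-0.5, 1.5]
```

After projection, the first gradient no longer has a component opposing the second task, so one task's update does not directly undo another's. In a real training loop the vectors would be the flattened parameter gradients of each task loss.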
Notes
[Online; accessed 25 May 2024]