Krishna, V., & Bansal, R. Contrastive Learning for Sentence Embeddings in BERT and its Smaller Variants.
Abstract
Contrastive learning is a method of learning representations that exploits invariances in the data under augmentations, encouraging the embeddings of augmented views of the same sample to remain close together. An interesting property of such approaches is that they enable models to perform better on a range of tasks even when trained on smaller amounts of data, and also allow smaller models to perform as well as their larger counterparts. In this project, we demonstrate that both supervised and unsupervised contrastive learning approaches improve semantic performance for smaller BERT architectures (including BERT-Small and BERT-Mini) on both pre-training and downstream objectives, while improving the representational uniformity of the word embeddings and retaining broad downstream flexibility. Our results indicate that we can continue to maximize performance in smaller transformer architectures and produce results comparable to larger state-of-the-art architectures at a fraction of the computing cost and training time. We conclude by suggesting new areas of research that may yield even larger gains in semantic performance, including supervised contrastive techniques from computer vision that have been shown to perform well on comparable objectives.
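The contrastive-learning idea summarized in the abstract can be made concrete with a small loss sketch. The snippet below is a minimal, illustrative InfoNCE-style objective for unsupervised sentence embeddings (in the spirit of SimCSE-type methods), assuming a PyTorch encoder that maps a batch of sentences to one vector each; the encoder call, temperature value, and dropout-as-augmentation setup are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss over a batch: z1[i] and z2[i] are two embeddings of the
    same sentence (e.g., from two dropout-perturbed forward passes); every other
    sentence in the batch serves as an in-batch negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    # Cosine-similarity matrix scaled by temperature: shape (batch, batch).
    sim = z1 @ z2.T / temperature
    # The positive pair for row i sits on the diagonal, so the target class is i.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)

# Usage sketch: embed the same batch twice so dropout yields two "views" per sentence.
# `encoder` is a placeholder for a small BERT variant (e.g., BERT-Mini) plus a pooling
# layer producing one vector per sentence; it is a hypothetical name, not a real API.
# z1 = encoder(batch)   # (batch_size, hidden_dim)
# z2 = encoder(batch)   # second stochastic forward pass
# loss = info_nce_loss(z1, z2)
```

Minimizing this loss pulls the two views of each sentence together while pushing apart embeddings of different sentences, which is the mechanism the abstract credits for improved uniformity of the representation space.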
|
Notes
[Online; accessed 1. Jun. 2024]