WIKINDX

WIKINDX Resources

Huang, X., Peng, H., Zou, D., Liu, Z., Li, J., Liu, K., et al. CoSENT: Consistent Sentence Embedding via Similarity Ranking.
Resource type: Journal Article
BibTeX citation key: anon.74
Categories: General
Keywords: Fontos!, RAG
Creators: Huang, Li, Liu, Liu, Peng, Su, Wu, Yu, Zou
URL: https://www.semant ... tm_medium=34898166
Abstract
Learning sentence representations is a fundamental task in Natural Language Processing. Although BERT-like transformers have set new SOTAs for sentence embedding on many tasks, they have proven unable to capture semantic similarity reliably without proper fine-tuning. A common approach to measuring Semantic Textual Similarity (STS) is to take the distance between two text embeddings, defined by the dot product or the cosine function. However, the semantic embedding spaces induced by pretrained transformers are generally non-smooth and tend to deviate from a normal distribution, which makes traditional distance metrics imprecise. In this paper, we first empirically explain the failure of cosine similarity in semantic textual similarity measurement, and then present CoSENT, a novel Consistent SENTence embedding framework. Concretely, a supervised objective function is designed to optimize the Siamese BERT network by exploiting ranked similarity labels of sample pairs. The loss function applies the same cosine-similarity-based optimization in both the training and prediction phases, improving the consistency of the learned semantic space. Additionally, the unified objective function can be adapted to datasets with different annotation types and different STS comparison schemes, requiring only sortable labels. Empirical evaluations on 14 common textual similarity benchmarks demonstrate that the proposed CoSENT achieves superior performance while reducing training time.
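The ranked-label objective described in the abstract can be sketched as a pairwise ranking loss over cosine similarities: for any two sentence pairs whose labels say one should be more similar than the other, the loss penalizes the batch when the ordering of their cosine scores disagrees. This is a minimal illustrative sketch, not the paper's exact formulation; the `scale` hyperparameter, the exhaustive pairing over the batch, and the function names are assumptions.

```python
import math

def cosent_style_loss(sims, labels, scale=20.0):
    """Sketch of a CoSENT-style ranking loss.

    sims:   cosine similarities of sentence pairs in a batch
    labels: sortable similarity labels (higher = more similar);
            any pair whose label is strictly greater should also
            receive a strictly greater cosine score.
    scale:  temperature-like factor (assumed hyperparameter).
    """
    terms = []
    for s_i, y_i in zip(sims, labels):
        for s_j, y_j in zip(sims, labels):
            if y_i > y_j:
                # Pair i is labeled more similar than pair j,
                # so s_i should exceed s_j; the exponential term
                # grows when the ordering is violated.
                terms.append(math.exp(scale * (s_j - s_i)))
    return math.log1p(sum(terms))  # log(1 + sum), 0 when no violations possible
```

Because only the relative order of the labels matters, the same objective applies unchanged to binary, graded, or ranked STS annotations, which is the "sortable labels" property the abstract emphasizes.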
  
WIKINDX 6.11.0 | Total resources: 209 | Username: -- | Bibliography: WIKINDX Master Bibliography | Style: American Psychological Association (APA)