WIKINDX

WIKINDX Resources

Xiao, S., Liu, Z., Zhang, P., & Muennighoff, N. C-Pack: Packaged Resources To Advance General Chinese Embedding. 
Resource type: Journal Article
BibTeX citation key: anon.182
Categories: General
Creators: Liu, Muennighoff, Xiao, Zhang
Attachments   URLs   https://www.semant ... 646c5e1576f0b3c901
Abstract
We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.
  