C-Pack: Packaged Resources To Advance General Chinese Embedding

Xiao, Shitao; Liu, Zheng; Zhang, Peitian; Muennighoff, Niklas

Xiao, S., Liu, Z., Zhang, P., & Muennighoff, N. C-Pack: Packaged Resources To Advance General Chinese Embedding.

Resource type: Journal Article
BibTeX citation key: anon.182

Categories: General
Creators: Liu, Muennighoff, Xiao, Zhang

Attachments

URLs https://www.semant ... 646c5e1576f0b3c901

Abstract

We introduce C-Pack, a package of resources that significantly advances the field of general Chinese embeddings. C-Pack includes three critical resources: 1) C-MTEB, a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets; 2) C-MTP, a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models; and 3) C-TEM, a family of embedding models covering multiple sizes. At the time of release, our models outperform all prior Chinese text embeddings on C-MTEB by up to +10%. We also integrate and optimize the entire suite of training methods for C-TEM. Alongside our resources for general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.

WIKINDX 6.11.0 | Total resources: 209 | Username: -- | Bibliography: WIKINDX Master Bibliography | Style: American Psychological Association (APA)