Closed
Labels
good first issue, help wanted, leaderboard
Description
Many of the models that were run on the original C-MTEB, and for which we have results, are currently missing ModelMeta
objects in the library.
Here's a list of Chinese-specific models that have yet to be added to MTEB:
missing_meta_chinese = [
    "BAAI/bge-base-zh",  # not planned, outdated
    "BAAI/bge-base-zh-v1.5",
    "BAAI/bge-large-zh",  # not planned, outdated
    "BAAI/bge-large-zh-noinstruct",  # not planned, outdated
    "BAAI/bge-large-zh-v1.5",
    "BAAI/bge-small-zh",  # not planned, outdated
    "BAAI/bge-small-zh-v1.5",
    "DMetaSoul/Dmeta-embedding-zh-small",
    "DMetaSoul/sbert-chinese-general-v1",
    "Erin/IYun-large-zh",
    "Erin/mist-zh",
    "Pristinenlp/alime-embedding-large-zh",
    "Pristinenlp/alime-reranker-large-zh",
    "RookieHX/bge_m3e_stella",
    "akarum/cloudy-large-zh",
    "arkohut/jina-embeddings-v2-base-zh",
    "dunzhang/stella-large-zh-v3-1792d",
    "dunzhang/stella-mrl-large-zh-v3.5-1792d",
    "fangxq/XYZ-embedding-zh",
    "fangxq/XYZ-embedding-zh-v2",
    "iampanda/zpoint_large_embedding_zh",
    "infgrad/stella-base-zh",
    "infgrad/stella-base-zh-v2",
    "infgrad/stella-base-zh-v3-1792d",
    "infgrad/stella-large-zh",
    "infgrad/stella-large-zh-v2",
    "jinaai/jina-embeddings-v2-base-zh",
    "moka-ai/m3e-base",
    "moka-ai/m3e-large",
    "neofung/m3e-ernie-xbase-zh",
    "sensenova/piccolo-base-zh",
    "sensenova/piccolo-large-zh",
    "sensenova/piccolo-large-zh-v2",
    "shanghung/stella-base-zh-v3-1792d",
    "shibing624/text2vec-base-chinese",  # Needs custom implementation
    "shibing624/text2vec-large-chinese",  # Needs custom implementation
    "silverjam/jina-embeddings-v2-base-zh",
    "thenlper/gte-base-zh",
    "thenlper/gte-large-zh",
    "thenlper/gte-small-zh",
    "towing/gte-small-zh",  # not planned
    "lier007/xiaobu-embedding",
    "Classical/Yinka",
    "TencentBAC/Conan-embedding-v1",
    "lier007/xiaobu-embedding-v2",
]
Here is also a list of multilingual models that are currently missing metadata:
missing_meta_multilingual = [
    "Alibaba-NLP/gte-multilingual-base",
    "BAAI/bge-multilingual-gemma2",
    "EdwardBurgin/paraphrase-multilingual-mpnet-base-v2",
    "HIT-TMG/KaLM-embedding-multilingual-max-instruct-v1",
    "barisaydin/text2vec-base-multilingual",  # Needs custom implementation
    "beademiguelperez/sentence-transformers-multilingual-e5-small",
    "bedrock/cohere-embed-multilingual-v3",
    "gizmo-ai/Cohere-embed-multilingual-v3.0",
    "sentence-transformers/distiluse-base-multilingual-cased-v2",
    "sentence-transformers/use-cmlm-multilingual",
    "vprelovac/universal-sentence-encoder-multilingual-3",
    "vprelovac/universal-sentence-encoder-multilingual-large-3",
]
Most of these should be pretty trivial to add.
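For anyone picking this up, a rough way to track what is left might look like the sketch below. It is not part of the library: `still_missing`, `fake_lookup`, and the `registered` dict are hypothetical names made up for illustration, and the idea that a metadata lookup raises `KeyError` or `ValueError` for an unknown model is an assumption. With the real library you would pass in whatever lookup function it provides for model metadata instead of the stand-in.

```python
from typing import Callable, Iterable


def still_missing(
    candidates: Iterable[str], lookup: Callable[[str], object]
) -> list[str]:
    """Return the candidate names for which `lookup` fails.

    Assumption: `lookup` raises KeyError or ValueError for unknown models.
    """
    missing = []
    for name in candidates:
        try:
            lookup(name)
        except (KeyError, ValueError):
            missing.append(name)
    return missing


# Hypothetical stand-in for the library's registry; the single entry here
# is invented and says nothing about which models are actually registered.
registered = {"example-org/already-registered": object()}


def fake_lookup(name: str) -> object:
    # A plain dict lookup raises KeyError for names that are not registered.
    return registered[name]


sample = [
    "example-org/already-registered",  # hypothetical name, not from the lists
    "moka-ai/m3e-base",
    "thenlper/gte-base-zh",
]
print(still_missing(sample, fake_lookup))
```

Running the same helper over `missing_meta_chinese` and `missing_meta_multilingual` after each PR would show which names still need a ModelMeta, provided the lookup behaves as assumed.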
isaac-chung