Missing ModelMeta for Chinese models #1803

@x-tabdeveloping

Many of the models that were run on the original C-MTEB, and for which we have results, are currently missing ModelMeta objects in the library.

Here's a list of Chinese-specific models that have yet to be added to MTEB:

missing_meta_chinese = [
    "BAAI/bge-base-zh", # not planned, outdated
    "BAAI/bge-base-zh-v1.5",
    "BAAI/bge-large-zh", # not planned, outdated
    "BAAI/bge-large-zh-noinstruct",  # not planned, outdated
    "BAAI/bge-large-zh-v1.5",
    "BAAI/bge-small-zh",  # not planned, outdated
    "BAAI/bge-small-zh-v1.5",
    "DMetaSoul/Dmeta-embedding-zh-small",
    "DMetaSoul/sbert-chinese-general-v1",
    "Erin/IYun-large-zh",
    "Erin/mist-zh",
    "Pristinenlp/alime-embedding-large-zh",
    "Pristinenlp/alime-reranker-large-zh",
    "RookieHX/bge_m3e_stella",
    "akarum/cloudy-large-zh",
    "arkohut/jina-embeddings-v2-base-zh",
    "dunzhang/stella-large-zh-v3-1792d",
    "dunzhang/stella-mrl-large-zh-v3.5-1792d",
    "fangxq/XYZ-embedding-zh",
    "fangxq/XYZ-embedding-zh-v2",
    "iampanda/zpoint_large_embedding_zh",
    "infgrad/stella-base-zh",
    "infgrad/stella-base-zh-v2",
    "infgrad/stella-base-zh-v3-1792d",
    "infgrad/stella-large-zh",
    "infgrad/stella-large-zh-v2",
    "jinaai/jina-embeddings-v2-base-zh",
    "moka-ai/m3e-base",
    "moka-ai/m3e-large",
    "neofung/m3e-ernie-xbase-zh",
    "sensenova/piccolo-base-zh",
    "sensenova/piccolo-large-zh",
    "sensenova/piccolo-large-zh-v2",
    "shanghung/stella-base-zh-v3-1792d",
    "shibing624/text2vec-base-chinese",  # Needs custom implementation
    "shibing624/text2vec-large-chinese",  # Needs custom implementation
    "silverjam/jina-embeddings-v2-base-zh",
    "thenlper/gte-base-zh",
    "thenlper/gte-large-zh",
    "thenlper/gte-small-zh",
    "towing/gte-small-zh", # not planned
    "lier007/xiaobu-embedding",
    "Classical/Yinka",
    "TencentBAC/Conan-embedding-v1",
    "lier007/xiaobu-embedding-v2",
]

Here is also a list of multilingual models that are currently missing metadata:

missing_meta_multilingual = [
    "Alibaba-NLP/gte-multilingual-base",
    "BAAI/bge-multilingual-gemma2",
    "EdwardBurgin/paraphrase-multilingual-mpnet-base-v2",
    "HIT-TMG/KaLM-embedding-multilingual-max-instruct-v1",
    "barisaydin/text2vec-base-multilingual", # Needs custom implementation
    "beademiguelperez/sentence-transformers-multilingual-e5-small",
    "bedrock/cohere-embed-multilingual-v3",
    "gizmo-ai/Cohere-embed-multilingual-v3.0",
    "sentence-transformers/distiluse-base-multilingual-cased-v2",
    "sentence-transformers/use-cmlm-multilingual",
    "vprelovac/universal-sentence-encoder-multilingual-3",
    "vprelovac/universal-sentence-encoder-multilingual-large-3",
]

Most of these should be pretty trivial to add.
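To keep track of progress, something like the sketch below could be used to see which names from the lists above still have no metadata. It is only an illustration: `find_missing_meta` and `fake_registry` are hypothetical names, and it assumes the lookup raises `KeyError` for an unknown model. How the library's own lookup signals a missing entry may differ, so adapt the `except` clause accordingly.

```python
from typing import Callable, Iterable


def find_missing_meta(
    model_names: Iterable[str],
    lookup: Callable[[str], object],
) -> list[str]:
    """Return the names for which `lookup` raises KeyError.

    `lookup` is a stand-in for whatever function resolves a model name
    to its metadata (assumption: it raises KeyError when nothing is found).
    """
    missing = []
    for name in model_names:
        try:
            lookup(name)
        except KeyError:
            missing.append(name)
    return missing


# Toy registry standing in for the library's real one (hypothetical contents).
fake_registry = {"BAAI/bge-base-zh-v1.5": object()}

candidates = [
    "BAAI/bge-base-zh-v1.5",
    "moka-ai/m3e-base",
    "thenlper/gte-base-zh",
]
still_missing = find_missing_meta(candidates, fake_registry.__getitem__)
print(still_missing)  # ['moka-ai/m3e-base', 'thenlper/gte-base-zh']
```

Passing the lookup in as a callable keeps the helper independent of any particular library version, so the same check works against a plain dict in tests and against the real registry when it is swapped in.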

Labels: good first issue, help wanted, leaderboard