Missing Metadata for tasks in MTEB(eng, classic) #1886

@x-tabdeveloping

Many tasks in MTEB(eng, classic) are missing metadata, which breaks filtering on the leaderboard.
The following script lists the affected tasks and fields:

import mteb

for task in mteb.get_benchmark("MTEB(eng, classic)").tasks:
    if not task.metadata.domains:
        print(f"{task.metadata.name}.domains = {task.metadata.domains}")
    if not task.metadata.languages:
        print(f"{task.metadata.name}.languages = {task.metadata.languages}")
    if not task.metadata.type:
        print(f"{task.metadata.name}.type = {task.metadata.type}")

Output:

ArxivClusteringS2S.domains = None
AskUbuntuDupQuestions.domains = None
BIOSSES.domains = None
CQADupstackAndroidRetrieval.domains = None
CQADupstackEnglishRetrieval.domains = None
CQADupstackGamingRetrieval.domains = None
CQADupstackGisRetrieval.domains = None
CQADupstackMathematicaRetrieval.domains = None
CQADupstackPhysicsRetrieval.domains = None
CQADupstackStatsRetrieval.domains = None
CQADupstackTexRetrieval.domains = None
CQADupstackUnixRetrieval.domains = None
CQADupstackWebmastersRetrieval.domains = None
CQADupstackWordpressRetrieval.domains = None
ClimateFEVER.domains = None
FEVER.domains = None
FiQA2018.domains = None
NQ.domains = None
QuoraRetrieval.domains = None
RedditClustering.domains = None
RedditClusteringP2P.domains = None
STSBenchmark.domains = None
StackExchangeClustering.domains = None
StackExchangeClusteringP2P.domains = None
StackOverflowDupQuestions.domains = None
TwitterSemEval2015.domains = None
TwitterURLCorpus.domains = None
MSMARCO.domains = None
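
The check above can also be collected into a dictionary instead of printed, which is easier to reuse in a test or a fix-up script. This is only a sketch: `find_missing_metadata` is a hypothetical helper name, not part of mteb, and the demo tasks are stand-in objects that merely mimic the `task.metadata` attributes used in the script above.

```python
from types import SimpleNamespace
from typing import Iterable


def find_missing_metadata(tasks: Iterable, fields=("domains", "languages", "type")):
    """Return {task name: [fields that are None or empty]} for incomplete tasks.

    Hypothetical helper; assumes each task exposes `.metadata` with a `name`
    attribute plus the attributes listed in `fields`.
    """
    missing = {}
    for task in tasks:
        # A field counts as missing if it is absent, None, or empty.
        empty = [f for f in fields if not getattr(task.metadata, f, None)]
        if empty:
            missing[task.metadata.name] = empty
    return missing


# Stand-in tasks (not real mteb tasks) so the sketch runs without mteb installed.
demo_tasks = [
    SimpleNamespace(metadata=SimpleNamespace(
        name="TaskA", domains=None, languages=["eng"], type="Retrieval")),
    SimpleNamespace(metadata=SimpleNamespace(
        name="TaskB", domains=["Web"], languages=["eng"], type="STS")),
]
print(find_missing_metadata(demo_tasks))  # {'TaskA': ['domains']}
```

With mteb installed, the same function could presumably be called on `mteb.get_benchmark("MTEB(eng, classic)").tasks`, as in the script above.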

Labels: leaderboard (issues related to the leaderboard)
