MSMARCO eval split in MTEB English (classic) benchmark #1608

@aashka-trivedi

Description

Using the benchmarks object to evaluate the classic MTEB benchmark (MTEB(eng, classic)) uses the test split for all datasets. For MS MARCO, however, the paper states that the dev split should be used, leading to a mismatch between previously reported leaderboard numbers and scores calculated by following the instructions in the docs:

import mteb

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")

Metadata

Labels

bug (Something isn't working)
