Evaluating the classic MTEB benchmark (MTEB(eng, classic)) via the benchmarks object uses the test split for every dataset. For MS MARCO, however, the MTEB paper states that the dev split should be used, so scores computed by following the instructions in the docs do not match the previously reported leaderboard numbers:
import mteb

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")  # any embedding model

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
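
A possible workaround, sketched below, is to evaluate MSMARCO separately on its dev split while running the rest of the benchmark as usual. This assumes the eval_splits argument of MTEB.run overrides the split per task; the model name is only illustrative:

import mteb

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")  # illustrative choice

# Run every task except MSMARCO on the default test split.
benchmark = mteb.get_benchmark("MTEB(eng, classic)")
other_tasks = [t for t in benchmark.tasks if t.metadata.name != "MSMARCO"]
mteb.MTEB(tasks=other_tasks).run(model, output_folder="results")

# Run MSMARCO on the dev split to match the paper and the leaderboard.
msmarco = mteb.get_tasks(tasks=["MSMARCO"])
mteb.MTEB(tasks=msmarco).run(model, output_folder="results", eval_splits=["dev"])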