Evaluating the classic MTEB benchmark (MTEB(eng, classic)) via the benchmarks object uses the test split for every dataset. For MS MARCO, however, the MTEB paper states that the dev split should be used, so scores computed by following the instructions in the docs do not match the previously reported leaderboard numbers:
import mteb

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")  # any embedding model

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
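
A possible workaround, sketched below, is to evaluate MSMARCO separately on its dev split while running the rest of the benchmark as usual. This assumes the eval_splits argument of MTEB.run overrides the split per task; the model name is only illustrative:

import mteb

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")  # illustrative choice

# Run every task except MSMARCO on the default test split.
benchmark = mteb.get_benchmark("MTEB(eng, classic)")
other_tasks = [t for t in benchmark.tasks if t.metadata.name != "MSMARCO"]
mteb.MTEB(tasks=other_tasks).run(model, output_folder="results")

# Run MSMARCO on the dev split to match the paper and the leaderboard.
msmarco = mteb.get_tasks(tasks=["MSMARCO"])
mteb.MTEB(tasks=msmarco).run(model, output_folder="results", eval_splits=["dev"])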