Conversation

ll0ruc
Contributor

@ll0ruc ll0ruc commented May 30, 2025

Checklist

Reason for dataset addition: R2MED (https://huggingface.co/R2MED), the first benchmark explicitly designed for reasoning-driven medical retrieval. More details in the paper and on the homepage.

  • I did not add a dataset, or if I did, I added the dataset checklist to the PR and completed it.
  • I have tested that the dataset runs with the mteb package.
  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
  • I did not add a model, or if I did, I added the model checklist to the PR and completed it.
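As a hedged sketch, the checklist's model runs could be invoked like this via the mteb CLI (model names are taken from the checklist above; the task name is one of the R2MED tasks mentioned later in this thread and is used here purely as an illustration):

```shell
# Run the two baseline models from the checklist on an R2MED task.
# R2MEDBiologyRetrieval is an illustrative task name from this PR.
mteb run -m sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 -t R2MEDBiologyRetrieval
mteb run -m intfloat/multilingual-e5-small -t R2MEDBiologyRetrieval
```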

@KennethEnevoldsen
Contributor

Related to: embeddings-benchmark/results#209

ll0ruc and others added 2 commits June 3, 2025 18:23
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
@KennethEnevoldsen
Contributor

@ll0ruc seems like linting fails. You can fix this by running make lint

@ll0ruc
Contributor Author

ll0ruc commented Jun 5, 2025

@ll0ruc seems like linting fails. You can fix this by running make lint

I ran ruff format ./mteb/benchmarks/benchmarks.py and ruff check --fix on the three changed files (./mteb/benchmarks/benchmarks.py, ./mteb/tasks/Retrieval/__init__.py, ./mteb/tasks/Retrieval/eng/R2MEDRetrieval.py), compared them against mteb, and updated them, but linting still seems to fail. I don't know how to solve it.
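For context, the fix suggested above is to run the repository's lint target rather than per-file ruff invocations, so that the exact same checks as CI are applied to the whole tree. A minimal sketch (the manual ruff commands are an assumption about what the Makefile target does):

```shell
# Run the repo's lint target from the repository root; this is what CI expects.
make lint
# Roughly equivalent manual invocation (assumption about the Makefile's contents):
ruff format .
ruff check . --fix
```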

@KennethEnevoldsen KennethEnevoldsen changed the title Add R2MED Benchmark fix: Add R2MED Benchmark Jun 5, 2025
@KennethEnevoldsen
Contributor

@ll0ruc seems like it passed - I have enabled auto-merge on this one. @ll0ruc, given that it is now multiple tasks, you will need to rerun the models to obtain the correct result format.

@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) June 5, 2025 13:23
@ll0ruc
Contributor Author

ll0ruc commented Jun 5, 2025

@ll0ruc seems like it passed - I have enabled auto-merge on this one. @ll0ruc, given that it is now multiple tasks, you will need to rerun the models to obtain the correct result format.

OK, I will upload the new results in embeddings-benchmark/results#212 according to the current code version.

auto-merge was automatically disabled June 5, 2025 15:24

Head branch was pushed to by a user without write access

@Samoed Samoed changed the title fix: Add R2MED Benchmark dataset: Add R2MED Benchmark Jun 5, 2025
@KennethEnevoldsen
Contributor

The tests still fail with the error:

The metadata of the following datasets is not filled: ['R2MEDBiologyRetrieval', 'R2MEDBioinformaticsRetrieval', 'R2MEDMedicalSciencesRetrieval', 'R2MEDMedXpertQAExamRetrieval', 'R2MEDMedQADiagRetrieval', 'R2MEDPMCTreatmentRetrieval', 'R2MEDPMCClinicalRetrieval', 'R2MEDIIYiClinicalRetrieval'].

@KennethEnevoldsen KennethEnevoldsen merged commit 631b4ef into embeddings-benchmark:main Jun 9, 2025
4 of 9 checks passed
@KennethEnevoldsen
Contributor

I attempted the merge, since I thought the error was on our side. That is not the case; we are still missing data. Opened a new PR at #2795 (can't re-open this one).
