dataset: Add IFIR benchmark #2815
Conversation
Could anyone help me review my code? If it looks good, I will run some evaluations and provide results on this benchmark.
scores_dict = {"level_1": [], "level_2": [], "level_3": []}
for k, v in scores.items():
    if "v1" in k:
        scores_dict["level_1"].append(v["ndcg_cut_20"])
Suggested change:
-        scores_dict["level_1"].append(v["ndcg_cut_20"])
+        scores_dict["level_1"].append(v[self.metadata.main_score])
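A minimal sketch of what the full per-level aggregation might look like with that change; the "v1"/"v2"/"v3" key suffixes and the final averaging step are assumptions about how the IFIR levels are encoded, not taken from the PR:

```python
# Sketch only: aggregate per-level scores via the task's configured main metric
# instead of a hardcoded "ndcg_cut_20". The "v1"/"v2"/"v3" query-id suffixes and
# the averaging at the end are assumptions, not confirmed by this PR.
scores_dict = {"level_1": [], "level_2": [], "level_3": []}
for k, v in scores.items():
    if "v1" in k:
        scores_dict["level_1"].append(v[self.metadata.main_score])
    elif "v2" in k:
        scores_dict["level_2"].append(v[self.metadata.main_score])
    elif "v3" in k:
        scores_dict["level_3"].append(v[self.metadata.main_score])

# Average each level, skipping empty ones to avoid division by zero.
level_means = {
    level: sum(vals) / len(vals) for level, vals in scores_dict.items() if vals
}
```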
    task_subtypes=None,
    license=None,
    annotations_creators=None,
    dialect=None,
    sample_creation=None,
Can you fill in the missing metadata?
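For reference, a sketch of what the filled-in fields might look like; every value below is illustrative only (an assumption, not the real IFIR metadata) and should be replaced with the dataset's actual license, annotation process, and so on. The `example_metadata_fields` container is just for illustration; in the task file these keyword arguments would replace the `None` fields of the `TaskMetadata(...)` call.

```python
# Illustrative values only -- assumptions, not the real IFIR metadata.
example_metadata_fields = dict(
    task_subtypes=["Article retrieval"],  # pick a valid subtype for the task, or []
    license="cc-by-4.0",                  # the dataset's actual license
    annotations_creators="derived",       # e.g. "expert-annotated" if applicable
    dialect=[],
    sample_creation="found",              # e.g. "created" if queries were authored for IFIR
)
```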
Oh, no problem! Thanks for pointing out the errors!
Also, you should calculate the metadata of your tasks. You can do it with:
import mteb

benchmark = mteb.get_benchmark("IFIR")
for task in benchmark.tasks:
    task.calculate_metadata_metrics()
And format the BibTeX citations with:
python scripts/format_citations.py benchmarks
python scripts/format_citations.py tasks
I will provide some evaluation results later. Could you please wait 1-3 days?
@@ -0,0 +1,64 @@
from __future__ import annotations

from mteb.abstasks.TaskMetadata import TaskMetadata
I think you should merge v2 into your branch and change the import:
-from mteb.abstasks.TaskMetadata import TaskMetadata
+from mteb.abstasks.task_metadata import TaskMetadata
Yes, thank you for pointing that out. I have tried the v2.0.0 branch without my revision, but it still fails. Is there a bug in the v2.0.0 code?
I use the following commands:
git clone https://github.com/embeddings-benchmark/mteb.git
git switch v2.0.0
pip install -e .
python run_mteb.py
Then I get an error from
https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/models/colpali_models.py#L11
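A possible way to narrow this down (just a suggestion, not something tried in this thread) is to import the offending module directly in a clean v2.0.0 checkout; if that import already fails, the bug is upstream and unrelated to the IFIR files:

```python
# Diagnostic sketch (assumption): run inside an unmodified v2.0.0 checkout after
# `pip install -e .`. If this fails, the error is upstream of the IFIR PR.
import importlib

import mteb  # the top-level import should succeed first

importlib.import_module("mteb.models.colpali_models")
print("mteb.models.colpali_models imported cleanly")
```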
@SighingSnow it seems like the error is still on the import (did you push the changes?); then we can take a look at the new error. Can you also run the lint?
> it seems like the error is still on the import (did you push the changes?)

I just pushed a new version, but it seems that the error is not from the IFIR-related files.

> Can you also do the lint?

Yes, of course. I have fixed the formatting in the new commit.
Code + metadata looks fine to me
Force-pushed from b44cbb3 to 7e368e2
Signed-off-by: SighingSnow <songtingyu220@gmail.com>
I see some force pushes after approval. @Samoed, would you mind checking if this is ready to merge?
I have outlined why this dataset fills an existing gap in mteb.
I have tested that the dataset runs with the mteb package.
I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command (see the sketch after this checklist).
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
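A minimal sketch of how that evaluation could be run from Python, reusing the get_benchmark("IFIR") call shown earlier in the thread; the model name is only an example, the output folder is arbitrary, and the exact entry point may differ between mteb versions:

```python
import mteb

# Example model -- any embedding model registered with mteb works here.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# "IFIR" is the benchmark name used earlier in this thread.
benchmark = mteb.get_benchmark("IFIR")

# Run all IFIR tasks and write results to a local folder.
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="results")
```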