dataset: Add IFIR benchmark #2815
Conversation
Could anyone help me review my code? If it looks good, I will run some evaluations and provide results on this benchmark.
scores_dict = {"level_1": [], "level_2": [], "level_3": []}
for k, v in scores.items():
    if "v1" in k:
        scores_dict["level_1"].append(v["ndcg_cut_20"])
Suggested change:
-        scores_dict["level_1"].append(v["ndcg_cut_20"])
+        scores_dict["level_1"].append(v[self.metadata.main_score])
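A minimal sketch of what the full per-level aggregation might look like with that change; the "v1"/"v2"/"v3" key suffixes and the final averaging step are assumptions about how the IFIR levels are encoded, not taken from the PR:

```python
# Sketch only: aggregate per-level scores via the task's configured main metric
# instead of a hardcoded "ndcg_cut_20". The "v1"/"v2"/"v3" query-id suffixes and
# the averaging at the end are assumptions, not confirmed by this PR.
scores_dict = {"level_1": [], "level_2": [], "level_3": []}
for k, v in scores.items():
    if "v1" in k:
        scores_dict["level_1"].append(v[self.metadata.main_score])
    elif "v2" in k:
        scores_dict["level_2"].append(v[self.metadata.main_score])
    elif "v3" in k:
        scores_dict["level_3"].append(v[self.metadata.main_score])

# Average each level, skipping empty ones to avoid division by zero.
level_means = {
    level: sum(vals) / len(vals) for level, vals in scores_dict.items() if vals
}
```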
    task_subtypes=None,
    license=None,
    annotations_creators=None,
    dialect=None,
    sample_creation=None,
Can you fill in the missing metadata?
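For reference, a sketch of what the filled-in fields might look like; every value below is illustrative only (an assumption, not the real IFIR metadata) and should be replaced with the dataset's actual license, annotation process, and so on. The `example_metadata_fields` container is just for illustration; in the task file these keyword arguments would replace the `None` fields of the `TaskMetadata(...)` call.

```python
# Illustrative values only -- assumptions, not the real IFIR metadata.
example_metadata_fields = dict(
    task_subtypes=["Article retrieval"],  # pick a valid subtype for the task, or []
    license="cc-by-4.0",                  # the dataset's actual license
    annotations_creators="derived",       # e.g. "expert-annotated" if applicable
    dialect=[],
    sample_creation="found",              # e.g. "created" if queries were authored for IFIR
)
```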
Oh, no problem! Thanks for pointing out the errors!
Also, you should calculate the metadata of your tasks. You can do it with:
import mteb

benchmark = mteb.get_benchmark("IFIR")
for task in benchmark.tasks:
    task.calculate_metadata_metrics()
And format the BibTeX citations with:
python scripts/format_citations.py benchmarks
python scripts/format_citations.py tasks
I will provide some evaluation results later. Could you please wait 1-3 days?
@@ -0,0 +1,64 @@
from __future__ import annotations

from mteb.abstasks.TaskMetadata import TaskMetadata
I think you should merge v2 into your branch and change the import:
-from mteb.abstasks.TaskMetadata import TaskMetadata
+from mteb.abstasks.task_metadata import TaskMetadata
Yes, thank you for pointing that out. I have tried the v2.0.0 branch without my revision, but it still fails. Is there a bug in the v2.0.0 code?
I use the following commands:
git clone https://github.com/embeddings-benchmark/mteb.git
git switch v2.0.0
pip install -e .
python run_mteb.py
Then I get an error from
https://github.com/embeddings-benchmark/mteb/blob/v2.0.0/mteb/models/colpali_models.py#L11
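A possible way to narrow this down (just a suggestion, not something tried in this thread) is to import the offending module directly in a clean v2.0.0 checkout; if that import already fails, the bug is upstream and unrelated to the IFIR files:

```python
# Diagnostic sketch (assumption): run inside an unmodified v2.0.0 checkout after
# `pip install -e .`. If this fails, the error is upstream of the IFIR PR.
import importlib

import mteb  # the top-level import should succeed first

importlib.import_module("mteb.models.colpali_models")
print("mteb.models.colpali_models imported cleanly")
```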
@SighingSnow it seems like the error is still on the import (did you push the changes?); then we can take a look at the new error. Can you also run the lint?
> it seems like the error is still on the import (did you push the changes?)

I just pushed a new version, but it seems that the error is not from the IFIR-related files.

> Can you also do the lint?

Yes, of course. I have fixed the formatting in the new commit.
Code + metadata looks fine to me
Force-pushed from b44cbb3 to 7e368e2
Signed-off-by: SighingSnow <songtingyu220@gmail.com>
I see some force pushes after approval. @Samoed, would you mind checking if this is ready to merge?
I have outlined why this dataset fills an existing gap in mteb.
I have tested that the dataset runs with the mteb package.
I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command (see the sketch after this checklist).
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
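A minimal sketch of how that evaluation could be run from Python, reusing the get_benchmark("IFIR") call shown earlier in the thread; the model name is only an example, the output folder is arbitrary, and the exact entry point may differ between mteb versions:

```python
import mteb

# Example model -- any embedding model registered with mteb works here.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# "IFIR" is the benchmark name used earlier in this thread.
benchmark = mteb.get_benchmark("IFIR")

# Run all IFIR tasks and write results to a local folder.
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="results")
```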