[v2] New encoder interface #2415
Conversation
# Conflicts:
#   mteb/evaluation/evaluators/Image/ClassificationEvaluator.py
#   mteb/evaluation/evaluators/PairClassificationEvaluator.py
#   mteb/model_meta.py
@isaac-chung @KennethEnevoldsen Can you review?
A lot of good stuff here. Sorry the second part of the review came so long after the first comments (the GitHub UI is simply excessively slow when working with large diffs, which is also why some of the comments are a bit terse).
There are 18 model groups; some of them are difficult to run and some are too big for me to run. This PR doesn't change much for the models, so I think we can check them later. Or can you provide the top 5 models for us to check?
Looks good. I think this one is ready to merge.
Hmm, if you can run every model in this screenshot that is runnable, that's probably a good enough check. Thanks again in advance! CC @gowitheflow-1998, this is a change in v2.
Did you try this?
Ah yes, yours is correct. e5-v is at the top.
@orionw Hope you’ve successfully submitted to COLM! Could you take a look at the PR when you have a moment? Regarding the …
Thanks for the ping @Samoed, sorry for the delay. It seems like this is the same interface that the cross-encoders currently have: namely the …
I have missed a lot of the conversation here, so apologies if this has been hashed out: the only thing I find confusing is that we lump cross-encoders into the same class as the bi-encoders (also the same with the base …).
Otherwise it looks good and complete w.r.t. cross-encoders!
Yes, I understand your point, but we haven't implemented the cross-encoder interface anywhere yet, and it would be difficult to check each time if a model is a cross-encoder. I have made a change based on the …
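(For readers skimming the thread, here is a minimal sketch of the interface question being discussed. The class names, method signatures, and the runtime check below are assumptions for illustration only, not the actual mteb v2 definitions.)

```python
from typing import Protocol, Sequence, runtime_checkable

import numpy as np


@runtime_checkable
class Encoder(Protocol):
    """Hypothetical bi-encoder interface: one embedding per input text."""

    def encode(self, sentences: Sequence[str], **kwargs) -> np.ndarray: ...


@runtime_checkable
class CrossEncoder(Protocol):
    """Hypothetical cross-encoder interface: one score per text pair."""

    def predict(self, sentence_pairs: Sequence[tuple[str, str]], **kwargs) -> np.ndarray: ...


def score_pairs(model, pairs: Sequence[tuple[str, str]]) -> np.ndarray:
    """Branch on the protocol instead of special-casing individual models."""
    if isinstance(model, CrossEncoder):
        return np.asarray(model.predict(pairs))
    # Bi-encoder fallback: encode both sides and take cosine of aligned rows.
    a = np.asarray(model.encode([q for q, _ in pairs]))
    b = np.asarray(model.encode([d for _, d in pairs]))
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)
```

Whether cross-encoders should sit behind a separate protocol like this, or be folded into the single encoder class as the PR currently does, is exactly the open question in the comments above.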
@isaac-chung I've tested the models. Results from this PR: …
Results from leaderboard: …
Odd. I was able to get 'main_score': 0.55203 on …
Yes, I'm getting a similar score.
I got 'main_score': 0.55457 from …
For CVBench I agree, but what about …?
We previously overwrote LB scores to use …
That should be it, thanks so much, @Samoed!
Scores after merging main are matching for …
Closes #1735

- Renamed Wrapper to AbsEncoder
- Moved types to a separate file
- All tasks now use the model's similarity implementation, cosine by default (see the sketch after this list). This allows for new approaches to similarity, but does mean that scores will change: e.g. for facebook/contriever-msmarco, which was added in v2.
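As a rough illustration of that last bullet (a sketch only; `pairwise_similarity` is a hypothetical helper name, not mteb code), the idea is that evaluators ask the model for its own similarity function and fall back to cosine when none is provided:

```python
import numpy as np


def pairwise_similarity(model, emb1: np.ndarray, emb2: np.ndarray) -> np.ndarray:
    """Hypothetical helper: prefer the model's own similarity over hard-coded cosine."""
    if hasattr(model, "similarity"):
        # e.g. SentenceTransformer models expose a `similarity` method that may
        # be cosine, dot product, etc. depending on how the model was trained.
        return np.asarray(model.similarity(emb1, emb2))
    # Fallback: plain cosine similarity matrix.
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    return a @ b.T
```

That swap from a fixed cosine to the model-defined similarity is presumably what shifts the reported scores for models such as facebook/contriever-msmarco.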
For now, I don't know what to do with CrossEncoders. Should we create another interface for them?

Code Quality
- Format the code using make lint to maintain consistent style.

Documentation
Testing
- Write tests to cover new functionality and validate with make test-with-coverage.
- Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist
Reason for dataset addition: ...
- Run the following models on the task using the mteb -m {model_name} -t {task_name} command:
  - sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - intfloat/multilingual-e5-small
- If the dataset is too big, consider using self.stratified_subsampling() under dataset_transform().
- Run tests locally to make sure nothing is broken using make test.
- Run the formatter to format the code using make lint.

Adding a model checklist
- Ensure the model can be loaded using mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision).
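To make that last checklist item concrete, here is a minimal usage sketch (the model name, task name, and output folder are arbitrary examples, not tied to this PR):

```python
import mteb

# Load the model and its metadata through mteb's registry.
model = mteb.get_model("intfloat/multilingual-e5-small")
meta = mteb.get_model_meta("intfloat/multilingual-e5-small")
print(meta.name, meta.revision)

# Run it on a single task as a quick sanity check of the implementation.
tasks = mteb.get_tasks(tasks=["STSBenchmark"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```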