-
Notifications
You must be signed in to change notification settings - Fork 463
fix: Seperate model implementations and encoder specifications #2879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Seperate model implementations and encoder specifications #2879
Conversation
- Move all implementations into a seperate folder called `model_implementations` - moved `encoder_interface.py` and `model_meta.py` into `models` - renamed `models/*` to `encoder_implementions/*` to make the distinction from between the two folder clear - merged `models/utils.py` into the only model that used it We seems to have a few differing names when referring to a model (ModelMeta, get_model, etc.) and encoders (Encoder, AbsEncoder). Should we try to do something about this or just leave it as is? There is also an inconsistency between how tasks and implementations are in seperate folders, but for benchmark this is not the case. We could convert it to: ``` benchmarks/tasks/models | - implementations/* | - ... # definitions utilities etc. ``` But I am not sure it is worth it and for tasks it might be too much nesting. So I would probably leave it as is. Note: There is a few refactors that I would like to do on top of this, but will add that to seperate PR (since it is too hard to review here) Fixes #2299
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR reorganizes model-related code by moving implementation files into a dedicated model_implementations
folder, relocating the encoder interface and metadata definitions under mteb/models
, and updating all import paths accordingly.
- All model implementations are now under
mteb/models/model_implementations
encoder_interface.py
andmodel_meta.py
have been moved intomteb/models
and imports throughout the codebase were updated- The helper module
models/utils.py
was removed in favor of inlining batching logic where needed
Reviewed Changes
Copilot reviewed 134 out of 135 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/test_tasks/test_mteb_rerank.py | Updated import of ModelMeta to new module location |
scripts/generate_metadata.py | Adjusted import path for ModelMeta , but left placeholder comment |
mteb/models/model_meta.py | Changed relative import for SIMILAR_TASKS |
mteb/models/encoder_interface.py | Removed Corpus type alias |
Comments suppressed due to low confidence (2)
scripts/generate_metadata.py:11
- [nitpick] The placeholder comment
# <-- Replace with the actual import path
is now outdated since the import has been corrected. Please remove or update the comment to reflect the current location ofModelMeta
.
from mteb.models.model_meta import ModelMeta, ScoringFunction
mteb/models/encoder_interface.py:3
- [nitpick] You removed the
Corpus
type alias. Verify that no existing references rely onCorpus
; if there are, consider reintroducing the alias or updating those references accordingly.
from typing import Any, Protocol, runtime_checkable
ModelMeta, get_model are separate things. What do you mean here? |
Otherwise, looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I think in future when we solve #2728 we can separate encoder implementation and cross-encoder (SearchInterface)
Yeah did mean model_implementations (used encoder implementations earlier) So let us just leave the naming as is and then take a look at it later |
model_implementations
encoder_interface.py
andmodel_meta.py
intomodels
models/*
toencoder_implementions/*
to make the distinction from between the two folder clearmodels/utils.py
into the only model that used itWe seems to have a few differing names when referring to a model (ModelMeta, get_model, etc.) and encoders (Encoder, AbsEncoder). Should we try to do something about this or just leave it as is?
Note: There is a few refactors that I would like to do on top of this, but will add that to seperate PR (since it is too hard to review here)
Fixes