[v2] Combine instructions with queries #2984
LGTM, assuming that the eval still runs and looks the same.
This must create rows with duplicate queries (so the query count for InstructIR will grow 10x, with one row per instruction + query combination instead of one row per query). But that's fine with me since these are relatively small datasets.
Also, the docstring already says that we need to duplicate queries to match them with instructions: mteb/mteb/abstasks/AbsTaskRetrieval.py, Line 95 in 64478e7
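To make the row growth concrete, here is a minimal sketch of the duplication. It assumes a hypothetical in-memory layout in which each query ID maps to a list of instructions; the helper name `expand_queries` and the field names `id`, `text`, and `instruction` are illustrative assumptions, not taken from the mteb code.

```python
from typing import Dict, List


def expand_queries(
    queries: Dict[str, str],
    instructions: Dict[str, List[str]],
) -> List[Dict[str, str]]:
    """Return one row per (query, instruction) pair.

    Hypothetical helper: the field names are assumptions for illustration.
    """
    rows = []
    for query_id, text in queries.items():
        # A query without any instruction still yields a single row.
        for instruction in instructions.get(query_id, [""]):
            rows.append({"id": query_id, "text": text, "instruction": instruction})
    return rows


queries = {"q1": "find papers on retrieval", "q2": "summarize this topic"}
# Ten instructions per query, mirroring the 10x growth mentioned for InstructIR.
instructions = {
    "q1": [f"instruction {i}" for i in range(10)],
    "q2": [f"instruction {i}" for i in range(10)],
}
rows = expand_queries(queries, instructions)
print(len(queries), "queries ->", len(rows), "rows")
```

With one instruction per query, as described for IFIR, the same sketch would produce no extra rows.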
I see, thanks for pointing that out. I think the FollowIR and mFollowIR tasks don't require duplication, and the only other Instruction* task I see is IFIR, which seems to have only one instruction per query. So I think we're good then?
I think yes. Can you also review #2970? I have questions there about top_ranked and how to integrate cross-encoders.
…ons_to_query2 # Conflicts: # mteb/abstasks/AbsTaskRetrieval.py # mteb/models/search_wrappers.py
Looks good to me as well
* change corpus and queries to dataset
* remove commented out code
* add convertion for v1 datasets
* fix descriptive stats
* update reranking
* format
* fix tests
* lint
* change ids of mock dataset
* change score for colbert
* add type for corpus and queries datasets
* fix reranking task
* format
* update push to hub
* update statistics calculation
* simplify `create_dataloader_for_retrieval_corpus`
* remove check with queries id
* add instruction dataset type
* fully annotate retrieval types
* remove irrelevant type annotation
* init
* base search interface implementation
* base search interface implementation
* add todo comment
* add link to todo
* Update mteb/models/search/search_crossencoder.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update mteb/create_dataloaders.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove search folder
* fix imports
* fix tests
* add support for cross encoder models
* combine back encoder
* add additional check for interface
* resolve copilot comment
* fix union type
* roll back rename in validate_task_to_prompt_name
* fix descriptive stats
* [v2] Combine instructions with queries (#2984)
* combine instructions with queries
* fix old format ds
* rename `MtebSupportedModelProtocols` and add `RetrievalEvaluationResult` tuple

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Close #2969
I've merged the `instruction` and `queries` datasets. Now the queries dataset would look like

For now, 3 tests are failing because of the change in `SearchInterface`. I tried running `mFollowIR` and `Core17InstructionRetrieval`, and they have the same results as in the `v2` branch.
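The merge of the two datasets can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: plain lists of dicts stand in for the real dataset objects, and the column names `id`, `text`, `query-id`, and `instruction`, as well as the helper `merge_instructions`, are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def merge_instructions(queries: List[Row], instructions: List[Row]) -> List[Row]:
    """Fold a separate instructions table into the queries table.

    Hypothetical helper: column names are assumptions for illustration.
    """
    # Group instructions by the query they belong to.
    by_query: Dict[str, List[str]] = {}
    for row in instructions:
        by_query.setdefault(row["query-id"], []).append(row["instruction"])

    merged: List[Row] = []
    for query in queries:
        # Queries with no instruction are kept, with an empty instruction.
        for instruction in by_query.get(query["id"], [""]):
            merged.append({**query, "instruction": instruction})
    return merged


queries = [
    {"id": "q1", "text": "effects of caffeine"},
    {"id": "q2", "text": "history of the bicycle"},
]
instructions = [
    {"query-id": "q1", "instruction": "Only consider clinical studies."},
]
merged = merge_instructions(queries, instructions)
for row in merged:
    print(row)
```

Keeping the instruction as a column of the queries rows means downstream code only has to iterate over a single dataset.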