[v2] Two stage rerank #3040
Conversation
Hmm, thanks for making the PR @Samoed.
I really don't like the save_retrieval_results=True argument, as it is clearly a task-specific argument.
I do however like that it is possible to create a reranking from a retrieval task. How about
import mteb
model = mteb.get_model("minishlab/potion-base-2M")
retrieval_task = mteb.get_task("NanoArguAnaRetrieval")
model = SaveRetrievalPredictionsWrapper(model)
res = mteb.evaluate(model, retrieval_task)
reranking_task = retrieval_task.convert_to_reranking(model.predictions, top_k=100)
model = mteb.get_model("jinaai/jina-reranker-v2-base-multilingual")
mteb.evaluate(model, reranking_task)
Alternatively, we could add a more general save_predictions flag that can be used for all tasks, and just add a NotImplementedError() for all tasks but retrieval. I could see that being an option as well. Besides the many not-implemented errors this would cause, it would also require handling new arguments, such as where to save predictions, etc.
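The flag-plus-NotImplementedError idea above could be sketched roughly like this. Note that AbsTask, AbsTaskRetrieval, and the evaluate signature here are loose, hypothetical stand-ins for mteb's internals, not the actual API:

```python
# Sketch only: class and function names approximate mteb's structure.
import json
from pathlib import Path


class AbsTask:
    """Base task: saving predictions is opt-in per task type."""

    def save_predictions(self, predictions, path: Path) -> None:
        raise NotImplementedError(
            f"{type(self).__name__} does not support saving predictions yet"
        )


class AbsTaskRetrieval(AbsTask):
    """Retrieval tasks can persist their query -> (doc, score) mapping."""

    def save_predictions(self, predictions, path: Path) -> None:
        path.write_text(json.dumps(predictions, indent=2))


def evaluate(task: AbsTask, predictions, *, save_predictions: bool = False,
             prediction_save_path: Path = Path("predictions.json")):
    # ... run the actual evaluation here ...
    if save_predictions:
        task.save_predictions(predictions, prediction_save_path)
```

This keeps the flag generic while making unsupported task types fail loudly rather than silently ignoring it.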
The wrapper isolates the concern and leaves the evaluator focused only on evaluation.
@@ -187,12 +187,3 @@ def test_create_meta_from_existing(
    command = f"{sys.executable} -m mteb create_meta --results_folder {results.as_posix()} --output_path {output_path.as_posix()} --from_existing {existing_readme.as_posix()} --overwrite"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    assert result.returncode == 0, "Command failed"

def test_save_predictions():
Isn't this still used by MTEB()?
Yes, this is still used by MTEB, but I don't want to support it, because otherwise it would require supporting a lot of extra parameters in AbsRetrieval, which I think we don't want.
So this will break MTEB()?
I've checked a bit further. It will only partially break the CLI, because we don't support this flag for now, but we need to refactor the CLI a bit to use mteb.evaluate instead of MTEB. We can add support for this flag to keep backward compatibility. But this flag is not supported by MTEB directly; it's just passed through kwargs and not documented.
Lines 254 to 266 in cf01bc8
def run(
    self,
    model: MTEBModels | CrossEncoder | SentenceTransformer,
    verbosity: int = 1,
    output_folder: str | None = "results",
    eval_splits: list[str] | None = None,
    eval_subsets: list[str] | None = None,
    overwrite_results: bool = False,
    raise_error: bool = True,
    co2_tracker: bool = True,
    encode_kwargs: dict[str, Any] | None = None,
    **kwargs,
) -> list[TaskResult]:
Yeah indeed!
Let us keep it like this, but remove it from the CLI args/docs.
If we implement an additional method for saving results, I think we can leave it as is.
I definitely agree, but I don't see another solution to the problem, which is why I didn't want to implement it in the first place.
I don't think that would solve the problem, because models can only save embeddings, not predictions for the retrieval task. I don't know how to reuse results from the wrapper then.
I can implement it, because that would solve the problem for the tasks.
Why can't they save predictions? Doesn't the search interface return RetrievalOutputType?
Let us decide on a good solution before spending the implementation time.
Oh, I forgot that we have the search interface. Yes, the wrapper should work then.
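Since the search interface returns RetrievalOutputType, a prediction-saving wrapper could look roughly like the sketch below. SearchProtocol and the RetrievalOutputType alias are simplified stand-ins for mteb's actual types; only the wrapping pattern is the point:

```python
# Sketch only: types approximate mteb's search interface.
from typing import Protocol

# query_id -> {doc_id: score}
RetrievalOutputType = dict[str, dict[str, float]]


class SearchProtocol(Protocol):
    def search(self, queries: dict[str, str], corpus: dict[str, str],
               top_k: int) -> RetrievalOutputType: ...


class SavePredictionsWrapper:
    """Delegates search() and keeps the latest predictions for reuse,
    e.g. to build a reranking task from a retrieval run."""

    def __init__(self, model: SearchProtocol):
        self.model = model
        self.predictions: RetrievalOutputType = {}

    def search(self, queries, corpus, top_k):
        self.predictions = self.model.search(queries, corpus, top_k)
        return self.predictions
```

The wrapper stays model-agnostic: anything implementing search() can be wrapped, and the evaluator never needs to know predictions are being captured.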
Sorry for the late review. In general this looks fine to me.
As a meta-comment though, I will say I'm nervous we're adding too many wrappers. Do they stack? Can we save predictions and cache embeddings and ...?
Maybe it's a personal preference, but I think arguments to functions are a much nicer UI from a user's perspective than a bunch of wrappers. However, we have so many that it's probably too late to change all of them at this point. Just worth noting for future extensions.
I thought about this, too. This will look like the new code at Lines 162 to 171 in cf01bc8.
mteb/models/search_wrappers.py
def __init__(
    self,
    model: SearchProtocol,
    results_path: str | Path,
So based on @orionw's and @Samoed's comments, it might be better to add the following arguments:
save_retrieval_predictions: bool
prediction_save_path: Path
And then in the future we can raise a DeprecationWarning on save_retrieval_predictions and instead refer to save_predictions.
Let me know what you think?
(sorry @Samoed i understand that this is a digression from before)
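The deprecation path proposed above could, as a rough sketch, look like this inside mteb.evaluate. The signature and parameter handling here are assumptions for illustration, not the actual implementation:

```python
# Sketch only: evaluate's real signature in mteb may differ.
from __future__ import annotations

import warnings
from pathlib import Path


def evaluate(model, task, *, save_predictions: bool = False,
             prediction_save_path: Path = Path("predictions.json"),
             save_retrieval_predictions: bool | None = None):
    # Accept the old flag, warn once, and forward it to the new one.
    if save_retrieval_predictions is not None:
        warnings.warn(
            "save_retrieval_predictions is deprecated; use save_predictions",
            DeprecationWarning,
            stacklevel=2,
        )
        save_predictions = save_retrieval_predictions
    # ... run the actual evaluation; persist predictions to
    # prediction_save_path when save_predictions is True ...
    return save_predictions, prediction_save_path
```

Callers using the old flag keep working and get a clear migration signal, while new code only sees save_predictions.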
We might still want to keep the wrapper internally (without exposing it to the user)
And then in the future we can raise a DeprecationWarning on save_retrieval_predictions and instead refer to save_predictions.
Where do you want to add it? To mteb.evaluate?
Yeah, that would be the idea.
Should we add a save_predictions method to all tasks then? Then we can use only prediction_save_path: Path.
Hmm, but wouldn't that require a lot of implementation across the different tasks?
Yes, but I don't like that we're passing task-specific args. Also, this can be helpful for some users.
Sure, it was simply that we don't have it implemented yet, so I am afraid it would give a poor user experience.
I will implement it for retrieval in this pr and will add to other tasks later
Alright let us do it this way (sorry for backtracking)
Can you redo the example at the top? I think this format looks pretty good though. A few updates on naming things.
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Updated the example in the description.
Looks great @Samoed - I think this is good to merge - we need to update some docs away from using MTEB() for two-stage retrieval, but I can do that in a separate PR.
I've created two-stage reranking. Here is an example of how to run it:
I've added saving of run information to the result file, for better reproducibility.