Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Similar to the cross-encoder Score API proposed here: #5577
The goal is to score items "generatively" using decoder-only models.
For example: "Given a user liked A, B, and C, will the user like this item? Please answer 'yes' or 'no'. The item is: D"
API
{
  "text_1": [
    "Given a user liked A, B, and C, will the user like this item? Please answer \"yes\" or \"no\". The item is:"
  ],
  "text_2": [
    "D",
    "E"
  ],
  "positiveToken": "yes",
  "negativeToken": "no"
}
Returns:
{
  "scores": [
    0.874,
    0.231
  ]
}
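For illustration, a client call against such an endpoint could look like the sketch below. The endpoint path, port, and request shape are assumptions for this proposal, not an existing sglang route:

import requests

payload = {
    "text_1": [
        'Given a user liked A, B, and C, will the user like this item? '
        'Please answer "yes" or "no". The item is:'
    ],
    "text_2": ["D", "E"],
    "positiveToken": "yes",
    "negativeToken": "no",
}

# The path "/v1/score" is a placeholder; the actual route would be decided
# during implementation (see #5577 for the cross-encoder variant).
resp = requests.post("http://localhost:30000/v1/score", json=payload)
print(resp.json())  # expected shape: {"scores": [0.874, 0.231]}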
Related resources
The original idea comes from the paper Holistic Evaluation of Language Models, which states the following:
We address the re-ranking task in a pointwise fashion: we formulate the information retrieval problem using prompting as a binary log-probability problem, similar to Nogueira & Cho (2019): given a passage c_i and a query q, we ask the model whether the passage contains an answer to the query. If the model's answer is Yes with a high probability, we rank the corresponding c_i higher, while the No answer with high probability achieves the opposite. Figure 12 depicts an example instance. The rankings produced are then evaluated using standard information retrieval metrics.
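As a concrete sketch of that pointwise formulation: each item's score can be taken from the next-token probability of the positive token, normalized against the negative token. The snippet below is illustrative only; the model name and the yes/no-only normalization are assumptions, not the sglang implementation:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any decoder-only model would work here.
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def generative_score(text_1: str, text_2: str,
                     positive_token: str = "yes",
                     negative_token: str = "no") -> float:
    """Score one item as P(positive) / (P(positive) + P(negative)) at the next token."""
    prompt = f"{text_1} {text_2}"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    pos_id = tokenizer.encode(positive_token, add_special_tokens=False)[0]
    neg_id = tokenizer.encode(negative_token, add_special_tokens=False)[0]
    probs = torch.softmax(logits[[pos_id, neg_id]], dim=-1)
    return probs[0].item()

prompt = ('Given a user liked A, B, and C, will the user like this item? '
          'Please answer "yes" or "no". The item is:')
scores = [generative_score(prompt, item) for item in ["D", "E"]]
print(scores)  # one score per item in text_2; values here are illustrative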
- A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE: https://arxiv.org/html/2403.10407v1
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations: https://proceedings.mlr.press/v235/zhai24a.html
More docs to be added