Add Classification Evaluator unit test #2838
Conversation
Thanks for the PR!
Generally I think this looks really good, but I think we can simplify it a bit.
In v2 we combine these into a single ClassificationEvaluator (to prevent discrepancies across them), which has a classifier (sklearn interface) attached. This classifier defaults to logistic regression, but in principle you could swap it out for any sklearn-compatible classifier.
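Roughly, I am imagining something like the following (a minimal sketch only; the class name, call signature, and method names here are illustrative, not the actual v2 API):

import numpy as np
from sklearn.linear_model import LogisticRegression


class ClassificationEvaluator:
    """Fits a classifier on train embeddings and scores it on test embeddings."""

    def __init__(self, classifier=None):
        # Defaults to logistic regression, but any sklearn-compatible
        # classifier (fit/score interface) can be swapped in.
        self.classifier = classifier if classifier is not None else LogisticRegression()

    def __call__(self, model, x_train, y_train, x_test, y_test) -> float:
        train_emb = np.asarray(model.encode(x_train))
        test_emb = np.asarray(model.encode(x_test))
        self.classifier.fit(train_emb, y_train)
        return float(self.classifier.score(test_emb, y_test))

With something like that in place, the binary/multiclass split disappears and the tests only need to vary the data and the classifier.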
("eval_logreg_multiclass", False), | ||
], | ||
) | ||
def test_output_structure(self, evaluator_fixture, is_binary, model, request): |
I would probably change this to:
-def test_output_structure(self, evaluator_fixture, is_binary, model, request):
+def test_binary_output_structure(self, evaluator, model, x_train: list[str], y_train: list[int], x_test, y_test, expected_score: float):
That way, you can parametrize both the model and the evaluator - you will initialize the evaluator a few more times than you do now, but this way it is very easy to add new test cases. You can also extract whether the task is binary or not from the labels.
Note that it might be easier to define:
from dataclasses import dataclass


@dataclass
class ClassificationTestCase:
    x_train: list[str]
    y_train: list[int]
    x_test: list[str]
    y_test: list[int]  # assuming the duplicated x_train in the original was meant to be the test labels
    expected_score: float

# which leads to:
def test_binary_output_structure(self, evaluator, model, testcase: ClassificationTestCase):
    ...
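For illustration, a parametrized test could then look roughly like this (the test data and expected score are placeholders, and the evaluator/model fixtures plus the evaluator call signature are assumptions, not values or APIs from this PR):

import pytest

# Placeholder test data, purely for illustration.
CASES = [
    ClassificationTestCase(
        x_train=["good", "bad", "great", "awful"],
        y_train=[1, 0, 1, 0],
        x_test=["nice", "terrible"],
        y_test=[1, 0],
        expected_score=1.0,  # illustrative value
    ),
]


class TestClassificationEvaluator:
    @pytest.mark.parametrize("testcase", CASES)
    def test_binary_output_structure(self, evaluator, model, testcase: ClassificationTestCase):
        # The task type is derivable from the labels instead of being a parameter.
        is_binary = len(set(testcase.y_train)) == 2
        score = evaluator(model, testcase.x_train, testcase.y_train, testcase.x_test, testcase.y_test)
        assert score == pytest.approx(testcase.expected_score)
        if is_binary:
            ...  # e.g. additionally check binary-specific output keys here

Adding a new case is then just appending another ClassificationTestCase to the list.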
We naturally can't check for the expected output here with the random MockNumpyEncoder(). However, changing it to:
import mteb
import numpy as np

class MockNumpyEncoder(mteb.Encoder):
    def __init__(self):
        self.rng_state = np.random.default_rng(42)  # seeded so embeddings are reproducible

    def encode(self, sentences, prompt_name: str | None = None, **kwargs):
        # Generator exposes .random(size), not .random.rand
        return self.rng_state.random((len(sentences), 10))
should fix the issue and should not introduce new issues.
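Since both the seed and the embedding dimension are fixed, the mock is deterministic across runs, which is what makes an expected_score assertable. A quick sanity check (assuming numpy is imported as np):

enc_a = MockNumpyEncoder()
enc_b = MockNumpyEncoder()
# Two independently constructed mocks produce identical embeddings,
# because both generators are seeded with 42.
assert np.array_equal(enc_a.encode(["a", "b"]), enc_b.encode(["a", "b"]))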
@KennethEnevoldsen I refactored this a bit, can you please take a look?
@KennethEnevoldsen Thank you for the review! Can you please take another look and see if the test is better this time?
Looking a lot better - I added a few simplifications.
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
@KennethEnevoldsen Can you please take a look now?
First step in resolving issue #1955.