Add Classification Evaluator unit test #2838

Merged · 8 commits into embeddings-benchmark:main · Jul 15, 2025

Conversation

@fzowl (Contributor) commented on Jun 20, 2025

First step toward resolving issue #1955.

@fzowl changed the title from "Adding Classification Evaluator test" to "Add Classification Evaluator unit test" on Jun 20, 2025
@KennethEnevoldsen (Contributor) left a comment


Thanks for the PR!

Generally I think this looks really good, but I think we can simplify it a bit.

In v2 we combine these into a single ClassificationEvaluator (to prevent discrepancies across evaluators), which has a classifier with an sklearn interface attached. By default this classifier is logistic regression, but in principle you can swap in any sklearn-compatible classifier.
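As a rough sketch of that design (the class name matches the comment, but the constructor signature, `__call__` interface, and return format here are illustrative assumptions, not mteb's actual v2 API):

```python
from __future__ import annotations

import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.linear_model import LogisticRegression


class ClassificationEvaluator:
    """Hypothetical sketch: one evaluator with a pluggable sklearn classifier."""

    def __init__(self, classifier: ClassifierMixin | None = None):
        # Logistic regression by default; any sklearn-compatible
        # classifier (fit/score interface) can be swapped in.
        self.classifier = classifier if classifier is not None else LogisticRegression(max_iter=1000)

    def __call__(self, model, x_train, y_train, x_test, y_test) -> dict[str, float]:
        # Embed, fit the attached classifier, and score on the test split.
        train_emb = np.asarray(model.encode(x_train))
        test_emb = np.asarray(model.encode(x_test))
        self.classifier.fit(train_emb, y_train)
        return {"accuracy": float(self.classifier.score(test_emb, y_test))}
```

Swapping the classifier, e.g. `ClassificationEvaluator(classifier=KNeighborsClassifier(n_neighbors=5))`, would then reuse the exact same evaluation path.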

("eval_logreg_multiclass", False),
],
)
def test_output_structure(self, evaluator_fixture, is_binary, model, request):
@KennethEnevoldsen (Contributor):


I would probably change this to:

Suggested change:

```diff
-def test_output_structure(self, evaluator_fixture, is_binary, model, request):
+def test_binary_output_structure(self, evaluator, model, x_train: list[str], y_train: list[int], x_test: list[str], y_test: list[int], expected_score: float):
```

That way you can parametrize both the model and the evaluator. You will initialize the evaluator a few more times than you do now, but it becomes very easy to add new test cases. You can also infer whether the task is binary from the labels.

Note that it might be easier to define:

```python
from dataclasses import dataclass


@dataclass
class ClassificationTestCase:
    x_train: list[str]
    y_train: list[int]
    x_test: list[str]
    y_test: list[int]
    expected_score: float


# which leads to:
def test_binary_output_structure(self, evaluator, model, testcase: ClassificationTestCase):
```
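For illustration, wiring this up with pytest could look like the following; the sample data, expected score, and the `evaluator`/`model` fixtures are all made up, and `ClassificationTestCase` is the `@dataclass` sketched above:

```python
import pytest

# Hypothetical test case; a real one would use data with a known score.
BINARY_CASE = ClassificationTestCase(
    x_train=["good", "bad", "great", "awful"],
    y_train=[1, 0, 1, 0],
    x_test=["fine", "terrible"],
    y_test=[1, 0],
    expected_score=0.5,  # made-up value for illustration
)


class TestClassificationEvaluator:
    @pytest.mark.parametrize("testcase", [BINARY_CASE])
    def test_binary_output_structure(self, evaluator, model, testcase: ClassificationTestCase):
        # Whether the task is binary can be inferred from the labels.
        assert len(set(testcase.y_train)) == 2
        scores = evaluator(model, testcase.x_train, testcase.y_train, testcase.x_test, testcase.y_test)
        assert scores["accuracy"] == pytest.approx(testcase.expected_score)
```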

@KennethEnevoldsen (Contributor):


We naturally can't check for an expected output here with the random MockNumpyEncoder(). However, changing it to:

```python
import mteb
import numpy as np

class MockNumpyEncoder(mteb.Encoder):
    def __init__(self):
        self.rng_state = np.random.default_rng(42)  # fixed seed for reproducibility

    def encode(self, sentences, prompt_name: str | None = None, **kwargs):
        # np.random.Generator exposes .random(shape); there is no .random.rand()
        return self.rng_state.random((len(sentences), 10))
```

should fix the issue and should not introduce new issues.
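A quick check of the determinism this buys (hypothetical snippet, not part of the PR):

```python
emb = MockNumpyEncoder().encode(["a", "b"])
assert emb.shape == (2, 10)
# A fresh encoder re-seeds the generator, so the embeddings (and hence any
# classifier scores computed from them) are reproducible across test runs.
assert np.allclose(MockNumpyEncoder().encode(["a", "b"]), emb)
```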

@fzowl (Contributor, Author):


@KennethEnevoldsen I refactored this a bit, can you please take a look?

@fzowl requested a review from KennethEnevoldsen on June 26, 2025, 16:54
@fzowl (Contributor, Author) commented on Jun 26, 2025

@KennethEnevoldsen Thank you for the review! Can you please take another look and see if the test is better this time?


@KennethEnevoldsen (Contributor) left a comment


Looking a lot better. I added a few suggestions to simplify it further.

@fzowl (Contributor, Author) commented on Jul 9, 2025

@KennethEnevoldsen Can you please take a look now?

@fzowl requested a review from KennethEnevoldsen on July 9, 2025, 16:51
@fzowl requested a review from Samoed on July 14, 2025, 21:51
@Samoed merged commit 4a47f90 into embeddings-benchmark:main on Jul 15, 2025
9 checks passed