[v2] Introduce AbsTaskAnyClustering #2880
Pull Request Overview
This PR replaces the legacy clustering task abstractions with the new unified `AbsTaskAnyClustering`, updates all clustering tasks (text and image) to inherit from it, refactors the evaluator to handle both modalities, and removes the old image-only clustering evaluator and base classes.
- Introduce `AbsTaskAnyClustering` and deprecate `AbsTaskClustering`/`AbsTaskImageClustering`
- Update all clustering task classes to extend `AbsTaskAnyClustering`, adding `input_column_name`/`label_column_name` as needed
- Refactor `ClusteringEvaluator` to support image/text modalities and remove the specialized `ImageClusteringEvaluator`
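To illustrate the column-override pattern described in the bullets above, here is a minimal sketch. It does not use mteb itself: `AnyClusteringSketch`, `TinyImageNetSketch`, the `select_columns` helper, and the default column names are hypothetical stand-ins chosen for illustration. Only the attribute names `input_column_name` and `label_column_name` come from this PR.

```python
from typing import Any, ClassVar


class AnyClusteringSketch:
    """Hypothetical stand-in for a unified clustering base (not the mteb class)."""

    # Assumed defaults for illustration only; the real defaults may differ.
    input_column_name: ClassVar[str] = "sentences"
    label_column_name: ClassVar[str] = "labels"

    def select_columns(self, rows: list[dict[str, Any]]) -> tuple[list[Any], list[Any]]:
        """Pull inputs and labels out of dataset rows using the configured columns."""
        inputs = [row[self.input_column_name] for row in rows]
        labels = [row[self.label_column_name] for row in rows]
        return inputs, labels


class TinyImageNetSketch(AnyClusteringSketch):
    """Hypothetical image task that overrides both column names."""

    input_column_name = "image"
    label_column_name = "label"


# Placeholder rows; a real dataset would hold image objects, not strings.
rows = [{"image": "img0", "label": 3}, {"image": "img1", "label": 7}]
inputs, labels = TinyImageNetSketch().select_columns(rows)
```

With this shape, a task whose dataset uses non-default column names only needs to set the two class attributes. A task that omits the override would fall back to the base defaults and fail on lookup if its dataset uses different column names.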
Reviewed Changes
Copilot reviewed 55 out of 55 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
mteb/abstasks/AbsTaskAnyClustering.py | New unified clustering base; duplicate label counting logic needs review |
mteb/tasks/Image/ImageClustering/eng/CIFAR.py | Missing `input_column_name` override for CIFAR tasks |
mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py | Correctly adds `input_column_name`/`label_column_name` overrides |
mteb/evaluation/evaluators/ClusteringEvaluator.py | Updated to support image/text, import paths correct |
mteb/abstasks/__init__.py | Imports updated to include new clustering base |
Multiple mteb/tasks/Clustering/... files | Updated to inherit from `AbsTaskAnyClustering` |
Removal of AbsTaskClustering.py and AbsTaskImageClustering.py | Old bases and image evaluator removed |
Can you run some tasks to compare results with main?
Yep! On it.
Looked at v_measures for some image and text tasks from v1. Note that ClusteringFast was not implemented in MIEB so that's not been touched in this PR.
Fixes #2434
If you add a model or a dataset, please add the corresponding checklist: