isaac-chung
Collaborator

Fixes #2434


@isaac-chung isaac-chung changed the base branch from main to v2.0.0 July 5, 2025 13:55
@isaac-chung isaac-chung changed the title from "Introduce abs task any clustering" to "[v2] Introduce AbsTaskAnyClustering" Jul 5, 2025
@isaac-chung isaac-chung requested a review from Copilot July 5, 2025 13:57

@isaac-chung isaac-chung requested a review from Copilot July 5, 2025 14:03

@Copilot Copilot AI left a comment


Pull Request Overview

This PR replaces the legacy clustering task abstractions with the new unified AbsTaskAnyClustering, updates all clustering tasks (text and image) to inherit from it, refactors the evaluator to handle both modalities, and removes the old image‐only clustering evaluator and base classes.

  • Introduce AbsTaskAnyClustering and deprecate AbsTaskClustering/AbsTaskImageClustering
  • Update all clustering task classes to extend AbsTaskAnyClustering, adding input_column_name/label_column_name as needed
  • Refactor ClusteringEvaluator to support image/text modalities and remove specialized ImageClusteringEvaluator
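The column-name overrides mentioned above can be illustrated with a minimal sketch. This is not the actual mteb implementation: the class names `UnifiedClusteringTask` and `ToyImageClustering`, the `select_columns` helper, and the default column names are all hypothetical, chosen only to show how one base class might serve both text and image tasks.

```python
from typing import Any, Sequence


class UnifiedClusteringTask:
    """Hypothetical stand-in for a modality-agnostic clustering base class."""

    # Column holding the items to embed (assumed default, not taken from mteb).
    input_column_name: str = "sentences"
    # Column holding the ground-truth cluster labels (assumed default).
    label_column_name: str = "labels"

    def select_columns(
        self, rows: Sequence[dict[str, Any]]
    ) -> tuple[list[Any], list[Any]]:
        """Split rows into (inputs, labels) using the configured column names."""
        inputs = [row[self.input_column_name] for row in rows]
        labels = [row[self.label_column_name] for row in rows]
        return inputs, labels


class ToyImageClustering(UnifiedClusteringTask):
    # An image dataset stores its inputs under different columns, so the
    # subclass overrides the names instead of needing its own base class.
    input_column_name = "image"
    label_column_name = "label"


text_rows = [
    {"sentences": "a cat", "labels": 0},
    {"sentences": "a dog", "labels": 1},
]
image_rows = [{"image": b"fake-image-bytes", "label": 3}]

text_inputs, text_labels = UnifiedClusteringTask().select_columns(text_rows)
image_inputs, image_labels = ToyImageClustering().select_columns(image_rows)
```

In this sketch, a task that forgets to override `input_column_name` would raise a `KeyError` on the first row, which is the kind of failure a missing override would be expected to cause.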

Reviewed Changes

Copilot reviewed 55 out of 55 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| mteb/abstasks/AbsTaskAnyClustering.py | New unified clustering base; duplicate label counting logic needs review |
| mteb/tasks/Image/ImageClustering/eng/CIFAR.py | Missing input_column_name override for CIFAR tasks |
| mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py | Correctly adds input_column_name/label_column_name overrides |
| mteb/evaluation/evaluators/ClusteringEvaluator.py | Updated to support image/text, import paths correct |
| mteb/abstasks/__init__.py | Imports updated to include new clustering base |
| Multiple mteb/tasks/Clustering/... files | Updated to inherit from AbsTaskAnyClustering |
| Removal of AbsTaskClustering.py and AbsTaskImageClustering.py | Old bases and image evaluator removed |

Member

@Samoed Samoed left a comment


Can you run some tasks to compare results with main?

@isaac-chung
Collaborator Author

Yep! On it.

@isaac-chung
Collaborator Author

Compared v_measure scores for some image and text tasks from v1. Note that ClusteringFast was not implemented in MIEB, so it has not been touched in this PR.

| Task | PR | v2.0.0 |
| --- | --- | --- |
| CIFAR10Clustering | 0.6925 | 0.6925 |
| TinyImageNetClustering | 0.6262 | 0.6262 |
| WikiCitiesClustering | 0.8213 | 0.8213 |
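A check like this could be automated with a small script. The sketch below is only illustrative: the scores are copied from the table, while the `find_mismatches` helper and the `1e-4` tolerance are assumptions, not part of mteb or of this PR.

```python
import math

# v_measure scores copied from the comparison above.
pr_scores = {
    "CIFAR10Clustering": 0.6925,
    "TinyImageNetClustering": 0.6262,
    "WikiCitiesClustering": 0.8213,
}
base_scores = {
    "CIFAR10Clustering": 0.6925,
    "TinyImageNetClustering": 0.6262,
    "WikiCitiesClustering": 0.8213,
}


def find_mismatches(new, old, abs_tol=1e-4):
    """Return {task: (new_score, old_score)} for tasks whose scores differ.

    A task present on only one side is reported with None for the missing score.
    The tolerance is a hypothetical choice matching four reported decimals.
    """
    mismatches = {}
    for task in sorted(set(new) | set(old)):
        a, b = new.get(task), old.get(task)
        if a is None or b is None or not math.isclose(a, b, abs_tol=abs_tol):
            mismatches[task] = (a, b)
    return mismatches


mismatches = find_mismatches(pr_scores, base_scores)
print(mismatches or "all scores match")
```

An empty result means the refactor left the reported scores unchanged for the tasks that were run; it says nothing about tasks that were not included.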

@isaac-chung isaac-chung merged commit 4479e12 into v2.0.0 Jul 6, 2025
8 checks passed
@isaac-chung isaac-chung deleted the introduce-AbsTaskAnyClustering branch July 6, 2025 15:34
Successfully merging this pull request may close these issues.

Refactor MIEB Clustering