Samoed
Member

@Samoed Samoed commented Jun 15, 2025

Continues the ideas from #2537. All statistics now use the following format:

{
    "text_statistics": {"min_text_length": 1, ... },
    "label_statistics": {"min_labels_per_text": 1, ...},
}

For now, I haven't changed the MIEB statistics.
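
As a rough illustration of the nested format above, the sketch below builds both groups for a toy dataset. The `TypedDict` class names, the helper `calculate_statistics`, and every key other than `min_text_length` and `min_labels_per_text` are assumptions made for this example, not the definitions used in this PR.

```python
from typing import TypedDict


class TextStatistics(TypedDict):
    # Only min_text_length appears in the PR description; the other keys are assumed.
    min_text_length: int
    average_text_length: float
    max_text_length: int


class LabelStatistics(TypedDict):
    # Only min_labels_per_text appears in the PR description; the other keys are assumed.
    min_labels_per_text: int
    average_labels_per_text: float
    max_labels_per_text: int


class DescriptiveStatistics(TypedDict):
    # One nested entry per kind of statistic, mirroring the format shown above.
    text_statistics: TextStatistics
    label_statistics: LabelStatistics


def calculate_statistics(
    texts: list[str], labels: list[list[int]]
) -> DescriptiveStatistics:
    """Hypothetical helper: compute text and label statistics separately."""
    text_lengths = [len(text) for text in texts]
    label_counts = [len(text_labels) for text_labels in labels]
    return DescriptiveStatistics(
        text_statistics=TextStatistics(
            min_text_length=min(text_lengths),
            average_text_length=sum(text_lengths) / len(text_lengths),
            max_text_length=max(text_lengths),
        ),
        label_statistics=LabelStatistics(
            min_labels_per_text=min(label_counts),
            average_labels_per_text=sum(label_counts) / len(label_counts),
            max_labels_per_text=max(label_counts),
        ),
    )


# Two texts, the first with one label and the second with three.
stats = calculate_statistics(["a", "abc"], [[0], [0, 1, 2]])
```

Grouping by kind keeps each task's output to a small set of reusable blocks instead of one flat dictionary per task.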

@Samoed Samoed added the v2 label (Issues and PRs related to `v2` branch) Jun 15, 2025
@Samoed Samoed requested a review from KennethEnevoldsen June 15, 2025 20:58
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good - we might discuss which image metrics are relevant, but the structure change is very decent

Comment on lines 42 to 49
return ImageStatistics(
    min_image_width=min(img_widths),
    average_image_width=sum(img_widths) / len(img_widths),
    max_image_width=max(img_widths),
    min_image_height=min(img_heights),
    average_image_height=sum(img_heights) / len(img_heights),
    max_image_height=max(img_heights),
)
Contributor

I am not sure if these are the statistics that people would normally be interested in?

Should we add duplicates as well?
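
One way a duplicate count could sit next to the existing width and height statistics is sketched below. It works on plain `(width, height, raw_bytes)` tuples rather than real image objects, and the `number_of_duplicates` field, the byte-hashing approach, and the function name `calculate_image_statistics` are all assumptions for illustration, not something this PR implements.

```python
import hashlib
from collections import Counter
from typing import TypedDict


class ImageStatistics(TypedDict):
    min_image_width: int
    average_image_width: float
    max_image_width: int
    min_image_height: int
    average_image_height: float
    max_image_height: int
    number_of_duplicates: int  # hypothetical field, not part of the PR


def calculate_image_statistics(
    images: list[tuple[int, int, bytes]],
) -> ImageStatistics:
    """Hypothetical helper; each image is given as (width, height, raw_bytes)."""
    img_widths = [width for width, _, _ in images]
    img_heights = [height for _, height, _ in images]

    # Hash the raw bytes so only exact byte-level duplicates are detected.
    digests = Counter(hashlib.sha256(data).hexdigest() for _, _, data in images)
    # Every occurrence after the first one of a digest counts as a duplicate.
    number_of_duplicates = sum(count - 1 for count in digests.values())

    return ImageStatistics(
        min_image_width=min(img_widths),
        average_image_width=sum(img_widths) / len(img_widths),
        max_image_width=max(img_widths),
        min_image_height=min(img_heights),
        average_image_height=sum(img_heights) / len(img_heights),
        max_image_height=max(img_heights),
        number_of_duplicates=number_of_duplicates,
    )


# Four toy images; the first and third share identical bytes.
example_images = [
    (2, 4, b"a"),
    (6, 12, b"b"),
    (2, 4, b"a"),
    (6, 12, b"c"),
]
image_stats = calculate_image_statistics(example_images)
```

Exact hashing misses re-encoded or resized copies; catching those would need a perceptual comparison, which is outside this sketch.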

Member Author

I've added these because they were previously in MIEB. I think we can expand them, but I'm not sure what to add. CC @isaac-chung

Contributor

I would like duplicates - otherwise, I don't think I have much to add.

Collaborator

What was previously in MIEB is fine

@Samoed Samoed marked this pull request as ready for review July 5, 2025 15:42
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good - we still need to resolve the conflicts though. I think we can add image metrics in another PR if needed

@isaac-chung any image metrics that we need to add?

@isaac-chung
Collaborator

> Looks good - we still need to resolve the conflicts though. I think we can add image metrics in another PR if needed
>
> @isaac-chung any image metrics that we need to add?

I can't think of any beyond the existing MIEB ones. I think this PR just needs its conflicts resolved and that's it.

Samoed added 4 commits July 9, 2025 23:03
# Conflicts:
#	mteb/abstasks/AbsTaskAnyClassification.py
#	mteb/abstasks/AbsTaskClustering.py
#	mteb/abstasks/AbsTaskPairClassification.py
#	tests/test_benchmark/mock_tasks.py
@Samoed Samoed merged commit 29a9228 into v2.0.0 Jul 12, 2025
8 checks passed
@Samoed Samoed deleted the refactor_descriptive_stats branch July 12, 2025 17:42