[v2] Merge main #2617

Merged
merged 196 commits into from
May 3, 2025

@Samoed (Member) commented May 2, 2025

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
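As a rough illustration of what stratified subsampling means for a dataset that exceeds the size cap, the sketch below shrinks a labelled dataset to a target size while keeping each label's share approximately constant. `stratified_subsample` is a hypothetical helper written for this example; it is not the implementation behind `self.stratified_subsampling()`, which may use a different algorithm.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly preserves label proportions.

    Hypothetical helper for illustration only; not mteb's implementation.
    """
    if n_samples >= len(labels):
        # Nothing to do: the dataset is already small enough.
        return list(range(len(labels)))
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    total = len(labels)
    chosen = []
    for label in sorted(by_label):
        idxs = by_label[label]
        # Proportional share for this label, keeping at least one example.
        k = max(1, round(n_samples * len(idxs) / total))
        chosen.extend(rng.sample(idxs, min(k, len(idxs))))
    return sorted(chosen)
```

Because per-label counts are rounded and every label keeps at least one example, the total can differ from `n_samples` by a few items.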

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Samoed and others added 30 commits April 5, 2025 12:34
* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS
* rename folder

* trailing spaces

* missed one
Automatically generated by python-semantic-release
* fix gradio leaderboard run

* update docs
* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <shyuli@tencent.com>
…lude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks
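The commit above describes a `task.is_aggregate` check and an `exclude_aggregate` option on `mteb.get_tasks` that is off by default. The filtering behaviour can be sketched as follows; `Task` and `get_tasks` here are simplified hypothetical stand-ins, not mteb's actual classes or signatures.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical stand-in for an mteb task; only the fields this sketch needs.
    name: str
    is_aggregate: bool = False


def get_tasks(tasks, exclude_aggregate=False):
    """Return the tasks, optionally dropping aggregate ones (sketch, not mteb's API)."""
    if exclude_aggregate:
        return [t for t in tasks if not t.is_aggregate]
    # Default (False): keep everything, including aggregate tasks.
    return list(tasks)
```

With the default left at `False`, existing callers keep receiving aggregate tasks and only opt-in callers see the filtered list.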
Automatically generated by python-semantic-release
Add MIEB citation in benchmarks
* [ADD] 2 new Datasets

* [UPDATE] Change bibtex_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtex_citation for ZacLegalTextRetrieval as TODO
* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
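"CacheWrapper per task" suggests keeping a separate embedding cache for each task. A minimal sketch of that idea follows; `PerTaskEmbeddingCache`, its `encode(texts, task_name)` method and the SHA-256 keying are assumptions made for illustration, not the API of mteb's `CacheWrapper`.

```python
import hashlib


class PerTaskEmbeddingCache:
    """Hypothetical sketch: memoise embeddings in a separate store per task."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn  # callable: list[str] -> list of vectors
        self._caches = {}  # task_name -> {text_hash: vector}
        self.calls = 0  # number of texts actually sent to encode_fn

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def encode(self, texts, task_name):
        cache = self._caches.setdefault(task_name, {})
        # dict.fromkeys de-duplicates while preserving order.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in cache]
        if missing:
            self.calls += len(missing)
            for text, vec in zip(missing, self.encode_fn(missing)):
                cache[self._key(text)] = vec
        return [cache[self._key(t)] for t in texts]
```

Duplicate texts within one call are encoded once, repeated calls for the same task are served from that task's cache, and a different task starts with an empty cache.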
Automatically generated by python-semantic-release
move mmteb scripts and notebooks to separate repo
fix: Update package requirements in JinaWrapper for einops and flash_attn
Automatically generated by python-semantic-release
* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarkResults
- Added a few utility functions where needed
- Added docstrings throughout ModelResults and BenchmarkResults
- Added TODO comments for missing aspects (mostly v2), though join_revisions seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while; I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache
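The commit above drops `strict=True` from `zip()` because that argument only exists from Python 3.10 onward (PEP 618). Where the length check still matters on 3.9, a small helper can restore it; `zip_strict` below is a hypothetical name for this sketch, not a function in mteb.

```python
def zip_strict(*iterables):
    """Python 3.9-compatible stand-in for zip(..., strict=True) (added in 3.10)."""
    sentinel = object()  # unique marker for "this iterator is exhausted"
    iterators = [iter(it) for it in iterables]
    while True:
        items = [next(it, sentinel) for it in iterators]
        if all(item is sentinel for item in items):
            return  # every input ended at the same time: normal stop
        if any(item is sentinel for item in items):
            raise ValueError("zip_strict: iterables have different lengths")
        yield tuple(items)
```

Like the built-in with `strict=True`, it yields the pairs it can before raising `ValueError` on the first length mismatch.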
github-actions bot and others added 9 commits May 2, 2025 04:46
# Conflicts:
#	docs/tasks.md
#	mteb/abstasks/AbsTaskSpeedTask.py
#	mteb/abstasks/TaskMetadata.py
#	mteb/encoder_interface.py
#	mteb/models/misc_models.py
#	mteb/models/ops_moa_models.py
#	mteb/models/ru_sentence_models.py
#	mteb/models/sentence_transformers_models.py
#	mteb/tasks/Classification/__init__.py
#	mteb/tasks/Clustering/deu/BlurbsClusteringP2P.py
#	mteb/tasks/Clustering/deu/BlurbsClusteringS2S.py
#	mteb/tasks/Clustering/deu/TenKGnadClusteringS2S.py
#	mteb/tasks/Clustering/fra/AlloProfClusteringP2P.py
#	mteb/tasks/Clustering/fra/AlloProfClusteringS2S.py
#	mteb/tasks/Clustering/fra/HALClusteringS2S.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py
#	mteb/tasks/Image/Clustering/eng/__init__.py
#	mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py
#	mteb/tasks/Image/ImageClassification/eng/CIFAR.py
#	mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/DTDClassification.py
#	mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py
#	mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Food101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py
#	mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py
#	mteb/tasks/Image/ImageClassification/eng/STL10Classification.py
#	mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py
#	mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py
#	mteb/tasks/Image/ImageClustering/eng/CIFAR.py
#	mteb/tasks/Image/ImageClustering/eng/ImageNet.py
#	mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py
#	mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py
#	mteb/tasks/Image/VisualSTS/__init__.py
#	mteb/tasks/Image/VisualSTS/en/__init__.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/DTD.py
#	mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Food101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py
#	mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py
#	mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py
#	mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py
#	mteb/tasks/Image/ZeroShotClassification/eng/STL10.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py
#	mteb/tasks/Image/__init__.py
#	mteb/tasks/MultiLabelClassification/__init__.py
#	mteb/tasks/Reranking/zho/CMTEBReranking.py
#	mteb/tasks/Retrieval/__init__.py
#	mteb/tasks/__init__.py
#	pyproject.toml
#	tests/test_TaskMetadata.py
@Samoed Samoed added the v2 Issues and PRs related to `v2` branch label May 2, 2025
@Samoed Samoed requested a review from isaac-chung May 2, 2025 06:34
Samoed and others added 17 commits May 2, 2025 09:37
* Update gradio version

Closes #2557

* bump gradio
We should probably have done this earlier to ensure that the multilingual benchmark is runnable.
* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* add scandisent dataset

* add to init

* typo
Automatically generated by python-semantic-release
* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script
# Conflicts:
#	mteb/benchmarks/benchmarks.py
#	mteb/tasks/Classification/__init__.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py
#	mteb/tasks/Image/Any2AnyMultiChoice/eng/CVBench.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRI2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/TUBerlinT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/WITT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/XFlickr30kCoT2IRetrieval.py
#	mteb/tasks/Image/Any2AnyRetrieval/multilingual/XM3600T2IRetrieval.py
#	mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py
#	mteb/tasks/Image/ImageClassification/eng/CIFAR.py
#	mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/Country211Classification.py
#	mteb/tasks/Image/ImageClassification/eng/DTDClassification.py
#	mteb/tasks/Image/ImageClassification/eng/EuroSATClassification.py
#	mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py
#	mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Food101Classification.py
#	mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py
#	mteb/tasks/Image/ImageClassification/eng/Imagenet1k.py
#	mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordFlowersClassification.py
#	mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py
#	mteb/tasks/Image/ImageClassification/eng/STL10Classification.py
#	mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py
#	mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py
#	mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py
#	mteb/tasks/Image/ImageClustering/eng/CIFAR.py
#	mteb/tasks/Image/ImageClustering/eng/ImageNet.py
#	mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py
#	mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py
#	mteb/tasks/Image/ImageTextPairClassification/AROCocoOrder.py
#	mteb/tasks/Image/ImageTextPairClassification/AROFlickrOrder.py
#	mteb/tasks/Image/ImageTextPairClassification/AROVisualAttribution.py
#	mteb/tasks/Image/ImageTextPairClassification/AROVisualRelation.py
#	mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py
#	mteb/tasks/Image/ImageTextPairClassification/SugarCrepe.py
#	mteb/tasks/Image/ImageTextPairClassification/Winoground.py
#	mteb/tasks/Image/VisualSTS/eng/STS12VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS13VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS14VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS15VisualSTS.py
#	mteb/tasks/Image/VisualSTS/eng/STS16VisualSTS.py
#	mteb/tasks/Image/VisualSTS/multilingual/STS17MultilingualVisualSTS.py
#	mteb/tasks/Image/VisualSTS/multilingual/STSBenchmarkMultilingualVisualSTS.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/CLEVR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Country211.py
#	mteb/tasks/Image/ZeroShotClassification/eng/DTD.py
#	mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py
#	mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Food101.py
#	mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py
#	mteb/tasks/Image/ZeroShotClassification/eng/Imagenet1k.py
#	mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py
#	mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py
#	mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py
#	mteb/tasks/Image/ZeroShotClassification/eng/RenderedSST2.py
#	mteb/tasks/Image/ZeroShotClassification/eng/STL10.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py
#	mteb/tasks/Image/ZeroShotClassification/eng/SciMMIR.py
#	mteb/tasks/Image/ZeroShotClassification/eng/StanfordCars.py
#	mteb/tasks/Image/ZeroShotClassification/eng/UCF101.py
#	mteb/tasks/PairClassification/fas/FaMTEBPairClassification.py
#	mteb/tasks/PairClassification/multilingual/XNLI.py
#	mteb/tasks/Retrieval/ara/SadeemQuestionRetrieval.py
#	mteb/tasks/Retrieval/multilingual/PublicHealthQARetrieval.py
#	mteb/tasks/Retrieval/pol/FiQAPLRetrieval.py
#	mteb/tasks/Retrieval/zho/CMTEBRetrieval.py
#	pyproject.toml
@Samoed Samoed merged commit 1e56329 into v2.0.0 May 3, 2025
9 checks passed
@Samoed Samoed deleted the merge_main branch May 3, 2025 08:46