[v2] Merge main #2617
Merged
Update README.md
* refactor eval langs test * function returns None * add hard negatives tasks in _HISTORIC_DATASETS
rename folder
* rename folder * trailing spaces * missed one
* fix gradio leaderboard run * update docs
specify only the multilingual AggTask
* fix hatefulmeme * add to description and use polars instead --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* conan_models * conan_models * refactor code * refactor code --------- Co-authored-by: shyuli <shyuli@tencent.com>
…lude aggregate tasks (#2536) * Implement task.is_aggregate check * Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed * Update mteb.run with the new `task.is_aggregate` parameter * Add tests * Ran linter * Changed logic to `exclude_aggregate` * Updated from review comments * Exclude aggregate by default false in get_tasks
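The commit above describes an `is_aggregate` check on tasks and an `exclude_aggregate` option that defaults to false. A minimal sketch of that filtering idea follows. The `Task` dataclass and the `filter_tasks` helper are hypothetical stand-ins written for illustration; they are not the mteb classes or the real `mteb.get_tasks` signature, which this page does not show.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a benchmark task (not the mteb class)."""

    name: str
    is_aggregate: bool = False


def filter_tasks(tasks: Iterable[Task], exclude_aggregate: bool = False) -> List[Task]:
    """Return the tasks, dropping aggregate ones only when explicitly requested."""
    if not exclude_aggregate:
        # Default mirrors the commit message: aggregate tasks are kept.
        return list(tasks)
    return [task for task in tasks if not task.is_aggregate]


# Illustrative task names, invented for this sketch.
ALL_TASKS = [
    Task("TaskA"),
    Task("TaskB"),
    Task("AggregateAB", is_aggregate=True),
]

kept_by_default = filter_tasks(ALL_TASKS)
without_aggregates = filter_tasks(ALL_TASKS, exclude_aggregate=True)
```

Keeping the default at false means existing callers see the same task list as before, and only callers that opt in lose the aggregate entries.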
Add MIEB citation in benchmarks
* [ADD] 2 new Datasets * [UPDATE] Change bibtext_citation for GreenNodeTableMarkdownRetrieval as TODO * [UPDATE] Change bibtext_citation for ZacLegalTextRetrieval as TODO
* feat: CacheWrapper per task * refactor logic * update documentation --------- Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
move mmteb scripts and notebooks to separate repo
fix: Update package requirements in JinaWrapper for einops and flash_attn
Add MIEB to README
* defined model metadata for xlm_roberta_ua_distilled * Update mteb/models/ua_sentence_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * included ua_sentence_models.py in overview.py * applied linting, added missing fields in ModelMeta * applied linting --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: me5 training data config to include xquad dataset * Update mteb/models/e5_models.py update: xquad key name Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: ME5_TRAINING_DATA format --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Added dataframe utilities to BenchmarkResults
  - Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
  - Added tests for ModelResults and BenchmarksResults
  - Added a few utility functions where needed
  - Added docstrings throughout ModelResults and BenchmarksResults
  - Added todo comments for missing aspects, mostly for v2, though join_revisions seems like it could use an update before then.
  Prerequisite for #2454: @ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.
* refactor to to_dataframe and combine common dependencies
* ibid
* fix revision joining after discussion with @x-tabdeveloping
* remove strict=True for zip() as it is a >3.9 feature
* updated mock cache
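The last commits above drop `strict=True` from `zip()` because that keyword only exists from Python 3.10 onward. The commit simply removes the argument; as an alternative, a length check can be kept on Python 3.9 with a small helper. The sketch below is an assumption for illustration, and `zip_strict` is a hypothetical name that does not appear in the PR.

```python
from typing import Any, Iterable, List, Tuple


def zip_strict(*iterables: Iterable[Any]) -> List[Tuple[Any, ...]]:
    """Zip the inputs, raising ValueError if their lengths differ.

    Runs on Python 3.9, where zip() has no `strict` keyword.
    The inputs are materialised into lists so their lengths can be compared.
    """
    sequences = [list(iterable) for iterable in iterables]
    lengths = {len(sequence) for sequence in sequences}
    if len(lengths) > 1:
        raise ValueError(
            f"zip_strict() arguments have different lengths: {sorted(lengths)}"
        )
    return list(zip(*sequences))


pairs = zip_strict(["model_a", "model_b"], [0.71, 0.65])

try:
    zip_strict([1, 2, 3], ["x", "y"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Unlike the built-in `strict=True`, this helper consumes its inputs eagerly, so it suits short lists such as per-model result rows rather than large streams.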
# Conflicts: # docs/tasks.md # mteb/abstasks/AbsTaskSpeedTask.py # mteb/abstasks/TaskMetadata.py # mteb/encoder_interface.py # mteb/models/misc_models.py # mteb/models/ops_moa_models.py # mteb/models/ru_sentence_models.py # mteb/models/sentence_transformers_models.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Clustering/deu/BlurbsClusteringP2P.py # mteb/tasks/Clustering/deu/BlurbsClusteringS2S.py # mteb/tasks/Clustering/deu/TenKGnadClusteringS2S.py # mteb/tasks/Clustering/fra/AlloProfClusteringP2P.py # mteb/tasks/Clustering/fra/AlloProfClusteringS2S.py # mteb/tasks/Clustering/fra/HALClusteringS2S.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py # mteb/tasks/Image/Clustering/eng/__init__.py # mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py # mteb/tasks/Image/ImageClassification/eng/CIFAR.py # mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py # 
mteb/tasks/Image/ImageClassification/eng/DTDClassification.py # mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py # mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py # mteb/tasks/Image/ImageClassification/eng/Food101Classification.py # mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py # mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py # mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py # mteb/tasks/Image/ImageClassification/eng/STL10Classification.py # mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py # mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py # mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py # mteb/tasks/Image/ImageClustering/eng/CIFAR.py # mteb/tasks/Image/ImageClustering/eng/ImageNet.py # mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py # mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py # mteb/tasks/Image/VisualSTS/__init__.py # mteb/tasks/Image/VisualSTS/en/__init__.py # mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py # mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py # mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py # mteb/tasks/Image/ZeroShotClassification/eng/DTD.py # mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py # mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py # mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py # mteb/tasks/Image/ZeroShotClassification/eng/Food101.py # mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py # mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py # mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py # mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py # mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py # mteb/tasks/Image/ZeroShotClassification/eng/STL10.py # mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py # 
mteb/tasks/Image/__init__.py # mteb/tasks/MultiLabelClassification/__init__.py # mteb/tasks/Reranking/zho/CMTEBReranking.py # mteb/tasks/Retrieval/__init__.py # mteb/tasks/__init__.py # pyproject.toml # tests/test_TaskMetadata.py
* Update gradio version Closes #2557 * bump gradio
We should probably have done this earlier to ensure that the multilingual benchmark is runnable.
* fix token * try to trigger * add token * test ci * Update tasks & benchmarks tables * Update tasks & benchmarks tables * remove test lines --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* add scandisent dataset * add to init * typo
* Fix errors in bibtex_citation * Format all bibtex_citation fields * format benchmarks * fix format * Fix tests * add formatting script
# Conflicts: # mteb/benchmarks/benchmarks.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2IMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/BLINKIT2TMultiChoice.py # mteb/tasks/Image/Any2AnyMultiChoice/eng/CVBench.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/BLINKIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/CUB200I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/FORBI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/GLDv2I2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/HatefulMemesT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ImageCoDeT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/METI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/MemotionT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/ROxfordI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RP2kI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/RParisI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SOPI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRI2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SciMMIRT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/SketchyI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/StanfordCarsI2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/TUBerlinT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VQA2IT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/eng/VizWizIT2TRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/WITT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/XFlickr30kCoT2IRetrieval.py # mteb/tasks/Image/Any2AnyRetrieval/multilingual/XM3600T2IRetrieval.py # mteb/tasks/Image/ImageClassification/eng/BirdsnapClassification.py 
# mteb/tasks/Image/ImageClassification/eng/CIFAR.py # mteb/tasks/Image/ImageClassification/eng/Caltech101Classification.py # mteb/tasks/Image/ImageClassification/eng/Country211Classification.py # mteb/tasks/Image/ImageClassification/eng/DTDClassification.py # mteb/tasks/Image/ImageClassification/eng/EuroSATClassification.py # mteb/tasks/Image/ImageClassification/eng/FER2013Classification.py # mteb/tasks/Image/ImageClassification/eng/FGVCAircraftClassification.py # mteb/tasks/Image/ImageClassification/eng/Food101Classification.py # mteb/tasks/Image/ImageClassification/eng/GTSRBClassification.py # mteb/tasks/Image/ImageClassification/eng/Imagenet1k.py # mteb/tasks/Image/ImageClassification/eng/MNISTClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordFlowersClassification.py # mteb/tasks/Image/ImageClassification/eng/OxfordPetsClassification.py # mteb/tasks/Image/ImageClassification/eng/RESISC45Classification.py # mteb/tasks/Image/ImageClassification/eng/STL10Classification.py # mteb/tasks/Image/ImageClassification/eng/SUN397Classification.py # mteb/tasks/Image/ImageClassification/eng/StanfordCarsClassification.py # mteb/tasks/Image/ImageClassification/eng/UCF101Classification.py # mteb/tasks/Image/ImageClustering/eng/CIFAR.py # mteb/tasks/Image/ImageClustering/eng/ImageNet.py # mteb/tasks/Image/ImageClustering/eng/TinyImageNet.py # mteb/tasks/Image/ImageMultilabelClassification/eng/PascalVOC2007.py # mteb/tasks/Image/ImageTextPairClassification/AROCocoOrder.py # mteb/tasks/Image/ImageTextPairClassification/AROFlickrOrder.py # mteb/tasks/Image/ImageTextPairClassification/AROVisualAttribution.py # mteb/tasks/Image/ImageTextPairClassification/AROVisualRelation.py # mteb/tasks/Image/ImageTextPairClassification/ImageCoDe.py # mteb/tasks/Image/ImageTextPairClassification/SugarCrepe.py # mteb/tasks/Image/ImageTextPairClassification/Winoground.py # mteb/tasks/Image/VisualSTS/eng/STS12VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS13VisualSTS.py # 
mteb/tasks/Image/VisualSTS/eng/STS14VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS15VisualSTS.py # mteb/tasks/Image/VisualSTS/eng/STS16VisualSTS.py # mteb/tasks/Image/VisualSTS/multilingual/STS17MultilingualVisualSTS.py # mteb/tasks/Image/VisualSTS/multilingual/STSBenchmarkMultilingualVisualSTS.py # mteb/tasks/Image/ZeroShotClassification/eng/Birdsnap.py # mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py # mteb/tasks/Image/ZeroShotClassification/eng/CLEVR.py # mteb/tasks/Image/ZeroShotClassification/eng/Caltech101.py # mteb/tasks/Image/ZeroShotClassification/eng/Country211.py # mteb/tasks/Image/ZeroShotClassification/eng/DTD.py # mteb/tasks/Image/ZeroShotClassification/eng/EuroSAT.py # mteb/tasks/Image/ZeroShotClassification/eng/FER2013.py # mteb/tasks/Image/ZeroShotClassification/eng/FGVCAircraft.py # mteb/tasks/Image/ZeroShotClassification/eng/Food101.py # mteb/tasks/Image/ZeroShotClassification/eng/GTSRB.py # mteb/tasks/Image/ZeroShotClassification/eng/Imagenet1k.py # mteb/tasks/Image/ZeroShotClassification/eng/MNIST.py # mteb/tasks/Image/ZeroShotClassification/eng/OxfordPets.py # mteb/tasks/Image/ZeroShotClassification/eng/PatchCamelyon.py # mteb/tasks/Image/ZeroShotClassification/eng/RESISC45.py # mteb/tasks/Image/ZeroShotClassification/eng/RenderedSST2.py # mteb/tasks/Image/ZeroShotClassification/eng/STL10.py # mteb/tasks/Image/ZeroShotClassification/eng/SUN397.py # mteb/tasks/Image/ZeroShotClassification/eng/SciMMIR.py # mteb/tasks/Image/ZeroShotClassification/eng/StanfordCars.py # mteb/tasks/Image/ZeroShotClassification/eng/UCF101.py # mteb/tasks/PairClassification/fas/FaMTEBPairClassification.py # mteb/tasks/PairClassification/multilingual/XNLI.py # mteb/tasks/Retrieval/ara/SadeemQuestionRetrieval.py # mteb/tasks/Retrieval/multilingual/PublicHealthQARetrieval.py # mteb/tasks/Retrieval/pol/FiQAPLRetrieval.py # mteb/tasks/Retrieval/zho/CMTEBRetrieval.py # pyproject.toml
Code Quality
- Run `make lint` to maintain consistent style.
Documentation
Testing
- `make test-with-coverage`.
- Run `make test` or `make test-with-coverage` to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
- `mteb -m {model_name} -t {task_name}` command.
  - `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
  - `intfloat/multilingual-e5-small`
- `self.stratified_subsampling()` under `dataset_transform()`
- `make test`.
- `make lint`.
Adding a model checklist
- `mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`
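The datasets checklist above refers to `self.stratified_subsampling()` under `dataset_transform()`. The signature and behaviour of that method are not shown on this page, so the following is only a generic sketch of stratified subsampling: shrinking a labelled dataset while keeping each label's share roughly constant. The function name, parameters, and seed default are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_subsample(
    labels: Sequence[Hashable], n_samples: int, seed: int = 42
) -> List[int]:
    """Return sorted indices of about n_samples items, preserving label proportions."""
    if n_samples >= len(labels):
        # Nothing to subsample: keep every index.
        return list(range(len(labels)))

    rng = random.Random(seed)  # Local generator keeps the draw reproducible.
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    fraction = n_samples / len(labels)
    chosen: List[int] = []
    for indices in indices_by_label.values():
        # Keep at least one example per label so no class disappears.
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


example_labels = ["pos"] * 80 + ["neg"] * 20
subset = stratified_subsample(example_labels, n_samples=10)
```

Because each label's quota is rounded separately, the result can differ from `n_samples` by a few items when there are many small classes.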