Add model specific dependencies in pyproject.toml #2424

Merged

ayush1298
Collaborator

@ayush1298 ayush1298 commented Mar 24, 2025

This PR adds model-specific dependencies to pyproject.toml and uses requires_package to give a detailed, clear warning when one of them is missing.
Closes #2398.
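
For illustration, a check of this kind might look roughly like the sketch below. It is not the mteb implementation: `check_package`, its parameters, and `DemoWrapper` are hypothetical names chosen for the example, and the only library call involved is the standard `importlib.util.find_spec`.

```python
from __future__ import annotations

import importlib.util


def check_package(
    owner: object,
    package_name: str,
    model_name: str,
    install_hint: str | None = None,
) -> None:
    """Raise ImportError with an install hint when `package_name` is absent.

    Hypothetical helper for illustration; name and signature are assumptions.
    """
    # find_spec returns None for a top-level package that is not installed.
    if importlib.util.find_spec(package_name) is not None:
        return
    hint = install_hint or f"pip install {package_name}"
    # Use __name__ for classes or functions, and the class name for instances.
    owner_name = getattr(owner, "__name__", type(owner).__name__)
    raise ImportError(
        f"{owner_name} requires the `{package_name}` package, which was not found. "
        f"To load {model_name}, run `{hint}`."
    )


class DemoWrapper:
    """Stand-in for a model wrapper; not a real mteb class."""

    def __init__(self, model_name: str) -> None:
        # "json" ships with Python, so this check passes silently.
        check_package(self, "json", model_name)
        self.model_name = model_name
```

Running the check in the wrapper's constructor means a missing dependency is reported when the model is loaded, with the install command (for example `pip install 'mteb[voyageai]'`) included in the message.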

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Member

@Samoed Samoed left a comment

Great changes!

@ayush1298
Collaborator Author

@Samoed @KennethEnevoldsen Should we also document this? For example, when someone adds a completely new model file (rather than just a ModelMeta in an existing file) and it needs extra dependencies, those dependencies should be added to pyproject.toml and checked with requires_package.

@Samoed
Member

Samoed commented Mar 25, 2025

Yes, that's a good idea!

@ayush1298
Collaborator Author

Yes, that's a good idea!

Where to add this exactly? In https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md?

@Samoed
Member

Samoed commented Mar 25, 2025

Yes, here

@ayush1298
Collaborator Author

@KennethEnevoldsen, can you review this one?


In the [voyage_models.py](../mteb/models/voyage_models.py) file, we have added the following code:
```python
requires_package(self, "voyageai", model_name, "pip install 'mteb[voyageai]'")
```

Contributor

add import

Contributor

this is minor. Feel free to ignore or add in a second pr

Collaborator Author

Cool, will add it in a follow-up PR. Thanks for reviewing.

@KennethEnevoldsen KennethEnevoldsen merged commit 8a024be into embeddings-benchmark:main Mar 26, 2025
10 checks passed
@ayush1298 ayush1298 deleted the Dependency_Specification branch March 26, 2025 13:34
isaac-chung added a commit that referenced this pull request Apr 1, 2025
* misc: Add image classification descriptive stats implementation (#2045)

* add ImageClassificationDescriptiveStatistics

* add MNIST descriptive stats

* use tuples instead

* add label count and update docstrings

* update MNIST example

* Update tasks table

* fix: Add column descriptions to leaderboard (#2039)

* fix: Add column descriptions to leaderboard

* removed existing overlap

* fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041)

* fix: Add BRIGHT Long

Fixes #1978

* fix: Add BRIGHT(long)

* fix bug in task results

* updated bright

* updated tests for TaskResults

* 1.34.12

Automatically generated by python-semantic-release

* misc: Add image clustering descriptive stats implementation (#2057)

* add image clustering descriptive stats and run
* finish off last one
* remove script

* fix: Update embed_dim for  jina models (#2058)

see embeddings-benchmark/results#117

* Update tasks table

* 1.34.13

Automatically generated by python-semantic-release

* Add giga embeddings (#1741)

* add gigaembeddings

* use jasper

* fix name

* create sentence_transformer instruct wrapper

* apply instruction template

* fix jasper

* update meta

* misc: Add ZS and multilabel image classification descriptive stats implementation (#2059)

* add image clustering descriptive stats and run

* finish off last one

* remove script

* add ImageMultilabelClassificationDescriptiveStatistics

* add VOC2007

* add zeroshot and mnist example

* Update tasks table

* Rename MIEB task classes with duplicated names (#2061)

fix class names

* misc: Add VisualSTS descriptive stats (#2062)

* add visualsts stats

* add last dataset

* Update tasks table

* fix: Added gte models (#1539)

* fix: Added gte models

* fix: Add mixbai models (#1540)

for #1515

* fix: Add climate fever v2 (#1873)

* Updated ClimateFEVER dataset with new version

* Adds Fill in the empty metadata.

* Updates the date tuple

* Update class name

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update domains

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task_subtypes

* Update annotations_creators for the first version

* Update date

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task subtypes

* Update path

* Update description

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>

* Update tasks table

* fix: Updating paper scripts (#1958)

* change reference revisions to align with paper

* Update author list

* Added code for main results table

* updated minor changes

* added external as a "no_revision_available" case

* revert unintended changes

* format

* 1.34.14

Automatically generated by python-semantic-release

* Add datasets for a benchmark newly introduced for "Engineering" domain (#1911)

* adding clustering tasks (built-bench-clustering S2S & P2P)

* updated built-bench-clustering tasks

* Updated BuiltBenchClustering tasks

* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P

* updated metadata for clustering tasks

* Add/update BuiltBench tasks

- Add BuiltBenchRetrieval task
- Add BuiltBenchReranking task
- Update metadata for BuiltBenchClusteringP2P
- Update metadata for BuiltBenchClusteringS2S

* update BuiltBench benchmark

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Fix formatting via ruff

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* misc: update model names to adjust for adding to results repo (#2074)

* update model names to adjust for adding to results repo

* update model meta script

* misc: Add all image classification descriptive stats (#2073)

* add most image classification descr stats

* revert changes to encoder

* add stats

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Update tasks table

* ci: Rerun tests that fail due to networking issues. (#2029)

* fix: rerun tests that fail - Networking

* update tests to use tmp_path

* set versions for dev dependencies

* add pytest options to pyproject.toml

* add rerun json.decoder.JSONDecodeError

* remove JSONDecodeError from pyproject.toml

* add huggingface_hub.errors.HfHubHTTPError

* add huggingface_hub.errors.LocalEntryNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044

* FileNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029

* add doc to pytest rerun

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* fix: generate metadata (#2063)

* fix: generate metadata

* use logging not print for script

* lint

* add iso639 to dev pyproject

* fix import

* add memory_usage_mb

* set version for iso639

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.34.15

Automatically generated by python-semantic-release

* fix: add missing `e5` training datasets (#2065)

add missing training datasets

* 1.34.16

Automatically generated by python-semantic-release

* fix: Ensure voyage model uses different naming scheme (#2083)

* fix: Added make command for running leaderboard locally

* fix: Ensure voyage models doesn't re-use the name

* 1.34.17

Automatically generated by python-semantic-release

* fix: Freeze model/rank columns in leaderboard (#2044)

* fix: freeze model/rank columns in leaderboard

* freezing zero-shot column

* update min gradio version to 5.16.0 in pyproject.toml

---------

Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>

* 1.34.18

Automatically generated by python-semantic-release

* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086)

Fixes #2064

* 1.34.19

Automatically generated by python-semantic-release

* Remove duplicated string in docstring of TaskMetadata class (#2087)

* Remove duplicated string in docstring of TaskMetadata class

* Remove duplicated dataset field

* fix: Smarter leaderboard caching with cachetools (#2085)

* Added smarter caching to callbacks

* Added cachetools as a dependency

* Ran linting

* Removed debugging print statement

* Bumped Gradio version

* Dependency fixes

* Dependency fixes

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088)

* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) )

Fixes #2064

* change MultilingualSentiment split from test to validation in CMTEB

* 1.34.20

Automatically generated by python-semantic-release

* merge gme models (#2089)

* fix: Add back task filtering by modalities (#2080)

* add back task filtering by modalities

* add unit test

* check if task modalities is a subset of model modalities and fix tests

* add model_modalities_more_than_task_modalities case

* 1.34.21

Automatically generated by python-semantic-release

* Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092)

* Added GTR Models to codebase

* Linted gtr models file.

* Added gtr-base/large/xl/xxl to sentence_transformers_models.py

* Added memory_usage_mb and training_datasets

* Reformatted training dataset names

* Reformatted training dataset names

* Reformatted training dataset names

---------

Co-authored-by: sufen <sufenf@gmail.com>

* misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095)

* add Any2TextMutipleChoiceDescriptiveStatistics

* run on all tasks

* Update tasks table

* fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101)

Reported with references to paper + quotes.

* fix: Update links (#2098)

* Fix link

* Fix link

* 1.34.22

Automatically generated by python-semantic-release

* Add model inf-retriever-v1-1.5b (#2106)

Add inf-retriever-v1-1.5b model

* docs: Fix typos & refine text (#2102)

* Update app.py

* Fix typos

* misc: Run Zeroshot Classification Descriptive Stats (#2105)

* add most datasets

* add birdsnap and imgnet1k

* add scimmir and sun397

* add uck101 zs

* Update tasks table

* fix: add warning about task category conversion (#2108)

add warning about task category conversion

* 1.34.23

Automatically generated by python-semantic-release

* fix: Add codesage-large-v2 (#2090)

* Add codesage-large-v2

* Address comments

* Add training dataset

* Fix issues

* Format code

* Remove unnecessary wrapper

* 1.34.24

Automatically generated by python-semantic-release

* fix: add training data to BGE-m3-custom-fr (#2110)

This ensures that it is correctly filtered as non-zero-shot

* 1.34.25

Automatically generated by python-semantic-release

* fix: Upgrade ruff to be gradio compatible (#2111)

* fix: update ruff to be gradio compatible (>=0.9.3)

* format

* fix: upgrade ruff to latest (same as gradio compatible)

* 1.34.26

Automatically generated by python-semantic-release

* docs: Follow google docstring format (#2115)

Fixes #2113

* Update leaderboard_refresh.yaml (#2121)

* fix InstructSentenceTransformer Model name (#2125)

fix params

* fix voyage (#2127)

* fix: update e5 instruct training data (#2129)

update e5 training data

* 1.34.27

Automatically generated by python-semantic-release

* format

* Update tasks table

* fix: Add 2 new Static Sentence Transformer models (#2112)

* Add 2 new Static Sentence Transformer models

* Add Tatoeba

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.34.28

Automatically generated by python-semantic-release

* add is_cross_encoder (#1869)

* add is_cross_encoder

* Update mteb/model_meta.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* change value

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Qodo embed 1 1.5 b (#2137)

* feat: Add Qodo-Embed-1-1.5B model metadata

* fix: Add Qodo models to overview imports

* fix: Add adapted_from field to Qodo model metadata

* Update mteb/models/qodo_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* relint

---------

Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* misc: merge summary retrieval into bitext mining (#2140)

merge summary retrieval into bitext mining

* test: fix dataset availability test (#2141)

This simplifies the test. It also removes about 100 test cases that all made the same API call.

* fix: Update NVIDIA-Embed training data (#2143)

Added a few missing annotations for nvidia-embed

* 1.34.29

Automatically generated by python-semantic-release

* fix: Add annotations for Voyage exp (#2144)

* fix: Update NVIDIA-Embed training data

Added a few missing annotations for nvidia-embed

* fix: update annotation for voyage exp

* 1.34.30

Automatically generated by python-semantic-release

* Fix tokens num in cde models (#2148)

fix tokens

* feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146)

* feat: Add Qodo-Embed-1-7B model metadata and rename existing model

* lint

* fix revision

* update license name

---------

Co-authored-by: Tal Sheffer <tal.s@codium.ai>

* 1.35.0

Automatically generated by python-semantic-release

* misc: add Any2AnyRetrievalDescriptiveStatistics (#2139)

add Any2AnyRetrievalDescriptiveStatistics

* Update tasks table

* Added zero-shot percentages and different filtering scheme (#2153)

* Added zero-shot percentages and different filtering scheme

* Update mteb/model_meta.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Incorrect annotations for Mistral-based embedding models (#2157)

Fixes #2155

* 1.35.1

Automatically generated by python-semantic-release

* Update FaMTEBRetrieval.py (#2171)

The URL pointed to the settings page instead of the main repo URL. Now it is fixed.

* Update tasks table

* fix: Add Training data annotations (#2173)

* redo to voyage to only training data

* Add training data annotation for Kalm embeddings #2168

* Add correct training data annotations to Stella #2164

* removed fiqa PL as it does not exist

* remove ArxivClusteringS2S.v2 as it does not exist

* Add training data annotation for GIST embedding #2166

* fix max tokens for kalm models #2162

* remove eli 5

* 1.35.2

Automatically generated by python-semantic-release

* feat: Add MIEB and MIEB-lite as benchmarks (#2035)

* add mieb and mieb-lite to benchmarks

* add CompositionalityEvaluation and DocumentUnderstanding types

* add VisionCentric type

* add missing comma

* split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng

* use aggregate task instead so we can name the subsets

* shorten names

* fix import

* alternative strategy to avoid using get_task

* follow other aggregate tasks and skip metadata test

* run LB without errors when selecting MIEB(-lite)

* add back the capability as task type

* typo

* extend description

* split into mieb(eng) and mieb(multilingual)

* remove unneeded files

* remove aggtask additions for test

* edit descriptions based on screenshots

* shorten

* rename to Compositionality and include ImageCoDeT2IMultiChoice

* re-tag missing VisionCentric tasks

* re-tag rparis and roxford as retrieval and include fixes

* re-tag voc2007 as image cls

* make lint

* correct num task types in descriptions

* add one model to models_to_annotate

* add mieb reference models

* update task types

* relabel to multilingual retrieval task type to align with paper

* fix reference and bibtex

* edit task list to match with final list

* add back agg task to reproduce table column in paper

* fix filtering and import

* update tests

* mieb lite add back missing tasks

* fix metadata test

* multi should have all 4 variants

* fix task counts

* lite has 10 task types

* fix visualSTS-17 lang splits

* Aggregate task can now use subsets & eval langs to filter TaskResults

* fix test and mark VisualSTS17 as multilingual

* fix tests

* add agg task running script

* add voyage meta

* fix citations

* capitalize

* add coarse/fine labels

---------

Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>

* Update tasks table

* 1.36.0

Automatically generated by python-semantic-release

* fix: update training datasets and revision for jina models (#2179)

* feat: update training datasets and revision for jina models

* feat: update training datasets and revision for jina models

* fix: Add more training data annotations (#2178)

* redo to voyage to only training data

* Add training data annotation for Kalm embeddings #2168

* Add correct training data annotations to Stella #2164

* removed fiqa PL as it does not exist

* remove ArxivClusteringS2S.v2 as it does not exist

* Add training data annotation for GIST embedding #2166

* fix max tokens for kalm models #2162

* remove eli 5

* fix: add training data for Bilingual Embeddings

fixes #2167

* 1.36.1

Automatically generated by python-semantic-release

* Added training data annotation for e5-base-4k (#2186)

* fix: Added training data annotations to MXBAI (#2185)

* fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180)

This also resolves the missing data in the leaderboard.

Fixes #2172

* fix: Added training data annotation for MMLW models (#2188)

* Added training data annotation for MMLW models

* Added GIST annotations Kenneth missed

* Added Stella en 400m training data

* 1.36.2

Automatically generated by python-semantic-release

* fix: Added training data for sentence-croissant (#2189)

* 1.36.3

Automatically generated by python-semantic-release

* fix: update ru models annotation (#2181)

* 1.36.4

Automatically generated by python-semantic-release

* fix: Alphabetical ordering of tasks in dropdowns (#2191)

* 1.36.5

Automatically generated by python-semantic-release

* misc: Speed up qrel creation in any2anyretrieval (#2196)

* use numpy vectorized operations instead of row-by-row

* scores are int

* use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199)

* add base models for e5 (#2183)

* add similar datasets (#2205)

* add similar datasets

* add nano

* update is filled

* Update mteb/abstasks/TaskMetadata.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* add labse annotation (#2182)

* add labse annotation

* Update mteb/models/sentence_transformers_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: Fixed leaderboard crash (#2221)

* Fixed leaderboard crash

* Fixed language selection error

* Ran linting

* 1.36.6

Automatically generated by python-semantic-release

* fix: More training data annotations (#2220)

* Added training  data annotation for bge-gemma

* Added missing annotations for Voyage models

* Added training data for sts-multilingual-mpnet

* Added all mteb datasets to STS-multilingual training data

* 1.36.7

Automatically generated by python-semantic-release

* Add LLM2CLIP (OpenAI variants) (#2222)

* model loading and get_text_embeddings

* add image_emb, fused_emb, and calc probs methods

* add b16 model

* add llm2clip_openai_l_14_224 (not working yet)

* got llm2clip_openai_l_14_224 working

* make lint

* add training sets and allow py files

* Change `dataset on HF` test to use official api (#2213)

* refactor dataset checking

* increase timeout

* increase timeout

* remove timeout

* Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197)

* Add Any2AnyMC descriptive stats

* Add descriptive stats function for ImageTextPC

* add descriptive stats examples

* linter

* update multi choice descriptive stats

* Update tasks table

* fix: Add training data annotations to uderver-bloom models (#2210)

* fix: Add training data annotations to uderver-bloom models

fixes #2193

* fix: add mixedbread

---------

Co-authored-by: Márton Kardos <power.up1163@gmail.com>

* 1.36.8

Automatically generated by python-semantic-release

* Add comment to `voyage-3-m-exp` model (#2229)

* remove model size from voyage-3-m-exp model

* Update mteb/models/voyage_models.py

* Update mteb/models/voyage_models.py

* docs: Update description of EURLex (#2231)

* Automatically add similar tasks to training_tasks (#2228)

* refactor dataset checking

* increase timeout

* increase timeout

* remove timeout

* start

* automatically find datasets

* update comment

* fix aggregate task metadata

* fixes

* lint

* rename

* update fetch check

* Remove overlapping legends from radar chart (#2195)

* Remove overlapping legends from radar chart

* ensure graph is not blocked

* Overlapping legend issue of Radar Chart

* misc: Run Any2AnyRetrieval descriptive stats (#2223)

* run a few datasets

* add a few more

* run more tasks

* add more datasets

* remove pdb

* remove newline

* add more datasets

* Update tasks table

* misc: Add rest of the vision centric and compositionality descriptive stats (#2267)

add the rest

* Update tasks table

* Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271)

* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270)

* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB

* Update memory_usage_mb with correct calculated value

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* remove comments

* added correct memory usage

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Apply linter fixes with ruff

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add Arabic_Triplet_Matryoshka_V2 to overview.py

* Rename model file to ara_models.py and update imports

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebFAQ Retrieval dataset (#2236)

* Add WebFAQ Retrieval dataset

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Small change WebFAQRetrieval.py

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Add remaining languages to WebFAQ Retrieval task

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Add descriptive stats

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update tasks table

* 1.36.9

Automatically generated by python-semantic-release

* fix: Formatting issue in Performance Plot (#2237)

* Formatting issue in Performance Plot

* make lint

* added function for better code readability

* 1.36.10

Automatically generated by python-semantic-release

* ci: run test_dataset_on_hf separately (#2201)

* dont run test_dataset_on_hf in every pr

* lint

* Update call pytest test_datasets

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tests/test_tasks/test_all_abstasks.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* not datasets for test

* run dataset loading test for push or pull_request

* apply feedback

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* add gemini-embedding-exp-03-07 (#2279)

* add gemini-embedding-exp-03-07

* remove space for lint

* lint fix

* update link (#2281)

* fix: Run remaining MIEB desc stats (#2288)

* run Vidore

* GLDv2

* run the rest

---------

Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>

* Update tasks table

* 1.36.11

Automatically generated by python-semantic-release

* fix: Added Filter Modality (#2262)

* Added Filter Modality

* resolve suggestions

* make lint

* make sure test pass

* make lint

* added exclusive_modality_filter and unit tests

* Integrate tests on overview.py

* Update tests/test_overview.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* added task related to image modality

* Update mteb/abstasks/AbsTask.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update mteb/abstasks/AbsTask.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update overview..py

* make lint

* update documentation

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.36.12

Automatically generated by python-semantic-release

* fix: Add `ModelMeta` license & custom validations (#2293)

* license validation

* move licenses

* update imports

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.13

Automatically generated by python-semantic-release

* ci: Add pre-commit hook (#2194)

* make dev life nicer - pre-commit hooks

* add pre-commit to install

* update precommit

* update ruff pre-commit

* lint

* lint

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* Update tasks table

* fix: bug in voyage implementation (#2304)

* fix: Fix bug in voyage implementation

"passage" is not a valid input for the voyage API. Remapped to "document".

* Update mteb/models/voyage_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.14

Automatically generated by python-semantic-release

* fix: Update voyage name to include Org. (#2322)

* 1.36.15

Automatically generated by python-semantic-release

* Added VDR Model (#2290)

* Added VDR Model

* change custom wrapper to SentenceTransformer Wrapper

* remove kwargs and add TODO for Image Modality

* Update mteb/models/vdr_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Resolve conflicting dependencies (#2323)

These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.

* 1.36.16

Automatically generated by python-semantic-release

* fix: remove SyntaxWarnings in py312 (#2325)

* fix: Resolve conflicting dependencies

These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.

* fix: Remove syntax warnings occurring in python 3.12

```
Python 3.12.0 (main, Oct  2 2023, 20:56:14) [Clang 16.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mteb # no syntax warnings
>>>
```

* 1.36.17

Automatically generated by python-semantic-release

* fix: add annotation models for stella zh (#2277)

* fix: add annotation models for stella zh

Additionally fixed a few annotation errors

* format

* Update mteb/models/stella_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.18

Automatically generated by python-semantic-release

* fix: Add ModelMeta rubert-mini-frida, BERTA (#2330)

* Add rubert-mini-frida model meta

* Add BERTA model meta

* docs: fix typos

* 1.36.19

Automatically generated by python-semantic-release

* fix: Add WebFAQ bitext mining tasks (#2326)

* Add WebFAQ bitext mining tasks

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Lower number of language pairs in WebFAQBitextMining

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update tasks table

* 1.36.20

Automatically generated by python-semantic-release

* fix: Add `trust_remote_code` to MIRACLRetrieval

* fix: Add `trust_remote_code` to MIRACLRetrieval (#2344)

* 1.36.21

Automatically generated by python-semantic-release

* fix: Correctly pass trust remote code to Miracl

* fix: Ensure MIRACL pass trust_remote_code (#2346)

* fix: Add `trust_remote_code` to MIRACLRetrieval

* fix: Correctly pass trust remote code to Miracl

* fix

* 1.36.22

Automatically generated by python-semantic-release

* add-Data Korean Clustering dataset (KLUE-modified) (#2283)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Rename dunzhang and Jasper models to NovaResearch (#2373)

* Rename dunzhang and Jasper models to NovaResearch

* rename model in tests

* correct reference link

* correct MIEB dataset stats (#2374)

* correct stats

* update Any2AnyMultiChoice qrels stats compute logic

* final correction

* Update tasks table

* Correct -1 to No information in Zero shot (#2381)

* fix leaderboard (#2385)

* fix: Reduce logging and Warnings (#2349)

* Reduce logging and Warnings

* make lint

* format license to lowercase

* Address all comments

* Update mteb/leaderboard/app.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.23

Automatically generated by python-semantic-release

* fix: b1ade (#2386)

* fix: added b1ade_models.py (#2340)

* added b1ade_models.py

* changing based on requested

* Update mteb/models/b1ade_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: missing import and formatting

---------

Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com>

* 1.36.24

Automatically generated by python-semantic-release

* fix: pin gradio dependency to ensure the leaderboard works (#2387)

* 1.36.25

Automatically generated by python-semantic-release

* fix: Ensure BrightRetrieval is valid to run (#2334)

* fix: Ensure BrightRetrieval is valid to run

Not sure this is the best way to fix this. Let me know if you can find a better fix.

fixes #2327

* fix: convert brightretrieval to two tasks

* fix collecting error

* Update tasks table

* 1.36.26

Automatically generated by python-semantic-release

* Pass task name to all evaluators (#2389)

* pass task name to all tasks

* add test

* fix loader

* fix: renaming Zeroshot -> ZeroShot (#2395)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

* 1.36.27

Automatically generated by python-semantic-release

* fix: Update AmazonPolarityClassification license (#2402)

Update AmazonPolarityClassification.py

* fix b1ade name (#2403)

* 1.36.28

Automatically generated by python-semantic-release

* Minor style changes (#2396)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid-related scientific papers (#2302)

* Clustrec covid new dataset and task

* fix

* fix

* fix

* fix

* fix

* descriptive stats

* change all mentions of clustrec-covidp2p to clustrec-covid

* change ' to "

* Update tasks table

* fix: Major updates to docs + make mieb dep optional (#2397)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* fix: Major updates to documentation

This PR does the following:
- introduces other modalities more clearly in the documentation and makes it easier to transition to a full documentation site later
- adds minor code updates to resolve inconsistencies discovered between docs and code
- adds the MMTEB citation where applicable
- makes the docs ready to move torchvision to an optional dependency

* Moved VISTA example

* rename 1

* rename 2

* format

* fixed error

* fix: make torchvision optional (#2399)

* fix: make torchvision optional

* format

* add docs

* minor fix

* remove transform from Any2TextMultipleChoiceEvaluator

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* move Running SentenceTransformer model with prompts to usage

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.29

Automatically generated by python-semantic-release

* remove Arabic_Triplet_Matryoshka_V2.py (#2405)

* Min torchvision>0.2.1 (#2410)

matching torch>1.0.0

* fix: Add validation to model_name in `ModelMeta` (#2404)

* add test for name validation

* upd docs

* upd cohere name

* fix tests

* fix name for average_word_embeddings_komninos

* fix name for average_word_embeddings_komninos

* fix reranker test

* fix reranker test

* 1.36.30

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formatted Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py to Clustering > kor folder, and edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation
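
The dependency check described in this PR (declare a model's extra packages in pyproject.toml, then fail with a clear message when they are missing) can be sketched roughly as follows. This is only an illustration and not mteb's actual `requires_package` implementation: the helper name `require_optional_package`, its parameters, and the extra name used in the install hint are all hypothetical.

```python
import importlib.util


def require_optional_package(package_name: str, model_name: str, extra: str) -> None:
    """Raise a descriptive ImportError when an optional dependency is missing.

    Hypothetical helper for illustration; `extra` stands for whatever
    optional-dependency group pyproject.toml would declare for the model.
    """
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package_name) is None:
        raise ImportError(
            f"{model_name} requires the `{package_name}` package. "
            f"Install it with `pip install 'mteb[{extra}]'`."
        )


# `json` ships with Python, so this check passes silently.
require_optional_package("json", "example/model", "example_extra")

# A package that is not installed produces the descriptive error.
try:
    require_optional_package("surely_missing_pkg_xyz", "example/model", "example_extra")
except ImportError as err:
    print(err)
```

Doing the check when the model is loaded, rather than at import time, keeps the base install light while still telling the user exactly what to install.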

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added metadata about where the model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.34

Automatically generated by python-semantic-release

* fix test

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com>
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>
Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: sufen-f <sufenfong@gmail.com>
Co-authored-by: sufen <sufenf@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Samuel Yang <samuelyang150@gmail.com>
Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: talshef <tsheffer@gmail.com>
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: garciasces <garciasces@madrid.es>
Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>
Co-authored-by: Wang Bo <bo.wang@jina.ai>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com>
Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com>
Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com>
Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>
Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com>
Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com>
Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com>
Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com>
Co-authored-by: richinfo-ai <richinfoai@163.com>
Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com>
Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com>
Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com>
Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
Samoed added a commit that referenced this pull request Apr 4, 2025
* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formatted Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py to Clustering > kor folder, and edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added metadata about where the model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.34

Automatically generated by python-semantic-release

* suppress logging warnings on leaderboard (#2406)

* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff

* fix: E5 instruct now listed as sbert compatible (#2475)

Fixes #1442

* 1.36.35

Automatically generated by python-semantic-release

* [MIEB] rename VisionCentric to VisionCentricQA (#2479)

rename VisionCentric to VisionCentricQA

* ci: Run dataset loading only when pushing to main (#2480)

Update dataset_loading.yml

* fix table in tasks.md (#2483)

* Update tasks table

* fix imports

* update model loader

* remove unused imports

* fix clip name

* fix moco models

* fix tests

* fix tests

---------

Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com>
Co-authored-by: richinfo-ai <richinfoai@163.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com>
Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com>
Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Sam Heymann <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com>
Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
isaac-chung added a commit that referenced this pull request Apr 4, 2025
* misc: Add image classification descriptive stats implementation (#2045)

* add ImageClassificationDescriptiveStatistics

* add MNIST descriptive stats

* use tuples instead

* add label count and update docstrings

* update MNIST example

* Update tasks table

* fix: Add column descriptions to leaderboard (#2039)

* fix: Add column descriptions to leaderboard

* removed existing overlap

* fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041)

* fix: Add BRIGHT Long

Fixes #1978

* fix: Add BRIGHT(long)

* fix bug in task results

* updated bright

* updated tests for TaskResults

* 1.34.12

Automatically generated by python-semantic-release

* misc: Add image clustering descriptive stats implementation (#2057)

* add image clustering descriptive stats and run
* finish off last one
* remove script

* fix: Update embed_dim for jina models (#2058)

see embeddings-benchmark/results#117

* Update tasks table

* 1.34.13

Automatically generated by python-semantic-release

* Add giga embeddings (#1741)

* add gigaembeddings

* use jasper

* fix name

* create sentence_transformer instruct wrapper

* apply instruction template

* fix jasper

* update meta

* misc: Add ZS and multilabel image classification descriptive stats implementation (#2059)

* add image clustering descriptive stats and run

* finish off last one

* remove script

* add ImageMultilabelClassificationDescriptiveStatistics

* add VOC2007

* add zeroshot and mnist example

* Update tasks table

* Rename MIEB task classes with duplicated names (#2061)

fix class names

* misc: Add VisualSTS descriptive stats (#2062)

* add visualsts stats

* add last dataset

* Update tasks table

* fix: Added gte models (#1539)

* fix: Added gte models

* fix: Add mixbai models (#1540)

for #1515

* fix: Add climate fever v2 (#1873)

* Updated ClimateFEVER dataset with new version

* Fill in the empty metadata.

* Updates the date tuple

* Update class name

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update domains

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task_subtypes

* Update annotations_creators for the first version

* Update date

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task subtypes

* Update path

* Update description

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>

* Update tasks table

* fix: Updating paper scripts (#1958)

* change reference revisions to align with paper

* Update author list

* Added code for main results table

* updated minor changes

* added external as a "no_revision_available" case

* revert unintended changes

* format

* 1.34.14

Automatically generated by python-semantic-release

* Add datasets for a benchmark newly introduced for "Engineering" domain (#1911)

* adding clustering tasks (built-bench-clustering S2S & P2P)

* updated built-bench-clustering tasks

* Updated BuiltBenchClustering tasks

* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P

* updated metadata for clustering tasks

* Add/update BuiltBench tasks

- Add BuiltBenchRetrieval task
- Add BuiltBenchReranking task
- Update metadata for BuiltBenchClusteringP2P
- Update metadata for BuiltBenchClusteringS2S

* update BuiltBench benchmark

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Fix formatting via ruff

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* misc: update model names to adjust for adding to results repo (#2074)

* update model names to adjust for adding to results repo

* update model meta script

* misc: Add all image classification descriptive stats (#2073)

* add most image classification descr stats

* revert changes to encoder

* add stats

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Update tasks table

* ci: Rerun tests that fail due to networking issues. (#2029)

* fix: rerun tests that fail - Networking

* update tests to use tmp_path

* set versions for dev dependencies

* add pytest options to pyproject.toml

* add rerun json.decoder.JSONDecodeError

* remove JSONDecodeError from pyproject.toml

* add huggingface_hub.errors.HfHubHTTPError

* add huggingface_hub.errors.LocalEntryNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044

* FileNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029

* add doc to pytest rerun

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* fix: generate metadata (#2063)

* fix: generate metadata

* use logging not print for script

* lint

* add iso639 to dev pyproject

* fix import

* add memory_usage_mb

* set version for iso639

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.34.15

Automatically generated by python-semantic-release

* fix: add missing `e5` training datasets (#2065)

add missing training datasets

* 1.34.16

Automatically generated by python-semantic-release

* fix: Ensure voyage model uses different naming scheme (#2083)

* fix: Added make command for running leaderboard locally

* fix: Ensure voyage models don't re-use the name

* 1.34.17

Automatically generated by python-semantic-release

* fix: Freeze model/rank columns in leaderboard (#2044)

* fix: freeze model/rank columns in leaderboard

* freezing zero-shot column

* update min gradio version to 5.16.0 in pyproject.toml

---------

Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>

* 1.34.18

Automatically generated by python-semantic-release

* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086)

Fixes #2064

* 1.34.19

Automatically generated by python-semantic-release

* Remove duplicated string in docstring of TaskMetadata class (#2087)

* Remove duplicated string in docstring of TaskMetadata class

* Remove duplicated dataset field

* fix: Smarter leaderboard caching with cachetools (#2085)

* Added smarter caching to callbacks

* Added cachetools as a dependency

* Ran linting

* Removed debugging print statement

* Bumped Gradio version

* Dependency fixes

* Dependency fixes

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088)

* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) )

Fixes #2064

* change MultilingualSentiment split from test to validation in CMTEB

* 1.34.20

Automatically generated by python-semantic-release

* merge gme models (#2089)

* fix: Add back task filtering by modalities (#2080)

* add back task filtering by modalities

* add unit test

* check if task modalities is a subset of model modalities and fix tests

* add model_modalities_more_than_task_modalities case
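
The filtering rule named in these commits (a task is kept when its modalities are a subset of the model's modalities) can be illustrated with a minimal sketch. The function name and the plain-string modality labels below are assumptions for illustration, not the code merged in #2080.

```python
from collections.abc import Iterable


def task_supported_by_model(
    task_modalities: Iterable[str], model_modalities: Iterable[str]
) -> bool:
    """Return True when every modality the task needs is offered by the model."""
    # Subset test: the model may support extra modalities the task never uses.
    return set(task_modalities) <= set(model_modalities)


# The model supports more modalities than the task needs, so the task is kept.
print(task_supported_by_model(["text"], ["text", "image"]))

# The task needs a modality the model lacks, so the task is filtered out.
print(task_supported_by_model(["image", "text"], ["text"]))
```

Using a subset test rather than equality is what makes the "model modalities more than task modalities" case pass.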

* 1.34.21

Automatically generated by python-semantic-release

* Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092)

* Added GTR Models to codebase

* Linted gtr models file.

* Added gtr-base/large/xl/xxl to sentence_transformers_models.py

* Added memory_usage_mb and training_datasets

* Reformatted training dataset names

* Reformatted training dataset names

* Reformatted training dataset names

---------

Co-authored-by: sufen <sufenf@gmail.com>

* misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095)

* add Any2TextMutipleChoiceDescriptiveStatistics

* run on all tasks

* Update tasks table

* fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101)

Reported with references to paper + quotes.

* fix: Update links (#2098)

* Fix link

* Fix link

* 1.34.22

Automatically generated by python-semantic-release

* Add model inf-retriever-v1-1.5b (#2106)

Add inf-retriever-v1-1.5b model

* docs: Fix typos & refine text (#2102)

* Update app.py

* Fix typos

* misc: Run Zeroshot Classification Descriptive Stats (#2105)

* add most datasets

* add birdsnap and imgnet1k

* add scimmir and sun397

* add ucf101 zs

* Update tasks table

* fix: add warning about task category conversion (#2108)

add warning about task category conversion

* 1.34.23

Automatically generated by python-semantic-release

* fix: Add codesage-large-v2 (#2090)

* Add codesage-large-v2

* Address comments

* Add training dataset

* Fix issues

* Format code

* Remove unnecessary wrapper

* 1.34.24

Automatically generated by python-semantic-release

* fix: add training data to BGE-m3-custom-fr (#2110)

This ensures that it is correctly filtered as non-zero-shot

* 1.34.25

Automatically generated by python-semantic-release

* fix: Upgrade ruff to be gradio compatible (#2111)

* fix: update ruff to be gradio compatible (>=0.9.3)

* format

* fix: upgrade ruff to latest (same as gradio compatible)

* 1.34.26

Automatically generated by python-semantic-release

* docs: Follow google docstring format (#2115)

Fixes #2113

* Update leaderboard_refresh.yaml (#2121)

* fix InstructSentenceTransformer Model name (#2125)

fix params

* fix voyage (#2127)

* fix: update e5 instruct training data (#2129)

update e5 training data

* 1.34.27

Automatically generated by python-semantic-release

* format

* Update tasks table

* fix: Add 2 new Static Sentence Transformer models (#2112)

* Add 2 new Static Sentence Transformer models

* Add Tatoeba

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.34.28

Automatically generated by python-semantic-release

* add is_cross_encoder (#1869)

* add is_cross_encoder

* Update mteb/model_meta.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* change value

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Qodo embed 1 1.5 b (#2137)

* feat: Add Qodo-Embed-1-1.5B model metadata

* fix: Add Qodo models to overview imports

* fix: Add adapted_from field to Qodo model metadata

* Update mteb/models/qodo_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* relint

---------

Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* misc: merge summary retrieval into bitext mining (#2140)

merge summary retrieval into bitext mining

* test: fix dataset availability test (#2141)

This simplifies the test. It also removes about 100 test cases, which all made the same API call.

* fix: Update NVIDIA-Embed training data (#2143)

Added a few missing annotations for nvidia-embed

* 1.34.29

Automatically generated by python-semantic-release

* fix: Add annotations for Voyage exp (#2144)

* fix: Update NVIDIA-Embed training data

Added a few missing annotations for nvidia-embed

* fix: update annotations for voyage exp

* 1.34.30

Automatically generated by python-semantic-release

* Fix tokens num in cde models (#2148)

fix tokens

* feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146)

* feat: Add Qodo-Embed-1-7B model metadata and rename existing model

* lint

* fix revision

* update license name

---------

Co-authored-by: Tal Sheffer <tal.s@codium.ai>

* 1.35.0

Automatically generated by python-semantic-release

* misc: add Any2AnyRetrievalDescriptiveStatistics (#2139)

add Any2AnyRetrievalDescriptiveStatistics

* Update tasks table

* Added zero-shot percentages and different filtering scheme (#2153)

* Added zero-shot percentages and different filtering scheme

* Update mteb/model_meta.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Incorrect annotations for Mistral-based embedding models (#2157)

Fixes #2155

* 1.35.1

Automatically generated by python-semantic-release

* Update FaMTEBRetrieval.py (#2171)

The URL pointed to the settings page instead of the main repo URL. Now it is fixed.

* Update tasks table

* fix: Add Training data annotations (#2173)

* redo to voyage to only training data

* Add training data annotation for Kalm embeddings #2168

* Add correct training data annotations to Stella #2164

* removed fiqa PL as it does not exist

* remove ArxivClusteringS2S.v2 as it does not exist

* Add training data annotation for GIST embedding #2166

* fix max tokens for kalm models #2162

* remove eli 5

* 1.35.2

Automatically generated by python-semantic-release

* feat: Add MIEB and MIEB-lite as benchmarks (#2035)

* add mieb and mieb-lite to benchmarks

* add CompositionalityEvaluation and DocumentUnderstanding types

* add VisionCentric type

* add missing comma

* split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng

* use aggregate task instead so we can name the subsets

* shorten names

* fix import

* alternative strategy to avoid using get_task

* follow other aggregate tasks and skip metadata test

* run LB without errors when selecting MIEB(-lite)

* add back the capability as task type

* typo

* extend description

* split into mieb(eng) and mieb(multilingual)

* remove unneeded files

* remove aggtask additions for test

* edit descriptions based on screenshots

* shorten

* rename to Compositionality and include ImageCoDeT2IMultiChoice

* re-tag missing VisionCentric tasks

* re-tag rparis and roxford as retrieval and include fixes

* re-tag voc2007 as image cls

* make lint

* correct num task types in descriptions

* add one model to models_to_annotate

* add mieb reference models

* update task types

* relabel to multilingual retrieval task type to align with paper

* fix reference and bibtex

* edit task list to match with final list

* add back agg task to reproduce table column in paper

* fix filtering and import

* update tests

* mieb lite add back missing tasks

* fix metadata test

* multi should have all 4 variants

* fix task counts

* lite has 10 task types

* fix visualSTS-17 lang splits

* Aggregate task can now use subsets & eval langs to filter TaskResults

* fix test and mark VisualSTS17 as multilingual

* fix tests

* add agg task running script

* add voyage meta

* fix citations

* capitalize

* add coarse/fine labels

---------

Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>

* Update tasks table

* 1.36.0

Automatically generated by python-semantic-release

* fix: update training datasets and revision for jina models (#2179)

* feat: update training datasets and revision for jina models

* feat: update training datasets and revision for jina models

* fix: Add more training data annotations (#2178)

* redo to voyage to only training data

* Add training data annotation for Kalm embeddings #2168

* Add correct training data annotations to Stella #2164

* removed fiqa PL as it does not exist

* remove ArxivClusteringS2S.v2 as it does not exist

* Add training data annotation for GIST embedding #2166

* fix max tokens for kalm models #2162

* remove eli 5

* fix: add training data for Bilingual Embeddings

fixes #2167

* 1.36.1

Automatically generated by python-semantic-release

* Added training data annotation for e5-base-4k (#2186)

* fix: Added training data annotations to MXBAI (#2185)

* fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180)

This also resolves the missing data in the leaderboard.

Fixes #2172

* fix: Added training data annotation for MMLW models (#2188)

* Added training data annotation for MMLW models

* Added GIST annotations Kenneth missed

* Added Stella en 400m training data

* 1.36.2

Automatically generated by python-semantic-release

* fix: Added training data for sentence-croissant (#2189)

* 1.36.3

Automatically generated by python-semantic-release

* fix: update ru models annotation (#2181)

* 1.36.4

Automatically generated by python-semantic-release

* fix: Alphabetical ordering of tasks in dropdowns (#2191)

* 1.36.5

Automatically generated by python-semantic-release

* misc: Speed up qrel creation in any2anyretrieval (#2196)

* use numpy vectorized operations instead of row-by-row

* scores are int

* use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199)

* add base models for e5 (#2183)

* add similar datasets (#2205)

* add similar datasets

* add nano

* update is filled

* Update mteb/abstasks/TaskMetadata.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* add labse annotation (#2182)

* add labse annotation

* Update mteb/models/sentence_transformers_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: Fixed leaderboard crash (#2221)

* Fixed leaderboard crash

* Fixed language selection error

* Ran linting

* 1.36.6

Automatically generated by python-semantic-release

* fix: More training data annotations (#2220)

* Added training data annotation for bge-gemma

* Added missing annotations for Voyage models

* Added training data for sts-multilingual-mpnet

* Added all mteb datasets to STS-multilingual training data

* 1.36.7

Automatically generated by python-semantic-release

* Add LLM2CLIP (OpenAI variants) (#2222)

* model loading and get_text_embeddings

* add image_emb, fused_emb, and calc probs methods

* add b16 model

* add llm2clip_openai_l_14_224 (not working yet)

* got llm2clip_openai_l_14_224 working

* make lint

* add training sets and allow py files

* Change `dataset on HF` test to use official api (#2213)

* refactor dataset checking

* increase timeout

* increase timeout

* remove timeout

* Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197)

* Add Any2AnyMC descriptive stats

* Add descriptive stats function for ImageTextPC

* add descriptive stats examples

* linter

* update multi choice descriptive stats

* Update tasks table

* fix: Add training data annotations to udever-bloom models (#2210)

* fix: Add training data annotations to udever-bloom models

fixes #2193

* fix: add mixedbread

---------

Co-authored-by: Márton Kardos <power.up1163@gmail.com>

* 1.36.8

Automatically generated by python-semantic-release

* Add comment to `voyage-3-m-exp` model (#2229)

* remove model size from voyage-3-m-exp model

* Update mteb/models/voyage_models.py

* Update mteb/models/voyage_models.py

* docs: Update description of EURLex (#2231)

* Automatically add similar tasks to training_tasks (#2228)

* refactor dataset checking

* increase timeout

* increase timeout

* remove timeout

* start

* automatically find datasets

* update comment

* fix aggregate task metadata

* fixes

* lint

* rename

* update fetch check

* Remove overlapping legends from radar chart (#2195)

* Remove overlapping legends from radar chart

* ensure graph is not blocked

* Overlapping legend issue of Radar Chart

* misc: Run Any2AnyRetrieval descriptive stats (#2223)

* run a few datasets

* add a few more

* run more tasks

* add more datasets

* remove pdb

* remove newline

* add more datasets

* Update tasks table

* misc: Add rest of the vision centric and compositionality descriptive stats (#2267)

add the rest

* Update tasks table

* Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271)

* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270)

* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB

* Update memory_usage_mb with correct calculated value

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* remove comments

* added correct memory usage

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Apply linter fixes with ruff

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add Arabic_Triplet_Matryoshka_V2 to overview.py

* Rename model file to ara_models.py and update imports

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebFAQ Retrieval dataset (#2236)

* Add WebFAQ Retrieval dataset

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Small change WebFAQRetrieval.py

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Add remaining languages to WebFAQ Retrieval task

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Add descriptive stats

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update tasks table

* 1.36.9

Automatically generated by python-semantic-release

* fix: Formatting issue in Performance Plot (#2237)

* Formatting issue in Performance Plot

* make lint

* added function for better code readability

* 1.36.10

Automatically generated by python-semantic-release

* ci: run test_dataset_on_hf separately (#2201)

* don't run test_dataset_on_hf in every PR

* lint

* Update call pytest test_datasets

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tests/test_tasks/test_all_abstasks.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* not datasets for test

* run dataset loading test for push or pull_request

* apply feedback

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* add gemini-embedding-exp-03-07 (#2279)

* add gemini-embedding-exp-03-07

* remove space for lint

* lint fix

* update link (#2281)

* fix: Run remaining MIEB desc stats (#2288)

* run Vidore

* GLDv2

* run the rest

---------

Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>

* Update tasks table

* 1.36.11

Automatically generated by python-semantic-release

* fix: Added Filter Modality (#2262)

* Added Filter Modality

* resolve suggestions

* make lint

* make sure test pass

* make lint

* added exclusive_modality_filter and unit tests

* Integrate tests on overview.py

* Update tests/test_overview.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* added task related to image modality

* Update mteb/abstasks/AbsTask.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update mteb/abstasks/AbsTask.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update overview.py

* make lint

* update documentation

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.36.12

Automatically generated by python-semantic-release

* fix: Add `ModelMeta` license & custom validations (#2293)

* license validation

* move licenses

* update imports

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.13

Automatically generated by python-semantic-release

* ci: Add pre-commit hook (#2194)

* make dev life nicer - pre-commit hooks

* add pre-commit to install

* update precommit

* update ruff pre-commit

* lint

* lint

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* Update tasks table

* fix: bug in voyage implementation (#2304)

* fix: Fix bug in voyage implementation

"passage" is not a valid input for the voyage API. Remapped to "document".
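
The remapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from `mteb/models/voyage_models.py`: the function name `to_voyage_input_type` and the mapping table are hypothetical, and it assumes the only values involved are `"query"`, `"passage"`, and `None`.

```python
from typing import Optional

# Hypothetical lookup table: internal prompt types on the left,
# the input types the Voyage API is assumed to accept on the right.
_INPUT_TYPE_MAP = {
    "query": "query",
    "passage": "document",  # "passage" is rejected by the API, so remap it
}


def to_voyage_input_type(prompt_type: Optional[str]) -> Optional[str]:
    """Translate an internal prompt type into a Voyage-style input type.

    None is passed through unchanged; unknown values are returned as-is
    so that the API, not this helper, reports them.
    """
    if prompt_type is None:
        return None
    return _INPUT_TYPE_MAP.get(prompt_type, prompt_type)
```

Keeping the translation in one small helper means the rest of the wrapper can keep using the internal names.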

* Update mteb/models/voyage_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.14

Automatically generated by python-semantic-release

* fix: Update voyage name to include Org. (#2322)

* 1.36.15

Automatically generated by python-semantic-release

* Added VDR Model (#2290)

* Added VDR Model

* change custom wrapper to SentenceTransformer Wrapper

* remove kwargs and add TODO for Image Modality

* Update mteb/models/vdr_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Resolve conflicting dependencies (#2323)

These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.

* 1.36.16

Automatically generated by python-semantic-release

* fix: remove SyntaxWarnings in py312 (#2325)

* fix: Resolve conflicting dependencies

These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.

* fix: Remove syntax warnings occurring in python 3.12

```
Python 3.12.0 (main, Oct  2 2023, 20:56:14) [Clang 16.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mteb # no syntax warnings
>>>
```

* 1.36.17

Automatically generated by python-semantic-release

* fix: add annotation models for stella zh (#2277)

* fix: add annotation models for stella zh

Additionally fixed a few annotation errors

* format

* Update mteb/models/stella_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.18

Automatically generated by python-semantic-release

* fix: Add ModelMeta rubert-mini-frida, BERTA (#2330)

* Add rubert-mini-frida model meta

* Add BERTA model meta

* docs: fix typos

* 1.36.19

Automatically generated by python-semantic-release

* fix: Add WebFAQ bitext mining tasks (#2326)

* Add WebFAQ bitext mining tasks

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Lower number of language pairs in WebFAQBitextMining

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Update tasks table

* 1.36.20

Automatically generated by python-semantic-release

* fix: Add `trust_remote_code` to MIRACLRetrieval

* fix: Add `trust_remote_code` to MIRACLRetrieval (#2344)

* 1.36.21

Automatically generated by python-semantic-release

* fix: Correctly pass trust remote code to Miracl

* fix: Ensure MIRACL pass trust_remote_code (#2346)

* fix: Add `trust_remote_code` to MIRACLRetrieval

* fix: Correctly pass trust remote code to Miracl

* fix

* 1.36.22

Automatically generated by python-semantic-release

* add-Data Korean Clustering dataset (KLUE-modified) (#2283)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Rename dunzhang and Jasper models to NovaResearch (#2373)

* Rename dunzhang and Jasper models to NovaResearch

* rename model in tests

* correct reference link

* correct MIEB dataset stats (#2374)

* correct stats

* update Any2AnyMultiChoice qrels stats compute logic

* final correction

* Update tasks table

* Correct -1 to No information in Zero shot (#2381)

* fix leaderboard (#2385)

* fix: Reduce logging and Warnings (#2349)

* Reduce logging and Warnings

* make lint

* format license to lowercase

* Address all comments

* Update mteb/leaderboard/app.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.23

Automatically generated by python-semantic-release

* fix: b1ade (#2386)

* fix: added b1ade_models.py (#2340)

* added b1ade_models.py

* changing based on requested

* Update mteb/models/b1ade_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* fix: missing import and formatting

---------

Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com>

* 1.36.24

Automatically generated by python-semantic-release

* fix: pin gradio dependency to ensure leaderboards works (#2387)

* 1.36.25

Automatically generated by python-semantic-release

* fix: Ensure BrightRetrieval is valid to run (#2334)

* fix: Ensure BrightRetrieval is valid to run

Not sure this is the best way to fix this. Let me know if you can find a better fix.

fixes #2327

* fix: convert brightretrieval to two tasks

* fix collecting error

* Update tasks table

* 1.36.26

Automatically generated by python-semantic-release

* Pass task name to all evaluators (#2389)

* pass task name to all tasks

* add test

* fix loader

* fix: renaming Zeroshot -> ZeroShot (#2395)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

* 1.36.27

Automatically generated by python-semantic-release

* fix: Update AmazonPolarityClassification license (#2402)

Update AmazonPolarityClassification.py

* fix b1ade name (#2403)

* 1.36.28

Automatically generated by python-semantic-release

* Minor style changes (#2396)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid related scientific papers (#2302)

* Clustrec covid new dataset and task

* fix

* fix

* fix

* fix

* fix

* descriptive stats

* change all mentions of clustrec-covidp2p to clustrec-covid

* change ' to "

* Update tasks table

* fix: Major updates to docs + make mieb dep optional (#2397)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* fix: Major updates to documentation

This PR does the following:
- Introduces other modalities more clearly in the documentation and makes it easier to transition to a full documentation site later.
- Adds minor code updates to resolve inconsistencies discovered between docs and code.
- Adds the MMTEB citation where applicable.
- Makes the docs ready for moving torchvision to an optional dependency.

* Moved VISTA example

* rename 1

* rename 2

* format

* fixed error

* fix: make torchvision optional (#2399)

* fix: make torchvision optional

* format

* add docs

* minor fix

* remove transform from Any2TextMultipleChoiceEvaluator

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* move Running SentenceTransformer model with prompts to usage

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.29

Automatically generated by python-semantic-release

* remove Arabic_Triplet_Matryoshka_V2.py (#2405)

* Min torchvision>0.2.1 (#2410)

matching torch>1.0.0

* fix: Add validation to model_name in `ModelMeta` (#2404)

* add test for name validation

* upd docs

* upd cohere name

* fix tests

* fix name for average_word_embeddings_komninos

* fix name for average_word_embeddings_komninos

* fix reranker test

* fix reranker test

* 1.36.30

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formatted Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py to Clustering > kor folder, and edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation
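
A dependency check of the kind this change describes could look roughly like the sketch below. It is an illustration only and does not reproduce mteb's `requires_package`: the helper name `ensure_package`, its parameters, and the suggested `pip install 'mteb[<extra>]'` hint are assumptions made for the example.

```python
import importlib.util
from typing import Optional


def ensure_package(package: str, model_name: str, extra: Optional[str] = None) -> None:
    """Raise a descriptive ImportError if a top-level `package` is missing.

    `extra` is a hypothetical name of an optional-dependency group in
    pyproject.toml; when given, the error message suggests installing it.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(package) is not None:
        return
    hint = f"pip install 'mteb[{extra}]'" if extra else f"pip install {package}"
    raise ImportError(
        f"{model_name} requires the `{package}` package, which is not installed. "
        f"Install it with: {hint}"
    )
```

Calling such a helper at the top of a model loader surfaces a clear install instruction before any model weights are downloaded.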

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added meta data information about where model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.34

Automatically generated by python-semantic-release

* suppress logging warnings on leaderboard (#2406)

* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff

* fix: E5 instruct now listed as sbert compatible (#2475)

Fixes #1442

* 1.36.35

Automatically generated by python-semantic-release

* [MIEB] rename VisionCentric to VisionCentricQA (#2479)

rename VisionCentric to VisionCentricQA

* ci: Run dataset loading only when pushing to main (#2480)

Update dataset_loading.yml

* fix table in tasks.md (#2483)

* Update tasks table

* fix: add prompt to NanoDBPedia (#2486)

* 1.36.36

Automatically generated by python-semantic-release

* Fix Task Lang Table (#2487)

* Fix Task Lang Table

* added tasks.md

* fix

* fix: Ignore datasets not available in tests (#2484)

* add back MockAudioEncoder

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com>
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>
Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: sufen-f <sufenfong@gmail.com>
Co-authored-by: sufen <sufenf@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Samuel Yang <samuelyang150@gmail.com>
Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: talshef <tsheffer@gmail.com>
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: garciasces <garciasces@madrid.es>
Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>
Co-authored-by: Wang Bo <bo.wang@jina.ai>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com>
Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com>
Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com>
Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>
Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com>
Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com>
Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com>
Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com>
Co-authored-by: richinfo-ai <richinfoai@163.com>
Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com>
Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com>
Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com>
Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
Co-authored-by: hongst <76415500+seongtaehong@users.noreply.github.com>
isaac-chung added a commit that referenced this pull request May 3, 2025
* Update tasks table

* 1.36.26

Automatically generated by python-semantic-release

* Pass task name to all evaluators (#2389)

* pass task name to all tasks

* add test

* fix loader

* fix: renaming Zeroshot -> ZeroShot (#2395)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

* 1.36.27

Automatically generated by python-semantic-release

* fix: Update AmazonPolarityClassification license (#2402)

Update AmazonPolarityClassification.py

* fix b1ade name (#2403)

* 1.36.28

Automatically generated by python-semantic-release

* Minor style changes (#2396)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid related scientific papers (#2302)

* Clustrec covid new dataset and task

* fix

* fix

* fix

* fix

* fix

* descriptive stats

* change all mentions of clustrec-covidp2p to clustrec-covid

* change ' to "

* Update tasks table

* fix: Major updates to docs + make mieb dep optional (#2397)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* fix: Major updates to documentation

This PR does the following:
- Introduces other modalities more clearly in the documentation and makes it easier to transition to a full documentation site later.
- Adds minor code updates to resolve inconsistencies discovered between docs and code.
- Adds the MMTEB citation where applicable.
- Makes the docs ready for moving torchvision to an optional dependency.

* Moved VISTA example

* rename 1

* rename 2

* format

* fixed error

* fix: make torchvision optional (#2399)

* fix: make torchvision optional

* format

* add docs

* minor fix

* remove transform from Any2TextMultipleChoiceEvaluator

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* move Running SentenceTransformer model with prompts to usage

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.29

Automatically generated by python-semantic-release

* remove Arabic_Triplet_Matryoshka_V2.py (#2405)

* Min torchvision>0.2.1 (#2410)

matching torch>1.0.0

* fix: Add validation to model_name in `ModelMeta` (#2404)

* add test for name validation

* upd docs

* upd cohere name

* fix tests

* fix name for average_word_embeddings_komninos

* fix name for average_word_embeddings_komninos

* fix reranker test

* fix reranker test

* 1.36.30

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formatted Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py to Clustering > kor folder and edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added meta data information about where model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.34

Automatically generated by python-semantic-release

* suppress logging warnings on leaderboard (#2406)

* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff

* fix: E5 instruct now listed as sbert compatible (#2475)

Fixes #1442

* 1.36.35

Automatically generated by python-semantic-release

* [MIEB] rename VisionCentric to VisionCentricQA (#2479)

rename VisionCentric to VisionCentricQA

* ci: Run dataset loading only when pushing to main (#2480)

Update dataset_loading.yml

* fix table in tasks.md (#2483)

* Update tasks table

* fix: add prompt to NanoDBPedia (#2486)

* 1.36.36

Automatically generated by python-semantic-release

* Fix Task Lang Table (#2487)

* Fix Task Lang Table

* added tasks.md

* fix

* fix: Ignore datasets not available in tests (#2484)

* 1.36.37

Automatically generated by python-semantic-release

* [MIEB] align main metrics with leaderboard (#2489)

align main metrics with leaderboard

* typo in model name (#2491)

* SpeedTask add deprecated warning (#2493)

* Docs: Update README.md (#2494)

Update README.md

* fix transformers version for now (#2504)

* Fix typos (#2509)

* ci: refactor TaskMetadata eval langs test (#2501)

* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS

* rename to ImageClustering folder (#2516)

rename folder

* Clean up trailing spaces citation (#2518)

* rename folder

* trailing spaces

* missed one

* [mieb] Memotion preprocessing code made more robust and readable (#2519)

* fix: validate lang code in ModelMeta (#2499)

* Update pyproject.toml (#2522)

* 1.36.38

Automatically generated by python-semantic-release

* Fix leaderboard version (#2524)

* fix gradio leaderboard run

* update docs

* Fix gte-multilingual-base embed_dim (#2526)

* [MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

specify only the multilingual AggTask

* [mieb] fix hatefulmemes (#2531)

* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Model conan (#2534)

* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <shyuli@tencent.com>

* fix: Update mteb.get_tasks with an exclude_aggregate parameter to exclude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks

* 1.36.39

Automatically generated by python-semantic-release

* docs: Add MIEB citation in benchmarks (#2544)

Add MIEB citation in benchmarks

* Add 2 new Vietnamese Retrieval Datasets (#2393)

* [ADD] 2 new Datasets

* [UPDATE] Change bibtex_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtex_citation for ZacLegalTextRetrieval as TODO

* Update tasks table

* fix: CacheWrapper per task (#2467)

* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>

* 1.36.40

Automatically generated by python-semantic-release

* misc: move MMTEB scripts and notebooks to separate repo (#2546)

move mmteb scripts and notebooks to separate repo

* fix: Update requirements in JinaWrapper (#2548)

fix: Update package requirements in JinaWrapper for einops and flash_attn

* 1.36.41

Automatically generated by python-semantic-release

* Docs: Add MIEB to README (#2550)

Add MIEB to README

* Add xlm_roberta_ua_distilled (#2547)

* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix me5 training data config to include xquad dataset (#2552)

* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* feat: Added dataframe utilities to BenchmarkResults (#2542)

* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarksResults
- Added a few utility functions where needed
- Added docstrings throughout ModelResults and BenchmarksResults
- Added todo comments for missing aspects (mostly for v2), though join_revisions seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache

* 1.37.0

Automatically generated by python-semantic-release

* fix e5_R_mistral_7b (#2490)

* fix e5_R_mistral_7b

* change wrapper

* address comments

* Added kwargs for pad_token

* correct lang format

* address comments

* add revision

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix unintentional working of filters on leaderboard (#2535)

* fix unintentional working of filters on leaderboard

* address comments

* make lint

* address comments

* rollback unnecessary changes

* feat: UI Overhaul (#2549)

* Bumped gradio version to latest

* Added new Gradio table functionality to leaderboard

* Removed search bar

* Changed color scheme in plot to match the table

* Added new benchmark selector in sidebar

* Changed not activated button type to secondary

* Short-circuited callbacks that are based on language selection

* Re-added column width calculation since it got messed up

* Commented out gradient for per-task table as it slowed things down substantially

* Styling and layout updates

* Adjusted comments according to reviews

* Converted all print statements to logger.debug

* Removed pydantic version fix

* Ran linting

* Remove commented out code

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Moved English,v1 to Legacy section

* Closed the benchmark sharing accordion by default

* Adjusted markdown blocks according to suggestions

* Ran linter

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.0

Automatically generated by python-semantic-release

* add USER2 (#2560)

* add user2

* add training code

* update prompts

* Fix leaderboard entry for BuiltBench (#2563)

Fix leaderboard entry for BuiltBench (#2562)

Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>

* fix: jasper models embeddings having nan values (#2481)

* 1.38.1

Automatically generated by python-semantic-release

* fix frida datasets (#2565)

* Add relle (#2564)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Backfill task metadata for metadata for GermanDPR and GermanQuAD (#2566)

* Add metadata for GermanDPR and GermanQuAD

* PR improvements

* Update tasks table

* Add  ModelMeta for CodeSearch-ModernBERT-Crow-Plus (#2570)

* Add files via upload

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update overview.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update mteb/models/shuu_model.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Docs: Improve MIEB docs (#2569)

* Add missing annotations (#2498)

* Update tasks table

* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* fix citations

* fix citations

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com>
Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com>
Co-authored-by: richinfo-ai <richinfoai@163.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com>
Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Sam Heymann <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com>
Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
Co-authored-by: hongst <76415500+seongtaehong@users.noreply.github.com>
Co-authored-by: E. Tolga Ayan <33233561+tolgayan@users.noreply.github.com>
Co-authored-by: lllsy12138 <50816213+lllsy12138@users.noreply.github.com>
Co-authored-by: shyuli <shyuli@tencent.com>
Co-authored-by: Siddharth M. Bhatia <siddharth@sidmb.com>
Co-authored-by: Bao Loc Pham <67360122+BaoLocPham@users.noreply.github.com>
Co-authored-by: Flo <FlorianRottach@aol.com>
Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Olesksii Horchynskyi <121444758+panalexeu@users.noreply.github.com>
Co-authored-by: Pandaswag <110003154+torchtorchkimtorch@users.noreply.github.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>
Co-authored-by: Youngjoon Jang <82500463+yjoonjang@users.noreply.github.com>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: Jan Karaś <90987511+KTFish@users.noreply.github.com>
Co-authored-by: Shuu <136542198+Shun0212@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
KennethEnevoldsen added a commit that referenced this pull request May 19, 2025
Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?
KennethEnevoldsen added a commit that referenced this pull request May 19, 2025
…e it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?
isaac-chung added a commit that referenced this pull request Jun 22, 2025
* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (e.g. `voyageai>=1.0.0`). @ayush1298, I am hoping you could clear up how this happened.

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to add a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of one of the tasks using `nomic-ai/nomic-embed-text-v1.5`, and the scores match closely:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```
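
To make "match" concrete, the two accuracies above differ by less than 1e-4. A minimal sketch of that comparison follows; the tolerance of `1e-3` is an assumption chosen for illustration, not a threshold used by the project.

```python
import math

# Accuracy values copied from the "Old" and "New" result snippets above.
old_accuracy = 0.897863
new_accuracy = 0.897929

# Hypothetical tolerance for treating two runs as equivalent (an assumption).
TOLERANCE = 1e-3

difference = abs(new_accuracy - old_accuracy)
scores_match = math.isclose(old_accuracy, new_accuracy, abs_tol=TOLERANCE)

print(f"absolute difference: {difference:.6f}")
print(f"within tolerance: {scores_match}")
```

A small residual difference like this is plausible when a dataset is re-hosted, so comparing with a tolerance is usually more informative than checking for exact equality.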

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
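
The fix amounts to passing a task type that the API accepts. A minimal sketch of a guard that would catch the invalid value before a request is made is shown here. The set of names is an assumption based on the linked Vertex AI page and may change, and `validate_task_type` is a hypothetical helper, not part of mteb or the Google SDK.

```python
# Task types assumed to be accepted by the Vertex AI embeddings API
# (check the linked documentation for the current list).
SUPPORTED_TASK_TYPES = frozenset(
    {
        "RETRIEVAL_QUERY",
        "RETRIEVAL_DOCUMENT",
        "SEMANTIC_SIMILARITY",
        "CLASSIFICATION",
        "CLUSTERING",
        "QUESTION_ANSWERING",
        "FACT_VERIFICATION",
        "CODE_RETRIEVAL_QUERY",
    }
)


def validate_task_type(task_type: str) -> str:
    """Return task_type unchanged, or raise ValueError if it is unsupported."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"Unsupported task type {task_type!r}; "
            f"expected one of {sorted(SUPPORTED_TASK_TYPES)}"
        )
    return task_type


# The corrected value passes; the old value "SIMILARITY" would raise.
validate_task_type("SEMANTIC_SIMILARITY")
```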

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all sections (with visual document retrieval and then images, they start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section? (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

fixes: #2770
Fixes the issue, but only in v1

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* try adding init

* add init in audio pc task eng

* all audio tasks init

* remove script test

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
isaac-chung added a commit that referenced this pull request Jul 6, 2025
* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (e.g. `voyageai>=1.0.0`). @ayush1298, I am hoping you could clear up how this happened.

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to add a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the vertexAI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Updated the description to state that the corpus is the same as FEVER's, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Updated the description to state that the corpus is the same as FEVER's, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of one of the tasks using `nomic-ai/nomic-embed-text-v1.5`; the scores match closely:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
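
A minimal sketch of how such a mapping can be guarded against invalid names. The set below is a subset of the task types listed on the linked page, as far as I understand it; the category-to-type mapping and the helper `resolve_task_type` are hypothetical and not the project's actual code.

```python
from typing import Dict, Optional

# Subset of task-type names accepted by the API (assumed from the linked docs).
VALID_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
}

# Hypothetical mapping from benchmark task categories to API task types.
TASK_TYPE_BY_CATEGORY: Dict[str, str] = {
    "STS": "SEMANTIC_SIMILARITY",  # not "SIMILARITY", which is invalid
    "Classification": "CLASSIFICATION",
    "Clustering": "CLUSTERING",
}


def resolve_task_type(category: str) -> Optional[str]:
    """Return the API task type for a category, or None if it is unmapped."""
    task_type = TASK_TYPE_BY_CATEGORY.get(category)
    # Fail loudly on a typo in the mapping instead of sending it to the API.
    if task_type is not None and task_type not in VALID_TASK_TYPES:
        raise ValueError(f"Invalid task type {task_type!r} for {category!r}")
    return task_type
```

Validating the mapping locally surfaces a wrong name such as `SIMILARITY` at lookup time instead of as an API error during evaluation.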

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all sections (with visual document retrieval added, the images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section (see dummy screenshot)?

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

Fixes #2770, but only in v1.

```
# Tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
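
The idea can be sketched as plain dependency injection: the wrapper accepts an optional, already configured client and only builds a default one when none is given. The classes below are stand-ins written for this illustration (`FakeEmbeddingClient`, `EmbeddingWrapper` and their methods are hypothetical); they are not the actual wrapper in `mteb/models/openai_models.py` and make no real API calls.

```python
from typing import Any, List, Optional, Sequence


class FakeEmbeddingClient:
    """Stand-in for a pre-configured API client (for example an Azure one)."""

    def __init__(self, dim: int = 3) -> None:
        self.dim = dim
        self.calls = 0

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Return a dummy vector per text so the example runs offline.
        self.calls += 1
        return [[float(len(t))] * self.dim for t in texts]


class EmbeddingWrapper:
    """Hypothetical model wrapper that accepts an injected client."""

    def __init__(self, model_name: str, client: Optional[Any] = None) -> None:
        self.model_name = model_name
        # Use the caller's client if provided; otherwise build the default.
        self._client = client if client is not None else FakeEmbeddingClient()

    def encode(self, sentences: Sequence[str]) -> List[List[float]]:
        return self._client.embed(sentences)


# The caller configures the client once and hands it to the wrapper.
azure_like_client = FakeEmbeddingClient(dim=2)
model = EmbeddingWrapper("text-embedding-3-small", client=azure_like_client)
vectors = model.encode(["ab", "abc"])
```

Keeping the default construction inside the wrapper means existing callers are unaffected, while callers with a differently configured client can pass it in.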

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model  (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* fix task name split correctly

* add docstring for test
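
To illustrate the kind of problem involved: if prompt keys take the form `<task name>-<prompt type>`, splitting on the first hyphen breaks for task names that themselves contain a hyphen. The sketch below splits on the last hyphen instead. The key format, the `query`/`passage` suffixes, and the helper `split_prompt_key` are assumptions for this example and may differ from the actual validation code.

```python
from typing import Optional, Set, Tuple

# Assumed prompt-type suffixes; illustrative only.
PROMPT_TYPES = {"query", "passage"}


def split_prompt_key(key: str, known_tasks: Set[str]) -> Tuple[str, Optional[str]]:
    """Split a prompt key into (task name, prompt type or None)."""
    # A key that is itself a known task name has no prompt-type suffix.
    if key in known_tasks:
        return key, None
    # Split on the LAST hyphen only, so hyphens inside the task name survive.
    name, sep, suffix = key.rpartition("-")
    if sep and suffix in PROMPT_TYPES and name in known_tasks:
        return name, suffix
    raise ValueError(f"Prompt key {key!r} does not match a known task")


# Illustrative task names, one of which contains a hyphen.
known = {"NFCorpus-PL", "SciFact"}
hyphenated = split_prompt_key("NFCorpus-PL-query", known)
plain = split_prompt_key("SciFact-passage", known)
```

Checking the full key against the known task names first avoids misreading a trailing name segment such as `PL` as a prompt type.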

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec Model's Meta data.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Samoed added a commit that referenced this pull request Jul 10, 2025
* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (e.g. voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to add a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the vertexAI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER's, since we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER's, since we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of the task using `nomic-ai/nomic-embed-text-v1.5`; the two scores match to three decimal places:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
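The fix above can be sketched as a lookup that only ever returns a task type the API lists. This is a minimal illustration, not the code in mteb: the helper `to_google_task_type` and the category keys are hypothetical, and the set of supported strings is copied from the linked Vertex AI page as of writing, so check that page for the current list.

```python
# Task types listed on the linked Vertex AI page (assumed current; verify there).
SUPPORTED_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}

# Hypothetical mapping from a benchmark task category to an API task type.
TASK_TYPE_BY_CATEGORY = {
    "STS": "SEMANTIC_SIMILARITY",  # previously the invalid "SIMILARITY"
    "Classification": "CLASSIFICATION",
    "Clustering": "CLUSTERING",
}


def to_google_task_type(category: str) -> str:
    """Return the API task type for a category, rejecting unsupported values."""
    task_type = TASK_TYPE_BY_CATEGORY[category]
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"{task_type!r} is not a supported task type")
    return task_type
```

Validating against the supported set at lookup time turns a typo such as `SIMILARITY` into an immediate local error instead of a failed API request.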

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all sections (with visual document retrieval added, the images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section? (See dummy screenshot.)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

 fixes: #2770
Fixes the issue, but only in v1

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
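The idea of accepting an already initialized client can be sketched as follows. This is an illustration under assumptions, not the wrapper in `mteb/models/openai_models.py`: the class `EmbeddingWrapper` and its argument names are hypothetical, and only the `embeddings.create(model=..., input=...)` call comes from the `openai` package.

```python
from typing import Any, List, Optional


class EmbeddingWrapper:
    """Hypothetical stand-in for a model wrapper that accepts an injected client."""

    def __init__(self, model_name: str, client: Optional[Any] = None) -> None:
        self.model_name = model_name
        if client is None:
            # Default path: import lazily so `openai` is only required
            # when no client was supplied by the caller.
            from openai import OpenAI

            client = OpenAI()
        self._client = client

    def encode(self, sentences: List[str]) -> List[List[float]]:
        # Both OpenAI and AzureOpenAI clients expose embeddings.create.
        response = self._client.embeddings.create(
            model=self.model_name, input=sentences
        )
        return [item.embedding for item in response.data]
```

With this shape, an `AzureOpenAI` instance from the `openai` package can be constructed by the caller and passed as `client`, so the wrapper does not need to know about Azure-specific settings.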

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* split task name correctly

* add docstring for test
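Why a hyphen in a task name can break prompt validation is easiest to see in a small sketch. It assumes prompt keys have the form `<task name>-<prompt type>` and that the prompt types are `query` and `passage`; the helper `split_prompt_key` and the task names are hypothetical and are not taken from the mteb code.

```python
from typing import Optional, Tuple

# Assumed prompt-type suffixes, for illustration only.
PROMPT_TYPES = {"query", "passage"}


def split_prompt_key(key: str) -> Tuple[str, Optional[str]]:
    """Split a prompt key into (task name, prompt type or None).

    Splitting on the first hyphen would cut a hyphenated task name apart,
    so split once from the right and accept the suffix only if it is a
    known prompt type.
    """
    if "-" in key:
        name, suffix = key.rsplit("-", 1)
        if suffix in PROMPT_TYPES:
            return name, suffix
    return key, None
```

Checking the suffix against a known set is the design choice that matters here: a key such as `My-Task` is then treated as a whole task name rather than as task `My` with prompt type `Task`.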

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec Model's Meta data.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* dataset: Evalita dataset integration (#2859)

* Added DadoEvalCoarseClassification

* Removed unnecessary columns from DadoEvalCoarseClassification

* Added EmitClassification task

* added SardiStanceClassification task

* Added GeoLingItClassification task

* Added DisCoTexPairClassification tasks

* Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits

* changed import in DisCoTexPairClassification

* removed GeoLingItClassification dataset

* fixed citation formatting, missing metadata parameters and lint formatting

* - Added XGlueWRPReranking task
- Added missing __init__.py files

* fixed metadata in XGlueWRPReranking

* Added MKQARetrieval task

* fixed type in XGlueWRPReranking

* changed MKQARetrieval from cross-lingual to monolingual

* formatted MKQARetrieval file

* removed unused const

---------

Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>

* Update tasks & benchmarks tables

* fix: pin datasets version (#2892)

fix datasets version

* 1.38.34

Automatically generated by python-semantic-release

* merge main

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: MattiaSangermano <43407984+MattiaSangermano@users.noreply.github.com>
Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>
Development

Successfully merging this pull request may close these issues.

Specify dependencies for models in pyproject.toml
3 participants