Releases: embeddings-benchmark/mteb

1.38.43

20 Aug 18:51

1.38.43 (2025-08-20)

Ci

  • ci: Temporarily limit pytrec version to "pytrec-eval-terrier>=0.5.6, <0.5.8" to prevent errors

try to fix CI (6fa6efa)
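The pin above restricts `pytrec-eval-terrier` to a half-open version range. As a rough illustration of which versions that specifier accepts, here is a minimal sketch in plain Python. The helper names are hypothetical, and real tooling such as pip handles pre-releases and other cases that this sketch ignores.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '0.5.7' into (0, 5, 7), padding with zeros and ignoring suffixes."""
    parts = [int(part) for part in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_pin(version: str, lower: str = "0.5.6", upper: str = "0.5.8") -> bool:
    """True if lower <= version < upper, mirroring '>=0.5.6, <0.5.8'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("0.5.5", "0.5.6", "0.5.7", "0.5.8"):
        print(candidate, satisfies_pin(candidate))
```
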

Fix

  • fix: Add VN-MTEB benchmark and Leaderboard (#2995)

  • [ADD] 50 vietnamese dataset from vn-mteb

  • [UPDATE] task metadata

  • [UPDATE] import dependencies

  • [UPDATE] task metadata, bibtex citation

  • [UPDATE-TEST] test_model_meta

  • [UPDATE] sample_creation to machine-translated and LM verified

  • [ADD] sample creation machine-translated and LM verified

  • [ADD] VN-MTEB benchmark and leaderboard

  • [FIX] wrong benchmark name

  • [REMOVE] default fields metadata in Classification tasks (0a6e855)

Unknown

  • Update tasks & benchmarks tables (def1377)

  • fix MBPPRetrieval revision (#3055)

Update MBPPRetrieval.py

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (ea801ec)

  • Update tasks & benchmarks tables (7da3cf9)

  • dataset: Added wikisql retrieval (#3039)

  • Add WikiSQL retrieval task

  • Code retrieval task based on WikiSQL natural language to SQL dataset
  • Natural language questions matched to SQL query implementations
  • Uses sql-Code evaluation language for SQL-specific metrics
  • Includes proper citations and descriptive statistics
  • Add WikiSQLRetrieval to imports

  • Add descriptive statistics for WikiSQLRetrieval

  • Reformatting

  • Reformatting

  • Reformatting, correcting the revision (7b289f5)

  • Update tasks & benchmarks tables (1fff5ce)

  • dataset: Add mbpp retrieval (#3037)

  • Add MBPP retrieval task

  • Code retrieval task based on 378 Python programming problems
  • Natural language queries matched to Python code implementations
  • Uses python-Code evaluation language for code-specific metrics
  • Includes proper citations and descriptive statistics
  • Add MBPPRetrieval to imports

  • Add descriptive statistics for MBPPRetrieval

  • Reformatting

  • Reformatting (ac69263)

  • Fix 3 VN-MTEB Pair Classification tasks (#3053)

  • [ADD] 50 vietnamese dataset from vn-mteb

  • [UPDATE] task metadata

  • [UPDATE] import dependencies

  • [UPDATE] task metadata, bibtex citation

  • [UPDATE-TEST] test_model_meta

  • [UPDATE] sample_creation to machine-translated and LM verified

  • [ADD] sample creation machine-translated and LM verified

  • [ADD] Vietnamese Embedding models

  • [REMOVE] default fields metadata in Classification tasks

  • [UPDATE] model to vi-vn language specific file

  • [FIX] lint

  • [FIX] model loader

  • [FIX] VN-MTEB 3 datasets PairClassification rename column (4e3fcd8)

1.38.42

18 Aug 13:58

1.38.42 (2025-08-18)

Ci

  • ci: Updating rerun delays to prevent false-positive errors (e476dc3)

  • ci: reduce parallel runs when checking if a dataset exists (#3035)

The hope is that this will prevent many of the current errors (4aaf47e)

Fix

  • fix: Updated revision for jina-embeddings-v4 (#3046)

  • fix: jinav4 revision

Signed-off-by: admin <bo.wang@jina.ai>

  • change revision instead of removing it

Signed-off-by: admin <bo.wang@jina.ai>


Signed-off-by: admin <bo.wang@jina.ai>
Co-authored-by: admin <bo.wang@jina.ai> (c58b319)

Unknown

  • model: add granite-embedding-english R2 models (#3050) (e08ec56)

  • model: Add GreenNode Vietnamese Embedding models (#2994)

  • [ADD] 50 vietnamese dataset from vn-mteb

  • [UPDATE] task metadata

  • [UPDATE] import dependencies

  • [UPDATE] task metadata, bibtex citation

  • [UPDATE-TEST] test_model_meta

  • [UPDATE] sample_creation to machine-translated and LM verified

  • [ADD] sample creation machine-translated and LM verified

  • [ADD] Vietnamese Embedding models

  • [REMOVE] default fields metadata in Classification tasks

  • [UPDATE] model to vi-vn language specific file

  • [FIX] lint

  • [FIX] model loader (72f7b05)

  • Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (d729d32)

1.38.41

17 Aug 09:00

1.38.41 (2025-08-17)

Fix

  • fix: incorrect revision for SNLRetrieval (#3033)

The provided revision doesn't seem to be present on:
adrlau/navjordj-SNL_summarization_copy

Replacing with latest revision (5c65913)

Unknown

  • Update tasks & benchmarks tables (a96f2e4)

  • dataset: Add HumanEvalRetrieval task (#3022)

  • Add HumanEvalRetrieval dataset

  • Fix TaskMetadata structure and remove descriptive_stats

  • Use TaskMetadata class instead of dict
  • Remove descriptive_stats as requested in PR review
  • Add date field and proper import structure
  • Fix dataset path and use verified metadata
  • Change path from zeroshot/humaneval-embedding-benchmark to embedding-benchmark/HumanEval
  • Use actual description from HuggingFace dataset page
  • Remove fabricated citation and reference
  • Remove revision field that was incorrect
  • Reference HuggingFace dataset page instead of arxiv
  • Add correct revision hash to HumanEval
  • Add revision hash: ed1f48a for reproducibility
  • Fix HumanEval metadata validation
  • Add date field for metadata completeness
  • Add bibtex_citation field (empty string)
  • Required for TaskMetadata validation to pass
  • Should resolve PR test failure
  • Address reviewer feedback
  • Remove trust_remote_code parameter as requested
  • Add revision parameter to load_dataset() calls for consistency
  • Use metadata revision hash in dataset loading for reproducibility
  • Fix field names in HumanEval dataset loading

Changed query_id/corpus_id to query-id/corpus-id to match actual dataset format.

  • Fix deprecated metadata_dict usage

Use self.metadata.dataset instead of self.metadata_dict for v2.0 compatibility.

  • Fix data structure for MTEB compatibility
  • Organize data by splits as expected by MTEB retrieval tasks
  • Convert scores to integers for pytrec_eval compatibility
  • Address PR feedback for HumanEval dataset
  • Add descriptive statistics using calculate_metadata_metrics()
  • Enhance metadata description with dataset structure details
  • Add complete BibTeX citation for original paper
  • Update to full commit hash revision
  • Add python-Code language tag for programming language
  • Explain retrieval task formulation clearly
  • Fix BibTeX citation formatting for HumanEvalRetrieval
  • Update citation to match bibtexparser formatting requirements
  • Fields now in alphabetical order with lowercase names
  • Proper trailing commas and indentation (d4e6223)
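
Two of the fixes listed in the HumanEvalRetrieval entry above (hyphenated `query-id`/`corpus-id` field names, and integer scores organised by split) can be sketched as follows. This is an illustration only: the function name, the sample rows, and the nested-dict layout are assumptions, not the code from the PR.

```python
from collections import defaultdict


def build_relevance(rows, split="test"):
    """Group relevance judgements as {split: {query-id: {corpus-id: int score}}}."""
    per_query = defaultdict(dict)
    for row in rows:
        # Hyphenated keys, matching the field names mentioned in the changelog.
        per_query[row["query-id"]][row["corpus-id"]] = int(row["score"])
    return {split: dict(per_query)}


if __name__ == "__main__":
    sample = [
        {"query-id": "q1", "corpus-id": "c1", "score": 1.0},
        {"query-id": "q1", "corpus-id": "c2", "score": 0.0},
    ]
    print(build_relevance(sample))
```
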
  • model: Add granite-vision-embedding model (#3029)

  • Add files via upload

  • Address review comments

  • Address review comments

  • ruff format

  • Update mteb/models/granite_vision_embedding_models.py

  • lint error fix


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (37d115a)

  • model: Add samilpwc_models meta (#3028)

  • model: Add samilpwc_models meta

  • Fix: Remove CONST

  • Fix: Reformat File

  • Update: model revision (96a7cc5)

1.38.40

16 Aug 14:31

1.38.40 (2025-08-16)

Fix

  • fix: Add missing training sets for qzhou (#3023)

  • Supplement missing training sets

  • reformat code

  • Reorganize the data list format

  • update qzhou_model meta (20bc80c)

Unknown

  • Update tasks & benchmarks tables (177997f)

  • Standardise task names and fix citation formatting (#3026)

fixes for name formatting (ea41e7a)

  • Add OpenAI models with 512 dimension (#3008)

  • Add OpenAI/text-embedding-3-small (512 dim)
    Add OpenAI/text-embedding-3-large (512 dim)

  • Correcting due to comments


Co-authored-by: fzowl <zoltan@voyageai.com> (d8b2910)

  • model: Add Cohere embed-v4.0 model support (#3006)

  • Add Cohere embed-v4.0 model support

  • Add text-only embed-v4.0 model in cohere_models.py
  • Add multimodal embed-v4.0 model in cohere_v.py
  • Support configurable dimensions (256, 512, 1024, 1536)
  • Support 128,000 token context length
  • Support multimodal embedding (text, images, mixed PDFs)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

  • Add Cohere embed-v4.0 model support

Update cohere_v.py and cohere_models.py to include the new embed-v4.0 model with proper configuration and integration.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>


Co-authored-by: Claude <noreply@anthropic.com> (87eb27c)

  • Update tasks & benchmarks tables (4adf565)

  • dataset: Added 50 Vietnamese dataset from vn-mteb (#2964)

  • [ADD] 50 vietnamese dataset from vn-mteb

  • [UPDATE] task metadata

  • [UPDATE] import dependencies

  • [UPDATE] task metadata, bibtex citation

  • [UPDATE-TEST] test_model_meta

  • [UPDATE] sample_creation to machine-translated and LM verified

  • [ADD] sample creation machine-translated and LM verified

  • [REMOVE] default fields metadata in Classification tasks (741b022)

  • lint: Correcting lint errors (#3004)

  • Adding Classification Evaluator test

  • Modifications due to the comments

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Modifications due to the comments

  • Modifications due to the comments

  • Correcting the lint errors


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (01840ce)

  • model: BAAI/bge-m3-unsupervised Model (#3007)

  • Add BAAI/bge-m3-unsupervised Model
    (BAAI/bge_m3_retromae is commented out: the details are correct, but the model fails to load for me, so I commented it out)

  • Remove the commented retromae model


Co-authored-by: fzowl <zoltan@voyageai.com> (042db73)

  • model: Add Voyage 3.5 model configuration (#3005)

Add Voyage 3.5 model configuration

  • Add voyage_3_5 ModelMeta with 1024 embed dimensions and 32000 max tokens
  • Set release date to 2025-01-21 with revision 1
  • Configure for cosine similarity with instruction support
  • Include standard Voyage training datasets reference

🤖 Generated with Claude Code

Co-authored-by: Claude <noreply@anthropic.com> (e5d386b)

  • qzhou-embedding model_meta & implementation (#2975)

  • qzhou-embedding model_meta & implementation

  • Update qzhou_models.py

  • Update qzhou_models.py

Processing todo items (add default instruction)

  • Update qzhou_models.py

correct bge datalist

  • Update qzhou_models.py

correct 'public_training_data'

  • Update qzhou_models.py

  • Update qzhou_models.py

  • Update qzhou_models.py

  • Update qzhou_models.py

  • Update mteb/models/qzhou_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

  • Update mteb/models/qzhou_models.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • format qzhou_models.py for ruff check

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (6c1f1c6)

1.38.39

03 Aug 06:55

1.38.39 (2025-08-03)

Fix

  • fix: Add new benchmark RuSciBench along with AbsTaskTextRegression (#2716)

  • Add RuSciBench

  • fix bitext mining lang

  • Add regression task

  • fix init

  • add missing files

  • Improve description

  • Add superseded_by

  • fix lint

  • Update regression task to match with v2

  • Add stratified_subsampling for regression task

  • Add bootstrap for regression task

  • Rename task class, add model as evaluator argument

  • fix import

  • fix import 2

  • fixes

  • fix

  • Rename regression model protocol (36df9ca)

Unknown

  • Update tasks & benchmarks tables (a86e2dd)

  • Update tasks & benchmarks tables (e4f30e9)

  • dataset: add BillSum datasets (#2943)

  • Added BillSum datasets

  • fixed billsumca

  • Updated BillSumCA description

  • Updated BillSumUS description

  • Update mteb/tasks/Retrieval/eng/BillSumCA.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Update mteb/tasks/Retrieval/eng/BillSumUS.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • lint

  • lint


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (007d19f)

  • dataset: add GovReport dataset (#2953)

  • Added govreport task

  • Updated description (42dfe0d)

  • Update tasks & benchmarks tables (da46c8e)

  • dataset: Add BSARD v2, fixing the data loading issues of v1 (#2935)

  • BSARD loader fixed

  • BSARDv2 metadata fixed

  • Update mteb/tasks/Retrieval/fra/BSARDRetrieval.py


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (8416541)

1.38.38

25 Jul 10:36

1.38.38 (2025-07-25)

Ci

  • ci: bump semantic release (4ef8571)

Documentation

  • docs: Update adding_a_dataset.md (#2947)

  • docs: Update adding_a_dataset.md

  • Update docs/adding_a_dataset.md (a78debf)

Fix

  • fix: Prevent incorrectly passing "selector_state" to get_benchmark (#2939)

The leaderboard would have (silent) errors where get_benchmark led to a KeyError due to "selector_state" being passed as a default value. Setting DEFAULT_BENCMARK_NAME as the value solves this issue. (8496ec2)
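
The failure mode described above, a placeholder string reaching a dictionary lookup as a default value, can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `BENCHMARKS`, `DEFAULT_BENCHMARK`, `lookup_benchmark`, and `on_select` are hypothetical stand-ins, not the leaderboard's actual code.

```python
BENCHMARKS = {"Demo(eng)": ["TaskA", "TaskB"]}  # hypothetical registry
DEFAULT_BENCHMARK = "Demo(eng)"  # hypothetical default name


def lookup_benchmark(name):
    """Stand-in for a registry lookup; raises KeyError for unknown names."""
    return BENCHMARKS[name]


def on_select(selected=DEFAULT_BENCHMARK):
    # Defaulting to a real benchmark name (rather than a placeholder such as
    # "selector_state") keeps the lookup valid when no selection was made.
    return lookup_benchmark(selected)


if __name__ == "__main__":
    print(on_select())
    try:
        on_select("selector_state")
    except KeyError as err:
        print("KeyError:", err)
```
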

  • fix: Only import SparseEncoder once sentence-transformers version has been checked (#2940)

  • fix: Only import SparseEncoder once sentence-transformers version has been checked

fixes #2936

  • Update mteb/models/opensearch_neural_sparse_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>


Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (79a43af)
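
The pattern named in this fix, deferring an import until the installed package version has been checked, might look roughly like the sketch below. The helper names and the `"5.0.0"` minimum are assumptions made for illustration; the actual check in `opensearch_neural_sparse_models.py` may differ.

```python
import re
from importlib import metadata


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare the leading numeric components of two version strings."""

    def as_tuple(version):
        parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
        return tuple(parts + [0] * (3 - len(parts)))

    return as_tuple(installed) >= as_tuple(minimum)


def load_sparse_encoder(minimum: str = "5.0.0"):
    """Import SparseEncoder only after the installed version has been checked.

    The "5.0.0" default is an assumption for this sketch.
    """
    try:
        installed = metadata.version("sentence-transformers")
    except metadata.PackageNotFoundError as err:
        raise ImportError("sentence-transformers is not installed") from err
    if not version_at_least(installed, minimum):
        raise ImportError(
            f"sentence-transformers>={minimum} is required, found {installed}"
        )
    # Deferred import: only reached once the version check has passed.
    from sentence_transformers import SparseEncoder

    return SparseEncoder
```

Keeping the import inside the function means that importing the surrounding module does not fail on older installations; the error surfaces only when the sparse encoder is actually requested.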

Unknown

  • Update the link for gemini-embedding-001 (#2928) (533ce59)

1.38.37

21 Jul 11:43

1.38.37 (2025-07-21)

Fix

  • fix: specify revision for opensearch (#2919)

specify revision for opensearch (0ac0231)

Unknown

  • Use mteb.get_model in adding_a_dataset.md (#2922)

Update adding_a_dataset.md (c1922c8)

  • dataset: add BarExamQA dataset (#2916)

  • Add BarExamQA retrieval task

  • ran linter

  • updated details

  • updated details

  • fixed subtype name

  • fixed changes

  • ran linter again (1dcc6dc)

  • model: Add OpenSearch inf-free sparse encoding models (#2903)

add opensearch inf-free models

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (5a868e3)

1.38.36

20 Jul 18:00

1.38.36 (2025-07-20)

Fix

  • fix: change passage prompt to document (#2912)

  • change document to passage

  • fix prompt names

  • fix kwargs check

  • fix default prompt (a298fa9)

Unknown

  • Update tasks & benchmarks tables (372fc4c)

  • dataset: Add JapaneseSentimentClassification (#2913)

Add JapaneseSentimentClassification (57438c2)

  • Update tasks & benchmarks tables (56c98ed)

  • Classification dataset cleaning (#2900)

  • Classification dataset cleaning

  • Update pull request number

  • Fix metadata test

  • fix formatting

  • add script for cleaning (aef1e33)

  • Evaluator tests (#2910)

  • Adding Classification Evaluator test

  • Modifications due to the comments

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Modifications due to the comments

  • Modifications due to the comments

  • Adding STSEvaluator and SummarizationEvaluator tests

  • Correcting due to the comments

  • Correcting due to the comments


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (c7078af)

1.38.35

16 Jul 20:16

1.38.35 (2025-07-16)

Fix

  • fix: update colpali engine models (#2905)

  • adding vidore benchmarks

  • fix typo

  • clean vidore names + per lang eval

  • lint

  • vidore names

  • bibtex fix

  • fix revision

  • vidore v2 citation

  • update citation format and fix per-language mappings

  • lint: citations

  • typo citations

  • fix revisions

  • lint

  • fix colnomic3b revision

  • fix colqwen2.5 revision + latest repo version

  • fix query augmentation tokens

  • colsmol revision (9864e2a)

Unknown

  • Add Classification Evaluator unit test (#2838)

  • Adding Classification Evaluator test

  • Modifications due to the comments

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

  • Modifications due to the comments

  • Modifications due to the comments


Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (4a47f90)

  • model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) (#2889)

  • feat: add KaLM_Embedding_X_0605 in kalm_models

  • Update kalm_models.py for lint format

  • kalm-emb-v2

  • kalm-emb-v2

  • kalm-emb-v2

  • kalm-emb-v2

  • kalm-emb-v2


Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com> (9ecac21)

  • model: add image support for jina embeddings v4 (#2893)

  • feat: unify text and image embeddings for all tasks

  • fix: uniform batch size

  • fix: update error message

  • fix: update code task

  • fix: update max length

  • fix: apply review suggestions (17be7e5)

1.38.34

10 Jul 12:29

1.38.34 (2025-07-10)

Fix

  • fix: pin datasets version (#2892)

fix datasets version (00c95cf)

Unknown

  • Update tasks & benchmarks tables (5303fec)

  • dataset: Evalita dataset integration (#2859)

  • Added DadoEvalCoarseClassification

  • Removed unnecessary columns from DadoEvalCoarseClassification

  • Added EmitClassification task

  • added SardiStanceClassification task

  • Added GeoLingItClassification task

  • Added DisCoTexPairClassification tasks

  • Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits

  • changed import in DisCoTexPairClassification

  • removed GeoLingItClassification dataset

  • fixed citation formatting, missing metadata parameters and lint formatting

  • Added XGlueWRPReranking task

  • Added missing __init__.py files

  • fixed metadata in XGlueWRPReranking

  • Added MKQARetrieval task

  • fixed type in XGlueWRPReranking

  • changed MKQARetrieval from cross-lingual to monolingual

  • formatted MKQARetrieval file

  • removed unused const


Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co> (ee17a6e)

  • model: add Hakim and TookaSBERTV2 models (#2826)

  • add tooka v2s

  • add mcinext models

  • update mcinext.py

  • Apply PR review suggestions

  • Update mteb/models/mcinext_models.py


Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (04dc6d4)

  • Update tasks & benchmarks tables (5be02c1)

  • Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

  • Add JaCWIR and JQaRA for reranking

  • Fix ANLP Journal datasets

  • Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

  • tackle test cases

  • Remove _evaluate_subset usage

  • Separate v1 and v2

  • Update info for NLP Journal datasets (70768b5)

  • Comment kalm model (#2877)

comment kalm model (a3ca95c)

  • model: add kalm_models ModelMeta (new PR) (#2853)

  • feat: add KaLM_Embedding_X_0605 in kalm_models

  • Update kalm_models.py for lint format


Co-authored-by: xinshuohu <xinshuohu@tencent.com> (b67bd04)

  • model: add listconranker modelmeta (#2874)

  • add listconranker modelmeta

  • fix bugs

  • use linter

  • lint


Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (5846f56)

  • fix tests to be compatible with SentenceTransformers v5 (#2875)

  • fix sbert v5

  • add comment (f346a37)

  • rename seed-1.6-embedding to seed1.6-embedding (#2870) (f27648b)

  • model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

  • nvidia_llama_nemoretriever_colembed

  • correct 3b reference

  • lint fix

  • add training data and license for nvidia/llama_nemoretriever_colembed

  • lint


Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (4ff1413)

  • Bump gradio to fix leaderboard sorting (#2866)

Bump gradio (a4388c2)

  • model: Adding Sailesh97/Hinvec (#2842)

  • Adding Hinvec Model's Meta data.

  • Adding hinvec_model.py

  • Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

  • formatted code with Black and lint with Ruff

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (e3286d5)