1.38.43 (2025-08-20)

Ci

ci: Temporarily limit pytrec version to "pytrec-eval-terrier>=0.5.6, <0.5.8" to prevent errors

try to fix CI (6fa6efa)

Fix

fix: Add VN-MTEB benchmark and Leaderboard (#2995)
[ADD] 50 vietnamese dataset from vn-mteb
[UPDATE] task metadata
[UPDATE] import dependencies
[UPDATE] task metadata, BibTeX citation
[UPDATE-TEST] test_model_meta
[UPDATE] sample_creation to machine-translated and LM verified
[ADD] sample creation machine-translated and LM verified
[ADD] VN-MTEB benchmark and leaderboard
[FIX] wrong benchmark name
[REMOVE] default fields metadata in Classification tasks (0a6e855)

Unknown

Update tasks & benchmarks tables (def1377)
fix MBPPRetrieval revision (#3055)

Update MBPPRetrieval.py

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (ea801ec)

Update tasks & benchmarks tables (7da3cf9)
dataset: Added wikisql retrieval (#3039)
Add WikiSQL retrieval task

Code retrieval task based on WikiSQL natural language to SQL dataset
Natural language questions matched to SQL query implementations
Uses sql-Code evaluation language for SQL-specific metrics
Includes proper citations and descriptive statistics

Add WikiSQLRetrieval to imports
Add descriptive statistics for WikiSQLRetrieval
Reformatting
Reformatting
Reformatting, correcting the revision (7b289f5)
Update tasks & benchmarks tables (1fff5ce)
dataset: Add mbpp retrieval (#3037)
Add MBPP retrieval task

Code retrieval task based on 378 Python programming problems
Natural language queries matched to Python code implementations
Uses python-Code evaluation language for code-specific metrics
Includes proper citations and descriptive statistics

Add MBPPRetrieval to imports
Add descriptive statistics for MBPPRetrieval
Reformatting
Reformatting (ac69263)
Fix 3 VN-MTEB Pair Classification tasks (#3053)
[ADD] 50 vietnamese dataset from vn-mteb
[UPDATE] task metadata
[UPDATE] import dependencies
[UPDATE] task metadata, BibTeX citation
[UPDATE-TEST] test_model_meta
[UPDATE] sample_creation to machine-translated and LM verified
[ADD] sample creation machine-translated and LM verified
[ADD] Vietnamese Embedding models
[REMOVE] default fields metadata in Classification tasks
[UPDATE] model to vi-vn language specific file
[FIX] lint
[FIX] model loader
[FIX] VN-MTEB 3 datasets PairClassification rename column (4e3fcd8)

1.38.42 (2025-08-18)

Ci

ci: Updating rerun delays to prevent false-positive errors (e476dc3)
ci: reduce parallel runs for when checking if a dataset exists (#3035)

The hope is that this will prevent many of the current errors (4aaf47e)

Fix

fix: Updated revision for jina-embeddings-v4 (#3046)
fix: jinav4 revision

Signed-off-by: admin <bo.wang@jina.ai>

change revision instead of removing it

Signed-off-by: admin <bo.wang@jina.ai>

Signed-off-by: admin <bo.wang@jina.ai>
Co-authored-by: admin <bo.wang@jina.ai> (c58b319)

Unknown

model: add granite-embedding-english R2 models (#3050) (e08ec56)
model: Add GreenNode Vietnamese Embedding models (#2994)
[ADD] 50 vietnamese dataset from vn-mteb
[UPDATE] task metadata
[UPDATE] import dependencies
[UPDATE] task metadata, BibTeX citation
[UPDATE-TEST] test_model_meta
[UPDATE] sample_creation to machine-translated and LM verified
[ADD] sample creation machine-translated and LM verified
[ADD] Vietnamese Embedding models
[REMOVE] default fields metadata in Classification tasks
[UPDATE] model to vi-vn language specific file
[FIX] lint
[FIX] model loader (72f7b05)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (d729d32)

1.38.41 (2025-08-17)

Fix

fix: incorrect revision for SNLRetrieval (#3033)

The provided revision doesn't seem to be present on:
adrlau/navjordj-SNL_summarization_copy

Replacing with latest revision (5c65913)

Unknown

Update tasks & benchmarks tables (a96f2e4)
dataset: Add HumanEvalRetrieval task (#3022)
Add HumanEvalRetrieval dataset
Fix TaskMetadata structure and remove descriptive_stats

Use TaskMetadata class instead of dict
Remove descriptive_stats as requested in PR review
Add date field and proper import structure

Fix dataset path and use verified metadata

Change path from zeroshot/humaneval-embedding-benchmark to embedding-benchmark/HumanEval
Use actual description from HuggingFace dataset page
Remove fabricated citation and reference
Remove revision field that was incorrect
Reference HuggingFace dataset page instead of arxiv

Add correct revision hash to HumanEval

Add revision hash: ed1f48a for reproducibility

Fix HumanEval metadata validation

Add date field for metadata completeness
Add bibtex_citation field (empty string)
Required for TaskMetadata validation to pass
Should resolve PR test failure

Address reviewer feedback

Remove trust_remote_code parameter as requested
Add revision parameter to load_dataset() calls for consistency
Use metadata revision hash in dataset loading for reproducibility

Fix field names in HumanEval dataset loading

Changed query_id/corpus_id to query-id/corpus-id to match actual dataset format.

Fix deprecated metadata_dict usage

Use self.metadata.dataset instead of self.metadata_dict for v2.0 compatibility.

Fix data structure for MTEB compatibility

Organize data by splits as expected by MTEB retrieval tasks
Convert scores to integers for pytrec_eval compatibility

Address PR feedback for HumanEval dataset

Add descriptive statistics using calculate_metadata_metrics()
Enhance metadata description with dataset structure details
Add complete BibTeX citation for original paper
Update to full commit hash revision
Add python-Code language tag for programming language
Explain retrieval task formulation clearly

Fix BibTeX citation formatting for HumanEvalRetrieval

Update citation to match bibtexparser formatting requirements
Fields now in alphabetical order with lowercase names
Proper trailing commas and indentation (d4e6223)
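Several of the HumanEvalRetrieval fixes above (hyphenated `query-id`/`corpus-id` column names, data organized by split, scores converted to integers for pytrec_eval) concern the shape of the relevance judgements. The sketch below illustrates that shape under stated assumptions: qrels rows arrive as plain dicts with `query-id`, `corpus-id`, `score`, and an optional `split` key, and the helper name `build_qrels` and default split `"test"` are hypothetical, not the task's actual loader.

```python
from collections import defaultdict


def build_qrels(rows, default_split="test"):
    """Group relevance judgements as {split: {query_id: {corpus_id: int_score}}}.

    Hypothetical helper: row keys use the hyphenated names ("query-id",
    "corpus-id"), and rows without a "split" key go to default_split.
    """
    qrels = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        split = row.get("split", default_split)
        # pytrec_eval expects integer relevance labels, so cast (e.g. 1.0 -> 1)
        qrels[split][str(row["query-id"])][str(row["corpus-id"])] = int(row["score"])
    # Convert the nested defaultdicts to plain dicts before handing them on
    return {
        split: {qid: dict(docs) for qid, docs in by_query.items()}
        for split, by_query in qrels.items()
    }


rows = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1.0},
    {"query-id": "q1", "corpus-id": "d7", "score": 0.0},
    {"query-id": "q2", "corpus-id": "d2", "score": 1.0},
]
print(build_qrels(rows))
# {'test': {'q1': {'d1': 1, 'd7': 0}, 'q2': {'d2': 1}}}
```

Casting IDs to `str` and scores to `int` up front keeps the mapping in the string-keyed, integer-valued form that pytrec_eval's relevance evaluator works with.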

model: Add granite-vision-embedding model (#3029)
Add files via upload
Address review comments
Address review comments
ruff format
Update mteb/models/granite_vision_embedding_models.py
lint error fix

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (37d115a)

model: Add samilpwc_models meta (#3028)
model: Add samilpwc_models meta
Fix: Remove CONST
Fix: Reformat File
Update: model revision (96a7cc5)

1.38.40 (2025-08-16)

Fix

fix: Add missing training sets for qzhou (#3023)
Supplement missing training sets
reformat code
Reorganize the data list format
update qzhou_model meta (20bc80c)

Unknown

Update tasks & benchmarks tables (177997f)
Standardise task names and fix citation formatting (#3026)

fixes for name formatting (ea41e7a)

Add OpenAI models with 512 dimension (#3008)
Add OpenAI/text-embedding-3-small (512 dim)
Add OpenAI/text-embedding-3-large (512 dim)
Correcting due to comments

Co-authored-by: fzowl <zoltan@voyageai.com> (d8b2910)

model: Add Cohere embed-v4.0 model support (#3006)
Add Cohere embed-v4.0 model support

Add text-only embed-v4.0 model in cohere_models.py
Add multimodal embed-v4.0 model in cohere_v.py
Support configurable dimensions (256, 512, 1024, 1536)
Support 128,000 token context length
Support multimodal embedding (text, images, mixed PDFs)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Add Cohere embed-v4.0 model support

Update cohere_v.py and cohere_models.py to include the new embed-v4.0 model with proper configuration and integration.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Claude <noreply@anthropic.com> (87eb27c)

Update tasks & benchmarks tables (4adf565)
dataset: Added 50 Vietnamese dataset from vn-mteb (#2964)
[ADD] 50 vietnamese dataset from vn-mteb
[UPDATE] task metadata
[UPDATE] import dependencies
[UPDATE] task metadata, BibTeX citation
[UPDATE-TEST] test_model_meta
[UPDATE] sample_creation to machine-translated and LM verified
[ADD] sample creation machine-translated and LM verified
[REMOVE] default fields metadata in Classification tasks (741b022)
lint: Correcting lint errors (#3004)
Adding Classification Evaluator test
Modifications due to the comments
Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Modifications due to the comments
Modifications due to the comments
Correcting the lint errors

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (01840ce)

model: BAAI/bge-m3-unsupervised Model (#3007)
Add BAAI/bge-m3-unsupervised Model
(BAAI/bge_m3_retromae was commented out: its metadata is correct, but the model failed to load during testing.)
Remove the commented retromae model

Co-authored-by: fzowl <zoltan@voyageai.com> (042db73)

model: Add Voyage 3.5 model configuration (#3005)

Add Voyage 3.5 model configuration

Add voyage_3_5 ModelMeta with 1024 embed dimensions and 32000 max tokens
Set release date to 2025-01-21 with revision 1
Configure for cosine similarity with instruction support
Include standard Voyage training datasets reference

🤖 Generated with Claude Code

Co-authored-by: Claude <noreply@anthropic.com> (e5d386b)

qzhou-embedding model_meta & implementation (#2975)
qzhou-embedding model_meta & implementation
Update qzhou_models.py
Update qzhou_models.py

Processing todo items (Add default instruction)

Update qzhou_models.py

correct bge datalist

Update qzhou_models.py

correct 'public_training_data'

Update qzhou_models.py
Update qzhou_models.py
Update qzhou_models.py
Update qzhou_models.py
Update mteb/models/qzhou_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Update mteb/models/qzhou_models.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

format qzhou_models.py for ruff check

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (6c1f1c6)

1.38.39 (2025-08-03)

Fix

fix: Add new benchmark RuSciBench along with AbsTaskTextRegression (#2716)
Add RuSciBench
fix bitext mining lang
Add regression task
fix init
add missing files
Improve description
Add superseded_by
fix lint
Update regression task to match with v2
Add stratified_subsampling for regression task
Add bootstrap for regression task
Rename task class, add model as evaluator argument
fix import
fix import 2
fixes
fix
Rename regression model protocol (36df9ca)
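The RuSciBench entry mentions adding stratified subsampling for the new regression task. The changelog does not show how mteb implements it, so the following is only a generic sketch of one common approach for continuous targets: rank the examples, bucket them into equal-frequency bins, and sample from each bin in proportion to its size. The function name `stratified_subsample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(targets, n_samples, n_bins=10, seed=42):
    """Pick n_samples indices so the subsample covers the target range evenly.

    Hypothetical sketch: continuous targets are split into equal-frequency
    bins by rank, and each bin contributes indices in proportion to its size.
    """
    n = len(targets)
    if n_samples >= n:
        return list(range(n))
    rng = random.Random(seed)

    # Rank the examples by target value and assign each rank to a quantile bin
    order = sorted(range(n), key=lambda i: targets[i])
    bins = defaultdict(list)
    for rank, idx in enumerate(order):
        bins[rank * n_bins // n].append(idx)

    # Draw from each bin proportionally to its share of the data
    chosen = []
    for members in bins.values():
        k = min(len(members), round(n_samples * len(members) / n))
        chosen.extend(rng.sample(members, k))

    # Per-bin rounding can overshoot or undershoot; fix the total size
    if len(chosen) > n_samples:
        chosen = rng.sample(chosen, n_samples)
    elif len(chosen) < n_samples:
        taken = set(chosen)
        rest = [i for i in range(n) if i not in taken]
        chosen.extend(rng.sample(rest, n_samples - len(chosen)))
    return sorted(chosen)


scores = [i / 100 for i in range(100)]
subset = stratified_subsample(scores, 20)
```

With 100 evenly spread scores and 10 bins, each decile contributes exactly 2 of the 20 indices, so the subsample keeps the full range of regression targets rather than clustering wherever the random draw happens to land.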

Unknown

Update tasks & benchmarks tables (a86e2dd)
Update tasks & benchmarks tables (e4f30e9)
dataset: add BillSum datasets (#2943)
Added BillSum datasets
fixed billsumca
Updated BillSumCA description
Updated BillSumUS description
Update mteb/tasks/Retrieval/eng/BillSumCA.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Update mteb/tasks/Retrieval/eng/BillSumUS.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

lint
lint

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (007d19f)

dataset: add GovReport dataset (#2953)
Added govreport task
Updated description (42dfe0d)
Update tasks & benchmarks tables (da46c8e)
dataset: Add BSARD v2, fixing the data loading issues of v1 (#2935)
BSARD loader fixed
BSARDv2 metadata fixed
Update mteb/tasks/Retrieval/fra/BSARDRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (8416541)

1.38.38 (2025-07-25)

Ci

ci: bump semantic release (4ef8571)

Documentation

docs: Update adding_a_dataset.md (#2947)
docs: Update adding_a_dataset.md
Update docs/adding_a_dataset.md (a78debf)

Fix

fix: Prevent incorrectly passing "selector_state" to get_benchmark (#2939)

The leaderboard had silent errors: get_benchmark raised a KeyError because "selector_state" was passed as the default value. Setting DEFAULT_BENCMARK_NAME as the default value solves this issue. (8496ec2)

fix: Only import SparseEncoder once the sentence-transformers version has been checked (#2940)
fix: Only import SparseEncoder once the sentence-transformers version has been checked

fixes #2936

Update mteb/models/opensearch_neural_sparse_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (79a43af)
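The SparseEncoder fix follows a common pattern for optional features: check the installed package version first, and import the new class only when that version is recent enough. A minimal sketch follows. It assumes `SparseEncoder` first ships in sentence-transformers 5.0.0, and the helper names `parse_release`, `supports_sparse_encoder`, and `load_sparse_encoder_class` are hypothetical, not mteb's actual code.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: SparseEncoder first ships in sentence-transformers 5.0.0
MIN_SPARSE_VERSION = (5, 0, 0)


def parse_release(v: str) -> tuple:
    """Turn '5.1.0rc1' into (5, 1, 0), padding short versions like '5.0'."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_sparse_encoder(installed: str) -> bool:
    return parse_release(installed) >= MIN_SPARSE_VERSION


def load_sparse_encoder_class():
    """Import SparseEncoder only after the installed version has been checked."""
    try:
        installed = version("sentence-transformers")
    except PackageNotFoundError as err:
        raise ImportError("sentence-transformers is not installed") from err
    if not supports_sparse_encoder(installed):
        raise ImportError(
            f"sentence-transformers {installed} lacks SparseEncoder; "
            "upgrade to 5.0.0 or later"
        )
    # Deferred import: never reached on older versions, which fail with a clear message
    from sentence_transformers import SparseEncoder

    return SparseEncoder
```

Because the import sits inside the loader, a module that registers sparse models can still be imported on older sentence-transformers installs; only code that actually loads a sparse model needs the newer version.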

fix: replace with passage (#2934) (5ed6c90)

Unknown

Update the link for gemini-embedding-001 (#2928) (533ce59)

1.38.37 (2025-07-21)

Fix

fix: specify revision for opensearch (#2919)

specify revision for opensearch (0ac0231)

Unknown

Use mteb.get_model in adding_a_dataset.md (#2922)

Update adding_a_dataset.md (c1922c8)

dataset: add BarExamQA dataset (#2916)
Add BarExamQA retrieval task
ran linter
updated details
updated details
fixed subtype name
fixed changes
ran linter again (1dcc6dc)
model: Add OpenSearch inf-free sparse encoding models (#2903)

add opensearch inf-free models

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (5a868e3)

1.38.36 (2025-07-20)

Fix

fix: change passage prompt to document (#2912)
change document to passage
fix prompt names
fix kwargs check
fix default prompt (a298fa9)

Unknown

Update tasks & benchmarks tables (372fc4c)
dataset: Add JapaneseSentimentClassification (#2913)

Add JapaneseSentimentClassification (57438c2)

Update tasks & benchmarks tables (56c98ed)
Classification dataset cleaning (#2900)
Classification dataset cleaning
Update pull request number
Fix metadata test
fix formatting
add script for cleaning (aef1e33)
Evaluator tests (#2910)
Adding Classification Evaluator test
Modifications due to the comments
Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Modifications due to the comments
Modifications due to the comments
Adding STSEvaluator and SummarizationEvaluator tests
Correcting due to the comments
Correcting due to the comments

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (c7078af)

1.38.35 (2025-07-16)

Fix

fix: update colpali engine models (#2905)
adding vidore benchmarks
fix typo
clean vidore names + per lang eval
lint
vidore names
bibtex fix
fix revision
vidore v2 citation
update citation format and fix per-language mappings
lint: citations
typo citations
fix revisions
lint
fix colnomic3b revision
fix colqwen2.5 revision + latest repo version
fix query augmentation tokens
colsmol revision (9864e2a)

Unknown

Add Classification Evaluator unit test (#2838)
Adding Classification Evaluator test
Modifications due to the comments
Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Update tests/test_evaluators/test_ClassificationEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Modifications due to the comments
Modifications due to the comments

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (4a47f90)

model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) (#2889)
feat: add KaLM_Embedding_X_0605 in kalm_models
Update kalm_models.py for lint format
kalm-emb-v2
kalm-emb-v2
kalm-emb-v2
kalm-emb-v2
kalm-emb-v2

Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com> (9ecac21)

model: add image support for jina embeddings v4 (#2893)
feat: unify text and image embeddings for all tasks
fix: uniform batch size
fix: update error message
fix: update code task
fix: update max length
fix: apply review suggestions (17be7e5)

1.38.34 (2025-07-10)

Fix

fix: pin datasets version (#2892)

fix datasets version (00c95cf)

Unknown

Update tasks & benchmarks tables (5303fec)
dataset: Evalita dataset integration (#2859)
Added DadoEvalCoarseClassification
Removed unnecessary columns from DadoEvalCoarseClassification
Added EmitClassification task
added SardiStanceClassification task
Added GeoLingItClassification task
Added DisCoTexPairClassification tasks
Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits
changed import in DisCoTexPairClassification
removed GeoLingItClassification dataset
fixed citation formatting, missing metadata parameters and lint formatting
Added XGlueWRPReranking task

Added missing __init__.py files

fixed metadata in XGlueWRPReranking
Added MKQARetrieval task
fixed type in XGlueWRPReranking
changed MKQARetrieval from cross-lingual to monolingual
formatted MKQARetrieval file
removed unused const

Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co> (ee17a6e)

model: add Hakim and TookaSBERTV2 models (#2826)
add tooka v2s
add mcinext models
update mcinext.py
Apply PR review suggestions
Update mteb/models/mcinext_models.py

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (04dc6d4)

Update tasks & benchmarks tables (5be02c1)
Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)
Add JaCWIR and JQaRA for reranking
Fix ANLP Journal datasets
Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval
tackle test cases
Remove _evaluate_subset usage
Separate v1 and v2
Update info for NLP Journal datasets (70768b5)
Comment kalm model (#2877)

comment kalm model (a3ca95c)

model: add kalm_models ModelMeta (new PR) (#2853)
feat: add KaLM_Embedding_X_0605 in kalm_models
Update kalm_models.py for lint format

Co-authored-by: xinshuohu <xinshuohu@tencent.com> (b67bd04)

model: add listconranker modelmeta (#2874)
add listconranker modelmeta
fix bugs
use linter
lint

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (5846f56)

fix tests to be compatible with SentenceTransformers v5 (#2875)
fix sbert v5
add comment (f346a37)
rename seed-1.6-embedding to seed1.6-embedding (#2870) (f27648b)
model: Adding nvidia/llama-nemoretriever-colembed models (#2861)
nvidia_llama_nemoretriever_colembed
correct 3b reference
lint fix
add training data and license for nvidia/llama_nemoretriever_colembed
lint

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (4ff1413)

Bump gradio to fix leaderboard sorting (#2866)

Bump gradio (a4388c2)

model: Adding Sailesh97/Hinvec (#2842)
Adding Hinvec Model's Meta data.
Adding hinvec_model.py
Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

formatted code with Black and linted with Ruff

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (e3286d5)

Releases: embeddings-benchmark/mteb