Releases: embeddings-benchmark/mteb
1.38.43
1.38.43 (2025-08-20)
Ci
- ci: Temporarily limit pytrec version to "pytrec-eval-terrier>=0.5.6, <0.5.8" to prevent errors
  try to fix CI (6fa6efa)
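The pin above allows pytrec-eval-terrier releases from 0.5.6 up to, but not including, 0.5.8. As a rough illustration of that range check, here is a minimal sketch in plain Python. It assumes simple dotted numeric versions, and the helper names are hypothetical; real installers such as pip handle the full version grammar.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '0.5.7' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(version: str, lower: str = "0.5.6", upper: str = "0.5.8") -> bool:
    """Return True when lower <= version < upper, mirroring '>=0.5.6, <0.5.8'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# 0.5.7 falls inside the pinned range, 0.5.8 does not.
print(satisfies_pin("0.5.7"))
print(satisfies_pin("0.5.8"))
```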
Fix
- fix: Add VN-MTEB benchmark and Leaderboard (#2995)
  - [ADD] 50 Vietnamese datasets from vn-mteb
  - [UPDATE] task metadata
  - [UPDATE] import dependencies
  - [UPDATE] task metadata, BibTeX citation
  - [UPDATE-TEST] test_model_meta
  - [UPDATE] sample_creation to machine-translated and LM verified
  - [ADD] sample creation machine-translated and LM verified
  - [ADD] VN-MTEB benchmark and leaderboard
  - [FIX] wrong benchmark name
  - [REMOVE] default fields metadata in Classification tasks (0a6e855)
Unknown
- Update MBPPRetrieval.py
  Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (ea801ec)
- Update tasks & benchmarks tables (7da3cf9)
- dataset: Added wikisql retrieval (#3039)
  - Add WikiSQL retrieval task
    - Code retrieval task based on WikiSQL natural language to SQL dataset
    - Natural language questions matched to SQL query implementations
    - Uses sql-Code evaluation language for SQL-specific metrics
    - Includes proper citations and descriptive statistics
  - Add WikiSQLRetrieval to imports
  - Add descriptive statistics for WikiSQLRetrieval
  - Reformatting
  - Reformatting
  - Reformatting, correcting the revision (7b289f5)
- Update tasks & benchmarks tables (1fff5ce)
- dataset: Add mbpp retrieval (#3037)
  - Add MBPP retrieval task
    - Code retrieval task based on 378 Python programming problems
    - Natural language queries matched to Python code implementations
    - Uses python-Code evaluation language for code-specific metrics
    - Includes proper citations and descriptive statistics
  - Add MBPPRetrieval to imports
  - Add descriptive statistics for MBPPRetrieval
  - Reformatting
  - Reformatting (ac69263)
- Fix 3 VN-MTEB Pair Classification tasks (#3053)
  - [ADD] 50 Vietnamese datasets from vn-mteb
  - [UPDATE] task metadata
  - [UPDATE] import dependencies
  - [UPDATE] task metadata, BibTeX citation
  - [UPDATE-TEST] test_model_meta
  - [UPDATE] sample_creation to machine-translated and LM verified
  - [ADD] sample creation machine-translated and LM verified
  - [ADD] Vietnamese Embedding models
  - [REMOVE] default fields metadata in Classification tasks
  - [UPDATE] model to vi-vn language specific file
  - [FIX] lint
  - [FIX] model loader
  - [FIX] VN-MTEB 3 datasets PairClassification rename column (4e3fcd8)
1.38.42
1.38.42 (2025-08-18)
Ci
- ci: Updating rerun delays to prevent false-positive errors (e476dc3)
- ci: reduce parallel runs when checking if a dataset exists (#3035)
  The hope is that this will prevent many of the current errors (4aaf47e)
Fix
- fix: Updated revision for jina-embeddings-v4 (#3046)
  - fix: jinav4 revision
    Signed-off-by: admin <bo.wang@jina.ai>
  - change revision instead of removing it
    Signed-off-by: admin <bo.wang@jina.ai>
  Signed-off-by: admin <bo.wang@jina.ai>
  Co-authored-by: admin <bo.wang@jina.ai> (c58b319)
Unknown
- model: add granite-embedding-english R2 models (#3050) (e08ec56)
- model: Add GreenNode Vietnamese Embedding models (#2994)
  - [ADD] 50 Vietnamese datasets from vn-mteb
  - [UPDATE] task metadata
  - [UPDATE] import dependencies
  - [UPDATE] task metadata, BibTeX citation
  - [UPDATE-TEST] test_model_meta
  - [UPDATE] sample_creation to machine-translated and LM verified
  - [ADD] sample creation machine-translated and LM verified
  - [ADD] Vietnamese Embedding models
  - [REMOVE] default fields metadata in Classification tasks
  - [UPDATE] model to vi-vn language specific file
  - [FIX] lint
  - [FIX] model loader (72f7b05)
- Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (d729d32)
1.38.41
1.38.41 (2025-08-17)
Fix
- fix: incorrect revision for SNLRetrieval (#3033)
  The provided revision doesn't seem to be present on:
  adrlau/navjordj-SNL_summarization_copy
  Replacing with latest revision (5c65913)
Unknown
- Update tasks & benchmarks tables (a96f2e4)
- dataset: Add HumanEvalRetrieval task (#3022)
  - Add HumanEvalRetrieval dataset
  - Fix TaskMetadata structure and remove descriptive_stats
    - Use TaskMetadata class instead of dict
    - Remove descriptive_stats as requested in PR review
    - Add date field and proper import structure
  - Fix dataset path and use verified metadata
    - Change path from zeroshot/humaneval-embedding-benchmark to embedding-benchmark/HumanEval
    - Use actual description from HuggingFace dataset page
    - Remove fabricated citation and reference
    - Remove revision field that was incorrect
    - Reference HuggingFace dataset page instead of arxiv
  - Add correct revision hash to HumanEval
    - Add revision hash: ed1f48a for reproducibility
  - Fix HumanEval metadata validation
    - Add date field for metadata completeness
    - Add bibtex_citation field (empty string)
    - Required for TaskMetadata validation to pass
    - Should resolve PR test failure
  - Address reviewer feedback
    - Remove trust_remote_code parameter as requested
    - Add revision parameter to load_dataset() calls for consistency
    - Use metadata revision hash in dataset loading for reproducibility
  - Fix field names in HumanEval dataset loading
    Changed query_id/corpus_id to query-id/corpus-id to match actual dataset format.
  - Fix deprecated metadata_dict usage
    Use self.metadata.dataset instead of self.metadata_dict for v2.0 compatibility.
  - Fix data structure for MTEB compatibility
    - Organize data by splits as expected by MTEB retrieval tasks
    - Convert scores to integers for pytrec_eval compatibility
  - Address PR feedback for HumanEval dataset
    - Add descriptive statistics using calculate_metadata_metrics()
    - Enhance metadata description with dataset structure details
    - Add complete BibTeX citation for original paper
    - Update to full commit hash revision
    - Add python-Code language tag for programming language
    - Explain retrieval task formulation clearly
  - Fix BibTeX citation formatting for HumanEvalRetrieval
    - Update citation to match bibtexparser formatting requirements
    - Fields now in alphabetical order with lowercase names
    - Proper trailing commas and indentation (d4e6223)
- model: Add granite-vision-embedding model (#3029)
  - Add files via upload
  - Address review comments
  - Address review comments
  - ruff format
  - Update mteb/models/granite_vision_embedding_models.py
  - lint error fix
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (37d115a)
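The HumanEvalRetrieval entry in this release mentions three data-shape fixes: hyphenated field names (query-id/corpus-id), data organized by split, and integer scores for pytrec_eval compatibility. The following is a minimal sketch of how those pieces could fit together. The function name and sample rows are hypothetical, and this is not the task's actual loading code.

```python
from collections import defaultdict


def build_qrels(rows_by_split):
    """Group relevance rows into {split: {query_id: {corpus_id: int_score}}}."""
    qrels = {}
    for split, rows in rows_by_split.items():
        per_query = defaultdict(dict)
        for row in rows:
            # Hyphenated keys follow the dataset format named in the entry.
            # Scores are cast to int, as the entry says pytrec_eval requires.
            per_query[row["query-id"]][row["corpus-id"]] = int(row["score"])
        qrels[split] = dict(per_query)
    return qrels


# Hypothetical sample rows; the score arrives as a float.
sample = {"test": [{"query-id": "q1", "corpus-id": "c1", "score": 1.0}]}
print(build_qrels(sample))
```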
1.38.40
1.38.40 (2025-08-16)
Fix
- fix: Add missing training sets for qzhou (#3023)
  - Supplement missing training sets
  - reformat code
  - Reorganize the data list format
  - update qzhou_model meta (20bc80c)
Unknown
- Update tasks & benchmarks tables (177997f)
- Standardise task names and fix citation formatting (#3026)
  fixes for name formatting (ea41e7a)
- Add OpenAI models with 512 dimension (#3008)
  - Add OpenAI/text-embedding-3-small (512 dim)
    Add OpenAI/text-embedding-3-large (512 dim)
  - Correcting due to comments
  Co-authored-by: fzowl <zoltan@voyageai.com> (d8b2910)
- model: Add Cohere embed-v4.0 model support (#3006)
  - Add Cohere embed-v4.0 model support
    - Add text-only embed-v4.0 model in cohere_models.py
    - Add multimodal embed-v4.0 model in cohere_v.py
    - Support configurable dimensions (256, 512, 1024, 1536)
    - Support 128,000 token context length
    - Support multimodal embedding (text, images, mixed PDFs)
    🤖 Generated with Claude Code
    Co-Authored-By: Claude <noreply@anthropic.com>
  - Add Cohere embed-v4.0 model support
    Update cohere_v.py and cohere_models.py to include the new embed-v4.0 model with proper configuration and integration.
    🤖 Generated with Claude Code
    Co-Authored-By: Claude <noreply@anthropic.com>
  Co-authored-by: Claude <noreply@anthropic.com> (87eb27c)
- Update tasks & benchmarks tables (4adf565)
- dataset: Added 50 Vietnamese dataset from vn-mteb (#2964)
  - [ADD] 50 Vietnamese datasets from vn-mteb
  - [UPDATE] task metadata
  - [UPDATE] import dependencies
  - [UPDATE] task metadata, BibTeX citation
  - [UPDATE-TEST] test_model_meta
  - [UPDATE] sample_creation to machine-translated and LM verified
  - [ADD] sample creation machine-translated and LM verified
  - [REMOVE] default fields metadata in Classification tasks (741b022)
- lint: Correcting lint errors (#3004)
  - Adding Classification Evaluator test
  - Modifications due to the comments
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Modifications due to the comments
  - Modifications due to the comments
  - Correcting the lint errors
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (01840ce)
- model: BAAI/bge-m3-unsupervised Model (#3007)
  - Add BAAI/bge-m3-unsupervised Model
    (BAAI/bge_m3_retromae is commented out: the details are correct, but the model fails to load for me, so I commented it out)
  - Remove the commented retromae model
  Co-authored-by: fzowl <zoltan@voyageai.com> (042db73)
- model: Add Voyage 3.5 model configuration (#3005)
  Add Voyage 3.5 model configuration
  - Add voyage_3_5 ModelMeta with 1024 embed dimensions and 32000 max tokens
  - Set release date to 2025-01-21 with revision 1
  - Configure for cosine similarity with instruction support
  - Include standard Voyage training datasets reference
  🤖 Generated with Claude Code
  Co-authored-by: Claude <noreply@anthropic.com> (e5d386b)
- qzhou-embedding model_meta & implementation (#2975)
  - qzhou-embedding model_meta & implementation
  - Update qzhou_models.py
  - Update qzhou_models.py
    Processing todo items (Add default instruction)
  - Update qzhou_models.py
    correct bge datalist
  - Update qzhou_models.py
    correct 'public_training_data'
  - Update qzhou_models.py
  - Update qzhou_models.py
  - Update qzhou_models.py
  - Update qzhou_models.py
  - Update mteb/models/qzhou_models.py
    Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
  - Update mteb/models/qzhou_models.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - format qzhou_models.py for ruff check
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (6c1f1c6)
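The Voyage 3.5 entry in this release lists the values recorded for the model: 1024 embedding dimensions, 32000 max tokens, release date 2025-01-21, revision 1, cosine similarity, and instruction support. As a rough picture of what such a metadata record holds, here is a sketch using a hypothetical dataclass. It is a stand-in and not mteb's actual ModelMeta class; the field names and the name string are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetaSketch:
    """Hypothetical stand-in for a model metadata record (not mteb's ModelMeta)."""

    name: str
    revision: str
    release_date: str
    embed_dim: int
    max_tokens: int
    similarity_fn_name: str
    use_instructions: bool


# Values come from the changelog entry; the name string is illustrative only.
voyage_3_5 = ModelMetaSketch(
    name="voyage-3.5",
    revision="1",
    release_date="2025-01-21",
    embed_dim=1024,
    max_tokens=32000,
    similarity_fn_name="cosine",
    use_instructions=True,
)
print(voyage_3_5)
```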
1.38.39
1.38.39 (2025-08-03)
Fix
- fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression (#2716)
  - Add RuSciBench
  - fix bitext mining lang
  - Add regression task
  - fix init
  - add missing files
  - Improve description
  - Add superseded_by
  - fix lint
  - Update regression task to match with v2
  - Add stratified_subsampling for regression task
  - Add bootstrap for regression task
  - Rename task class, add model as evaluator argument
  - fix import
  - fix import 2
  - fixes
  - fix
  - Rename regression model protocol (36df9ca)
Unknown
- Update tasks & benchmarks tables (a86e2dd)
- Update tasks & benchmarks tables (e4f30e9)
- dataset: add BillSum datasets (#2943)
  - Added BillSum datasets
  - fixed billsumca
  - Updated BillSumCA description
  - Updated BillSumUS description
  - Update mteb/tasks/Retrieval/eng/BillSumCA.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Update mteb/tasks/Retrieval/eng/BillSumUS.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - lint
  - lint
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (007d19f)
- dataset: add GovReport dataset (#2953)
  - Added govreport task
  - Updated description (42dfe0d)
- Update tasks & benchmarks tables (da46c8e)
- dataset: Add BSARD v2, fixing the data loading issues of v1 (#2935)
  - BSARD loader fixed
  - BSARDv2 metadata fixed
  - Update mteb/tasks/Retrieval/fra/BSARDRetrieval.py
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (8416541)
1.38.38
1.38.38 (2025-07-25)
Ci
- ci: bump semantic release (4ef8571)
Documentation
- docs: Update adding_a_dataset.md (#2947)
  - docs: Update adding_a_dataset.md
  - Update docs/adding_a_dataset.md (a78debf)
Fix
- fix: Prevent incorrectly passing "selector_state" to get_benchmark (#2939)
  The leaderboard would have (silent) errors where get_benchmark led to a KeyError due to "selector_state" being passed as a default value. Setting DEFAULT_BENCMARK_NAME as the value solves this issue. (8496ec2)
- fix: Only import SparseEncoder once the sentence-transformers version has been checked (#2940)
  - fix: Only import SparseEncoder once the sentence-transformers version has been checked
    fixes #2936
  - Update mteb/models/opensearch_neural_sparse_models.py
    Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (79a43af)
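The get_benchmark fix in this release describes a lookup that fails with a KeyError when a placeholder string is passed where a benchmark name is expected. The following is a minimal sketch of that failure mode and of the remedy of defaulting to a valid name. The registry, the names, and the callback are hypothetical and do not reproduce the leaderboard code.

```python
# Hypothetical registry and default name, for illustration only.
REGISTRY = {"demo-benchmark": ["TaskA", "TaskB"]}
DEFAULT_NAME = "demo-benchmark"


def get_benchmark(name: str) -> list[str]:
    """Look up a benchmark by name; unregistered names raise KeyError."""
    return REGISTRY[name]


def on_select(selected: str = DEFAULT_NAME) -> list[str]:
    """UI-style callback whose default value is a registered benchmark name."""
    return get_benchmark(selected)


# A placeholder such as "selector_state" is not a registered name.
try:
    get_benchmark("selector_state")
except KeyError as err:
    print("KeyError:", err)

# With a valid default, the callback works without an explicit argument.
print(on_select())
```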
Unknown
1.38.37
1.38.37 (2025-07-21)
Fix
- fix: specify revision for opensearch (#2919)
  specify revision for opensearch (0ac0231)
Unknown
- Use mteb.get_model in adding_a_dataset.md (#2922)
  Update adding_a_dataset.md (c1922c8)
- dataset: add BarExamQA dataset (#2916)
  - Add BarExamQA retrieval task
  - ran linter
  - updated details
  - updated details
  - fixed subtype name
  - fixed changes
  - ran linter again (1dcc6dc)
- model: Add OpenSearch inf-free sparse encoding models (#2903)
  add opensearch inf-free models
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (5a868e3)
1.38.36
1.38.36 (2025-07-20)
Fix
- fix: change passage prompt to document (#2912)
  - change document to passage
  - fix prompt names
  - fix kwargs check
  - fix default prompt (a298fa9)
Unknown
- Add JapaneseSentimentClassification (57438c2)
- Update tasks & benchmarks tables (56c98ed)
- Classification dataset cleaning (#2900)
  - Classification dataset cleaning
  - Update pull request number
  - Fix metadata test
  - fix formatting
  - add script for cleaning (aef1e33)
- Evaluator tests (#2910)
  - Adding Classification Evaluator test
  - Modifications due to the comments
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Modifications due to the comments
  - Modifications due to the comments
  - Adding STSEvaluator and SummarizationEvaluator tests
  - Correcting due to the comments
  - Correcting due to the comments
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (c7078af)
1.38.35
1.38.35 (2025-07-16)
Fix
- fix: update colpali engine models (#2905)
  - adding vidore benchmarks
  - fix typo
  - clean vidore names + per lang eval
  - lint
  - vidore names
  - bibtex fix
  - fix revision
  - vidore v2 citation
  - update citation format and fix per-language mappings
  - lint: citations
  - typo citations
  - fix revisions
  - lint
  - fix colnomic3b revision
  - fix colqwen2.5 revision + latest repo version
  - fix query augmentation tokens
  - colsmol revision (9864e2a)
Unknown
- Add Classification Evaluator unit test (#2838)
  - Adding Classification Evaluator test
  - Modifications due to the comments
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Update tests/test_evaluators/test_ClassificationEvaluator.py
    Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
  - Modifications due to the comments
  - Modifications due to the comments
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (4a47f90)
- model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) (#2889)
  - feat: add KaLM_Embedding_X_0605 in kalm_models
  - Update kalm_models.py for lint format
  - kalm-emb-v2
  - kalm-emb-v2
  - kalm-emb-v2
  - kalm-emb-v2
  - kalm-emb-v2
  Co-authored-by: xinshuohu <xinshuohu@tencent.com>
  Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com> (9ecac21)
1.38.34
1.38.34 (2025-07-10)
Fix
- fix: pin datasets version (#2892)
  fix datasets version (00c95cf)
Unknown
- Update tasks & benchmarks tables (5303fec)
- dataset: Evalita dataset integration (#2859)
  - Added DadoEvalCoarseClassification
  - Removed unnecessary columns from DadoEvalCoarseClassification
  - Added EmitClassification task
  - added SardiStanceClassification task
  - Added GeoLingItClassification task
  - Added DisCoTexPairClassification tasks
  - Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits
  - changed import in DisCoTexPairClassification
  - removed GeoLingItClassification dataset
  - fixed citation formatting, missing metadata parameters and lint formatting
  - Added XGlueWRPReranking task
  - Added missing __init__.py files
  - fixed metadata in XGlueWRPReranking
  - Added MKQARetrieval task
  - fixed type in XGlueWRPReranking
  - changed MKQARetrieval from cross-lingual to monolingual
  - formatted MKQARetrieval file
  - removed unused const
  Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co> (ee17a6e)
- model: add Hakim and TookaSBERTV2 models (#2826)
  - add tooka v2s
  - add mcinext models
  - update mcinext.py
  - Apply PR review suggestions
  - Update mteb/models/mcinext_models.py
  Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
  Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (04dc6d4)
- Update tasks & benchmarks tables (5be02c1)
- Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)
  - Add JaCWIR and JQaRA for reranking
  - Fix ANLP Journal datasets
  - Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval
  - tackle test cases
  - Remove _evaluate_subset usage
  - Separate v1 and v2
  - Update info for NLP Journal datasets (70768b5)
- Comment kalm model (#2877)
  comment kalm model (a3ca95c)
- model: add kalm_models ModelMeta (new PR) (#2853)
  - feat: add KaLM_Embedding_X_0605 in kalm_models
  - Update kalm_models.py for lint format
  Co-authored-by: xinshuohu <xinshuohu@tencent.com> (b67bd04)
- model: add listconranker modelmeta (#2874)
  - add listconranker modelmeta
  - fix bugs
  - use linter
  - lint
  Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (5846f56)
- fix tests to be compatible with SentenceTransformers v5 (#2875)
  - fix sbert v5
  - add comment (f346a37)
- rename seed-1.6-embedding to seed1.6-embedding (#2870) (f27648b)
- model: Adding nvidia/llama-nemoretriever-colembed models (#2861)
  - nvidia_llama_nemoretriever_colembed
  - correct 3b reference
  - lint fix
  - add training data and license for nvidia/llama_nemoretriever_colembed
  - lint
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (4ff1413)
- Bump gradio to fix leaderboard sorting (#2866)
  Bump gradio (a4388c2)
- model: Adding Sailesh97/Hinvec (#2842)
  - Adding Hinvec model's metadata.
  - Adding hinvec_model.py
  - Update mteb/models/hinvec_models.py
    Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
  - formatted code with Black and linted with Ruff
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (e3286d5)