BREAKING: v2.0.0 #1433
Draft · KennethEnevoldsen wants to merge 171 commits into main from v2.0.0
Conversation
* update * merged retrieval; working * update tasks; working multilingual * everything working except instructions * working instructions; just need cleanup * add metadata for all but MindSmall * faster evaluation; mindsmall can compute in reasonable time * fix bad merge of docs * lint * fix test * qa * updated mindsmall * lint * fix debug * Update mteb/abstasks/dataloaders.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * base * sync with main --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com>
* enable codecarbon by default * lint * update flag * add allow_multiple_runs param * make lint * add warning * lint * negate the flag --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* run tasks * remove test script * lint * remove cache * fix sickbrsts * fix tests * add datasets
* fix test * skip mock * add message to assert * fix test * lint * fix tests * upd tests * update descriptive stats files * add stat to speed
* multilingual loader * lint
* add citations * fix typo
* add code for computing number of qrels * add stats fever hotpotqa msmarco topiocqa * miracl mrtidy * multilongdoc miracl reranking * add multi eurlex * fix tests for descriptive stats * fix tests --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* add code for computing number of qrels * BibleNLPBitextMining descriptive stats added * SwissJudgementClassification descriptive stats added * VoyageMMarcoReranking descriptive stats added * WebLINXCandidatesReranking descriptive stats added * MultiEURLEXMultilabelClassification descriptive stats added * MIRACLReranking descriptive stats added * MindSmallReranking descriptive stats added * updated test_TaskMetadata * fix test --------- Co-authored-by: Imene Kerboua <imenelydia.kr@gmail.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* fix bright loader * lint * fix comment
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * Fix: Made data parsing in the leaderboard figure more robust (#1450) Bugfixes with data parsing in main figure * Fixed task loading (#1451) * Fixed task result loading from disk * Fixed task result loading from disk * fix: publish (#1452) * 1.19.6 Automatically generated by python-semantic-release * fix: Fix load external results with `None` mteb_version (#1453) * fix * lint * 1.19.7 Automatically generated by python-semantic-release * WIP: Polishing up leaderboard UI (#1461) * fix: Removed column wrapping on the table, so that it remains readable * Added disclaimer to figure * fix: Added links to task info table, switched out license with metric * fix: loading pre 1.11.0 (#1460) * small fix * fix: fix * 1.19.8 Automatically generated by python-semantic-release * fix: swap touche2020 to maintain compatibility (#1469) swap touche2020 for parity * 1.19.9 Automatically generated by python-semantic-release * docs: Add sum per language for task counts (#1468) * add sum per lang * add sort by sum option * make lint * fix: pinned datasets to <3.0.0 (#1470) * 1.19.10 Automatically generated by python-semantic-release * feat: add CUREv1 retrieval dataset (#1459) * feat: add CUREv1 dataset --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Daniel Buades Marcos <daniel@buad.es> * feat: add missing domains to medical tasks * feat: modify benchmark tasks * chore: benchmark naming --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> * Update tasks table * 1.20.0 Automatically generated by python-semantic-release * fix: check if `model` attr of model exists (#1499) * check if model attr of model exists * lint * Fix retrieval evaluator * 1.20.1 Automatically generated by python-semantic-release * add cure statistics --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Napuh <55241721+Napuh@users.noreply.github.com> Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com>
* fix bright loader * lint * fix comment * fix stats * fix retrieval stats * update stats * add rest of the stat * move batch code * fix docs * lint
* fix FilipinoHateSpeechClassification * update tests
* init * find all weird repos * move to mteb WikipediaRetrievalMultilingual * add base upload utils * retrieval, classification, bitextmining * test retrieval * test retrieval * test task uploaded * update tasks * working version * remove comments * lint * move upload * fix tests * fix test * move upload to task * Update mteb/tasks/Retrieval/multilingual/WikipediaRetrievalMultilingual.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: hatespeech filipino (#1522) * fix FilipinoHateSpeechClassification * update tests * lint --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * Fix: Made data parsing in the leaderboard figure more robust (#1450) Bugfixes with data parsing in main figure * Fixed task loading (#1451) * Fixed task result loading from disk * Fixed task result loading from disk * fix: publish (#1452) * 1.19.6 Automatically generated by python-semantic-release * fix: Fix load external results with `None` mteb_version (#1453) * fix * lint * 1.19.7 Automatically generated by python-semantic-release * WIP: Polishing up leaderboard UI (#1461) * fix: Removed column wrapping on the table, so that it remains readable * Added disclaimer to figure * fix: Added links to task info table, switched out license with metric * fix: loading pre 1.11.0 (#1460) * small fix * fix: fix * 1.19.8 Automatically generated by python-semantic-release * fix: swap touche2020 to maintain compatibility (#1469) swap touche2020 for parity * 1.19.9 Automatically generated by python-semantic-release * docs: Add sum per language for task counts (#1468) * add sum per lang * add sort by sum option * make lint * fix: pinned datasets to <3.0.0 (#1470) * 1.19.10 Automatically generated by python-semantic-release * feat: add CUREv1 retrieval dataset (#1459) * feat: add CUREv1 dataset --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Daniel Buades Marcos <daniel@buad.es> * feat: add missing domains to medical tasks * feat: modify benchmark tasks * chore: benchmark naming --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> * Update tasks table * 1.20.0 Automatically generated by python-semantic-release * fix: check if `model` attr of model exists (#1499) * check if model attr of model exists * lint * Fix retrieval evaluator * 1.20.1 Automatically generated by python-semantic-release * fix: Leaderboard demo data loading (#1507) * Made get_scores error tolerant * Added join_revisions, made get_scores failsafe * Fetching metadata fixed fr HF models * Added failsafe metadata fetching to leaderboard code * Added revision joining to leaderboard app * fix * Only show models that have metadata, when filter_models is called * Ran linting * 1.20.2 Automatically generated by python-semantic-release * fix: leaderboard only shows models that have ModelMeta (#1508) Filtering for models that have metadata * 1.20.3 Automatically generated by python-semantic-release * fix: align readme with current mteb (#1493) * align readme with current mteb * align with mieb branch * fix test * 1.20.4 Automatically generated by python-semantic-release * docs: Add lang family mapping and map to task table (#1486) * add lang family mapping and map to task table * make lint * add back some unclassified lang codes * Update tasks table * fix: Ensure that models match the names on embedding-benchmarks/results (#1519) * 1.20.5 Automatically generated by python-semantic-release * fix: Adding missing metadata on models and mathcing names up with the results repo (#1528) * Added Voyage 3 models * Added correct metadata to Cohere models and matched names with the results repo * 1.20.6 Automatically generated by python-semantic-release * feat: Evaluate missing splits (#1525) * fix: evaluate missing splits (#1268) * implement partial evaluation 
for missing splits * lint * requested changes done from scratch * test for missing split evaluation added * uncomment test * lint * avoid circular import * use TaskResult * skip tests for now --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * got test_all_splits_evaluated passing * tests passing * address review comments * make lint * handle None cases for kg_co2_emissions * use new results info --------- Co-authored-by: Thivyanth <thivyanth2004@gmail.com> * 1.21.0 Automatically generated by python-semantic-release * fix: Correct typos superseeded -> superseded (#1532) fix typo -> superseded * 1.21.1 Automatically generated by python-semantic-release * fix: Task load data error for SICK-BR-STS and XStance (#1534) * fix task load data for two tasks * correct dataset keys * 1.21.2 Automatically generated by python-semantic-release * fix: Proprietary models now get correctly shown in leaderboard (#1530) * Fixed showing proprietary models in leaderboard * Added links to all OpenAI models * Fixed table formatting issues * Bumped Gradio version * 1.21.3 Automatically generated by python-semantic-release * docs: Add Model Meta parameters and metadata (#1536) * add multi_qa_MiniLM_L6_cos_v1 model meta * add all_mpnet_base_v2 * add parameters to model meta * make lint * add extra params to meta * fix: add more model meta (jina, e5) (#1537) * add e5 model meta * address review comments * 1.21.4 Automatically generated by python-semantic-release * Add cohere models (#1538) * fix: bug cohere names * format * fix: add nomic models (#1543) #1515 * fix: Added all-minilm-l12-v2 (#1542) #1515 * fix: Added arctic models (#1541) #1515 * fix: add sentence trimming to OpenAIWrapper (#1526) * fix: add sentence trimming to OpenAIWrapper * fix: import tiktoken library inside encode function * fix: check tokenizer library installed and update ModelMeta to pass tokenizer_name * fix: pass tokenizer_name, max_tokens to loader * fix: make tokenizer_name None for default * fix: delete changes for ModelMeta * fix: fix revision to 2 for OpenAI models * fix: add docstring for OpenAIWrapper * fix: lint * feat: add openai optional dependency set * fix: add sleep for too many requests * fix: add lint * fix: delete evaluate file * 1.21.5 Automatically generated by python-semantic-release * fix: Fixed metadata errors (#1547) * 1.21.6 Automatically generated by python-semantic-release * fix: remove curev1 from multlingual (#1552) Seems like it was added here: 1cc6c9e * 1.21.7 Automatically generated by python-semantic-release * fix: Add Model2vec (#1546) * Added Model2Vec wrapper * Added Model2vec models * Added model2vec models to registry * Added model2vec as a dependency * Ran linting * Update mteb/models/model2vec_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/models/model2vec_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Added adapted_from and superseeded_by to model2vec models. 
* Added missing import * Moved pyproject.toml to optional dependencies * Fixed typos * Added import error and changed model to model_name * Added Numpy to frameworks * Added Numpy to frameworks * Corrected false info on model2vec models * Replaced np.inf with maxint * Update mteb/models/model2vec_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added option to have infinite max tokens, added it to Model2vec --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Made result loading more permissive, changed eval splits for HotPotQA and DBPedia (#1554) * Removed train and dev from eval splits on HotpotQA * Removed dev from eval splits on DBPedia * Made task_results validation more permissive * Readded exception in get_score * Ran linting * 1.21.8 Automatically generated by python-semantic-release * docs: Correction of SICK-R metadata (#1558) * Correction of SICK-R metadata * Correction of SICK-R metadata --------- Co-authored-by: rposwiata <rposwiata@opi.org.pl> * feat(google_models): fix issues and add support for `text-embedding-005` and `text-multilingual-embedding-002` (#1562) * fix: google_models batching and prompt * feat: add text-embedding-005 and text-multilingual-embedding-002 * chore: `make lint` errors * fix: address PR comments * 1.22.0 Automatically generated by python-semantic-release * fix(bm25s): search implementation (#1566) fix: bm25s implementation * 1.22.1 Automatically generated by python-semantic-release * docs: Fix dependency library name for bm25s (#1568) * fix: bm25s implementation * correct library name --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> * fix: Add training dataset to model meta (#1561) * fix: Add training dataset to model meta Adresses #1556 * Added docs * format * feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564) * feat: batch requests to cohere models * fix: use correct task_type * feat: use tqdm with openai * fix: explicitely set `show_progress_bar` to False * fix(publichealth-qa): ignore rows with `None` values in `question` or `answer` (#1565) * 1.23.0 Automatically generated by python-semantic-release * fix wongnai * update inits * fix tests * lint * update imports * fix tests * lint --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Napuh <55241721+Napuh@users.noreply.github.com> Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Thivyanth <thivyanth2004@gmail.com> Co-authored-by: Youngjoon Jang <82500463+yjoonjang@users.noreply.github.com> Co-authored-by: Rafał Poświata <rafalposwiata@gmail.com>
# Conflicts: # docs/tasks.md # mteb/abstasks/AbsTaskClassification.py # mteb/abstasks/AbsTaskClusteringFast.py # mteb/abstasks/AbsTaskInstructionRetrieval.py # mteb/abstasks/AbsTaskMultilabelClassification.py # mteb/abstasks/AbsTaskPairClassification.py # mteb/abstasks/AbsTaskReranking.py # mteb/abstasks/AbsTaskRetrieval.py # mteb/abstasks/AbsTaskSTS.py # mteb/descriptive_stats/InstructionRetrieval/Core17InstructionRetrieval.json # mteb/descriptive_stats/MultilabelClassification/MultiEURLEXMultilabelClassification.json # mteb/descriptive_stats/Reranking/AskUbuntuDupQuestions.json # mteb/descriptive_stats/Reranking/ESCIReranking.json # mteb/descriptive_stats/Reranking/WikipediaRerankingMultilingual.json # mteb/descriptive_stats/Retrieval/AppsRetrieval.json # mteb/descriptive_stats/Retrieval/BelebeleRetrieval.json # mteb/descriptive_stats/Retrieval/COIRCodeSearchNetRetrieval.json # mteb/descriptive_stats/Retrieval/CodeEditSearchRetrieval.json # mteb/descriptive_stats/Retrieval/CodeFeedbackMT.json # mteb/descriptive_stats/Retrieval/CodeFeedbackST.json # mteb/descriptive_stats/Retrieval/CodeSearchNetCCRetrieval.json # mteb/descriptive_stats/Retrieval/CodeSearchNetRetrieval.json # mteb/descriptive_stats/Retrieval/CodeTransOceanContest.json # mteb/descriptive_stats/Retrieval/CodeTransOceanDL.json # mteb/descriptive_stats/Retrieval/CosQA.json # mteb/descriptive_stats/Retrieval/JaqketRetrieval.json # mteb/descriptive_stats/Retrieval/NFCorpus.json # mteb/descriptive_stats/Retrieval/StackOverflowQA.json # mteb/descriptive_stats/Retrieval/SyntheticText2SQL.json # mteb/descriptive_stats/Retrieval/Touche2020.json # mteb/descriptive_stats/Retrieval/Touche2020Retrieval.v3.json # mteb/descriptive_stats/Retrieval/mFollowIRCrossLingualInstructionRetrieval.json # mteb/descriptive_stats/Retrieval/mFollowIRInstructionRetrieval.json # mteb/evaluation/MTEB.py # mteb/evaluation/evaluators/RetrievalEvaluator.py # mteb/leaderboard/app.py # mteb/leaderboard/figures.py # mteb/leaderboard/table.py # mteb/model_meta.py # mteb/models/arctic_models.py # mteb/models/e5_models.py # mteb/models/nomic_models.py # mteb/models/overview.py # mteb/models/sentence_transformers_models.py # mteb/tasks/Reranking/zho/CMTEBReranking.py # mteb/tasks/Retrieval/__init__.py # mteb/tasks/STS/por/SickBrSTS.py # pyproject.toml # tests/test_benchmark/mock_tasks.py
* sort logos, add mkdocs outline, add index page * Added tons of documentation * Added some more docs to abstask * reduced docs to only include API docs for now * fixed import hell * Fixed more nasty import to get docs to work * API docs work! * fixed link * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * format --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix: reorder argument for mteb.get_tasks This should make the function more intuitive to use * typo --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix: Make deduplication in PairClassificationEvaluator stable * remove prompt type * remove prompt type missed one --------- Co-authored-by: isaac-chung <chungisaac1217@gmail.com>
* feat: add new arctic v2.0 models (#1574) * feat: add new arctic v2.0 models * chore: make lint * 1.24.0 Automatically generated by python-semantic-release * fix: Add namaa MrTydi reranking dataset (#1573) * Add dataset class and file requirements * pass tests * make lint changes * adjust meta data and remove load_data --------- Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> * Update tasks table * 1.24.1 Automatically generated by python-semantic-release * fix: Eval langs not correctly passed to monolingual tasks (#1587) * fix SouthAfricanLangClassification.py * add check for langs * lint * 1.24.2 Automatically generated by python-semantic-release * feat: Add ColBert (#1563) * feat: add max_sim operator for IR tasks to support multi-vector models * docs: add doc for Model2VecWrapper.__init__(...) * feat: add ColBERTWrapper to models & add ColBERTv2 * fix: resolve issues * fix: resolve issues * Update README.md Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * README.md: rm subset * doc: update example for Late Interaction * get colbert running without errors * fix: pass is_query to pylate * fix: max_sim add pad_sequence * feat: integrate Jinja templates for ColBERTv2 and add model prompt handling * feat: add revision & prompt_name * doc: pad_sequence * rm TODO jina colbert v2 * doc: warning: higher resource usage for MaxSim --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.25.0 Automatically generated by python-semantic-release * doc: colbert add score_function & doc section (#1592) * doc: colbert add score_function & doc section * doc: Update README.md Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * doc: Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Feat: add support for scoring function (#1594) * add support for scoring function * lint * move similarity to wrapper * remove score function * lint * remove from InstructionRetrievalEvaluator * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * remove score function from README.md --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Add new models nvidia, gte, linq (#1436) * Add new models nvidia, gte, linq * add warning for gte-Qwen and nvidia models re: instruction used in docs as well --------- Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Leaderboard: Refined plots (#1601) * Added embedding size guide to performance-size plot, removed shading on radar chart * Changed plot names to something more descriptive * Made plots failsafe * fix: Leaderboard refinements (#1603) * Added explanation of aggregate measures * Added download button to result tables * Task info gets sorted by task name * Added custom, shareable links for each benchmark * Moved explanation of aggregate metrics to the 
summary tab * 1.25.1 Automatically generated by python-semantic-release * Feat: Use similarity scores if available (#1602) * Use similarity scores if available * lint * Add NanoBEIR Datasets (#1588) * add NanoClimateFeverRetrieval task, still requires some debugging * move task to correct place in init file * add all Nano datasets and results * format code * Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * pin revision to commit and add datasets to benchmark.py * create new benchmark for NanoBEIR * add revision when loading datasets * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Update tasks table * Feat: Evaluate missing languages (#1584) * init * fix tests * update mock retrieval * update tests * use subsets instead of langs * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * fix tests * add to readme * rename subset in readme --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Add IBM Granite Embedding Models (#1613) * add IBM granite embedding models * lint formatting * add adapted_from and superseded_by to ModelMeta * fix: disable co2_tracker for API models (#1614) * 1.25.2 Automatically generated by python-semantic-release * fix: set `use_instructions` to True in models using prompts (#1616) feat: set `use_instructions` to True in models using prompts * 1.25.3 Automatically generated by python-semantic-release * update RetrievalEvaluator.py * update imports * update imports and metadata * fix tests * fix tests * fix output path for retrieval * fix similarity function --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Omar Elshehy <41394057+omarelshehy@users.noreply.github.com> Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: KGupta10 <92774828+KGupta10@users.noreply.github.com> Co-authored-by: Aashka Trivedi <aashka.trivedi@gmail.com>
* Merge main into v2 * fix model imports * added missing task imports * refactor task import This refactors imports following this pattern:

```py
# tasks/__init__
from .Retrieval import *

# tasks/retrieval/__init__
from .eng import *

# tasks/retrieval/eng/__init__
from .task1 import Task1
```

proposed by @Samoed in #2825. This should reduce the number of imports required, while not exposing any of the modules required at the task definition. * added missing descriptive stats * format
* Merge main into v2 * fix model imports * added missing task imports * refactor task import This refactors imports following this pattern:

```py
# tasks/__init__
from .Retrieval import *

# tasks/retrieval/__init__
from .eng import *

# tasks/retrieval/eng/__init__
from .task1 import Task1
```

proposed by @Samoed in #2825. This should reduce the number of imports required, while not exposing any of the modules required at the task definition. * added missing descriptive stats * fix: rename TaskMetadata.py to resolve class/module ambiguity related to: #1124 required for: #2714 It seems that in multiple places we denote the module instead of the intended TaskMetadata. This rename should fix that issue; it relies on PR #2828 * format
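For context, the class/module ambiguity mentioned above looks roughly like this under the pre-rename layout (a minimal sketch; the post-rename module name is not stated in this commit):

```py
# mteb/abstasks/TaskMetadata.py defines a class with the same name as its module,
# so the dotted path mteb.abstasks.TaskMetadata can refer to two different objects:
import mteb.abstasks.TaskMetadata                     # binds the module
from mteb.abstasks.TaskMetadata import TaskMetadata   # binds the class of the same name
```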
* Merge main into v2 * fix model imports * added missing task imports * added missing descriptive stats * fix: Added docs for `mteb.evaluate` - renamed `mteb.run_tasks` to `mteb.evaluate`. Reverting this is fairly easy but I think the rename makes a lot of sense - Added docs to most places - some aren't changed yet as they haven't been tested (#2830) - I didn't change the datasheet to avoid confusion with uploaded datasets partly fixes: #2793 * format * fix import * Update docs/mieb/readme.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update docs/usage/usage.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
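Given the rename above, a minimal v2 evaluation flow would look roughly like this; `mteb.get_model` and `mteb.get_tasks` already exist in mteb, while the exact signature of `mteb.evaluate` (the replacement for `mteb.run_tasks`) is an assumption, and the model/task names are only placeholders:

```py
import mteb

# Pick a model and a set of tasks by name (placeholder names).
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["NFCorpus"])

# v2 entry point: renamed from mteb.run_tasks to mteb.evaluate in this branch;
# the positional form shown here is an assumption, not a confirmed signature.
results = mteb.evaluate(model, tasks)
```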
* refactor copali to use new interface wip * use v2 interface * receive only dataloader
* add ListConRanker model * updated the implementation of ListConRanker * updated the release date of ListConRanker * added the training datasets and changed the release date of ListConRanker * updated the training datasets of ListConRanker * lint * fix import --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Add IFIR relevant tasks. Signed-off-by: SighingSnow <songtingyu220@gmail.com>
- Move all implementations into a separate folder called `model_implementations` - moved `encoder_interface.py` and `model_meta.py` into `models` - renamed `models/*` to `encoder_implementations/*` to make the distinction between the two folders clear - merged `models/utils.py` into the only model that used it We seem to have a few differing names when referring to a model (ModelMeta, get_model, etc.) and encoders (Encoder, AbsEncoder). Should we try to do something about this or just leave it as is? There is also an inconsistency between how tasks and implementations are in separate folders, but for benchmarks this is not the case. We could convert it to:

```
benchmarks/tasks/models
| - implementations/*
| - ...  # definitions, utilities, etc.
```

But I am not sure it is worth it, and for tasks it might be too much nesting. So I would probably leave it as is. Note: There are a few refactors that I would like to do on top of this, but I will add those in a separate PR (since it is too hard to review here) Fixes #2299
* introduce AbsTaskAnyClustering * trigger CI * remove image clustering abstask * revert * address review comments * fix tests * fix descriptive stats tests * fix for mteb eng v1 datasets
* introduce AbsTaskAnyZeroShotClassification * fix tests * address review comments * add mock text ZS task and handle text case * fix tests * pass all encode kwargs
* fix: refactor models modules - refactored loading of models - now all ModelMeta are imported - fixed a few metadata issues due to missing imports - renamed private methods to `_{prev_name}` to indicate that they are private - renamed `models/overview.py` > `models/get_model_meta.py` - fixed a few typing issues in the models module * fix typing * fixed spelling * minor fixes to imports for clarity * rollback readme * allow revision to be None * fix extract models names
* bump ruff (#2784) * Update issue and pr templates (#2782) * Update issue templates * Update bug_report.md * test yaml template * add templates * update templates * add emojis * fix typo * Apply suggestions from code review Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update issue titles * update PR template * remove PR templates --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * model: Add GeoGPT-Research-Project/GeoEmbedding (#2773) * add model: geogpt_models * update geogpt_models * use InstructSentenceTransformerWrapper * resolve pylint warning * format geogpt_models.py * Update mteb/models/geogpt_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/geogpt_models.py --------- Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * model: add fangxq/XYZ-embedding (#2741) * add xyz model * add xyz model * add xyz model * update * update * update * update * update * update * update * lint --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * ci: fix config error for semantic release (#2800) discussed in: #2796 * dataset: Add R2MED Benchmark (#2795) * Add files via upload * Add files via upload * Update benchmarks.py * Update __init__.py * Add files via upload * Update R2MEDRetrieval.py * Update run_mteb_r2med.py * Delete scripts/run_mteb_r2med.py * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add files via upload * Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json * Add files via upload * Add files via upload * Add files via upload * Update R2MEDRetrieval.py * Add files via upload * Add files via upload * Add files via upload * Add files via upload * format citations * Update R2MEDRetrieval.py * Add files via upload * Add files via upload --------- Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update tasks & benchmarks tables * Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802) update training datasets Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com> * fix: Add adapted_from to Cmedqaretrieval (#2806) * fix: Add adapted_from to Cmedqaretrieval Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places. * format * 1.38.28 Automatically generated by python-semantic-release * fix: Adding client arg to init method of OpenAI models wrapper (#2803) * Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client) To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client. 
* Update mteb/models/openai_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/openai_models.py * remove comment and format --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * model: Add annamodels/LGAI-Embedding-Preview (#2810) Add LGAI-Embedding - Add mteb/models/lgai_embedding_models.py - defined model metadata * fix: Ensure bright uses the correct revision (#2812) fixes #2811 * 1.38.29 Automatically generated by python-semantic-release * add description to issue template (#2817) * add description to template * fix typo * model: Added 3 HIT-TMG's KaLM-embedding models (#2478) * Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper * Added KaLM_embedding_multilingual_mini_instruct_v1_5 * Added model to overview.py * Fix Task Count Per Language Table in tasks.md * resolve conflicts * remove tasks.md * Modified get_instruction funcion * Added support for prompt dict in get_instruction * fix lang code * Address comments * Delete mteb/models/check_models.py * added prompts_dict support in InstructSentenceTransformerWrapper * corrected instruction format * corrected prompts format * added correct instruction format * fix implementation * remove `if name main` * add comment --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * fix: Reuploaded previously unavailable SNL datasets (#2819) * fix: Reuploaded previously unavailable SNL datasets closes #2477 * removed exceptions from tests * temp fixes * added temporary fix * clean up commented out code * format * Update tasks & benchmarks tables * 1.38.30 Automatically generated by python-semantic-release * docs: Fix some typos in `docs/usage/usage.md` (#2835) * Update usage.md * Update usage.md * Update docs/usage/usage.md --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * model: Add custom instructions for GigaEmbeddings (#2836) * add custom instructions * fixed * lint * fix last instruction --------- Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * model: add Seed-1.6-embedding model (#2841) * add Seed-1.6-embedding model * Update seed_1_6_embedding_models.py * update model meta info * support image encoder interface * error fix * fix: format seed_1_6_embedding_models.py with Ruff * fix: Update model selection for the leaderboard (#2855) * fix: Update model selection for the leaderboard fixes #2834 This removed the lower bound selection, but generally I don't think people should care about the models being too small. * fix 1M --> 1B * format * rename model_size -> max_model_size * 1.38.31 Automatically generated by python-semantic-release * fix: update training dataset info of Seed-1.6-embedding model (#2857) update seed1.6 model training data info * 1.38.32 Automatically generated by python-semantic-release * add jinav4 model meta (#2858) * add model meta * linting * fix: add check for code lora * fix: apply review comments * fix: prompt validation for tasks with `-` (#2846) * fix prompt validation * fix task name split correctly * add docstring for test * 1.38.33 Automatically generated by python-semantic-release * model: Adding Sailesh97/Hinvec (#2842) * Adding Hinvec Model's Meta data. 
* Adding hinvec_model.py * Update mteb/models/hinvec_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * formated code with Black and lint with Ruff --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Bump gradio to fix leaderboard sorting (#2866) Bump gradio * model: Adding nvidia/llama-nemoretriever-colembed models (#2861) * nvidia_llama_nemoretriever_colembed * correct 3b reference * lint fix * add training data and license for nvidia/llama_nemoretriever_colembed * lint --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rename seed-1.6-embedding to seed1.6-embedding (#2870) * fix tests to be compatible with `SentenceTransformers` `v5` (#2875) * fix sbert `v5` * add comment * model: add listconranker modelmeta (#2874) * add listconranker modelmeta * fix bugs * use linter * lint --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * model: add kalm_models ModelMeta (new PR) (#2853) * feat: add KaLM_Embedding_X_0605 in kalm_models * Update kalm_models.py for lint format --------- Co-authored-by: xinshuohu <xinshuohu@tencent.com> * Comment kalm model (#2877) comment kalm model * Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872) * Add JaCWIR and JQaRA for reranking * Fix ANLP Journal datasets * Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval * tackle test cases * Remove _evaluate_subset usage * Separate v1 and v2 * Update info for NLP Journal datasets * Update tasks & benchmarks tables * model: add Hakim and TookaSBERTV2 models (#2826) * add tooka v2s * add mcinext models * update mcinext.py * Apply PR review suggestions * Update mteb/models/mcinext_models.py --------- Co-authored-by: mehran <mehan.sarmadi16@gmail.com> Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * dataset: Evalita dataset integration (#2859) * Added DadoEvalCoarseClassification * Removed unnecessary columns from DadoEvalCoarseClassification * Added EmitClassification task * added SardiStanceClassification task * Added GeoLingItClassification task * Added DisCoTexPairClassification tasks * Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits * changed import in DisCoTexPairClassification * removed GeoLingItClassification dataset * fixed citation formatting, missing metadata parameters and lint formatting * - Added XGlueWRPReranking task - Added missing __init__.py files * fixed metadata in XGlueWRPReranking * Added MKQARetrieval task * fixed type in XGlueWRPReranking * changed MKQARetrieval from cross-lingual to monolingual * formatted MKQARetrieval file * removed unused const --------- Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co> * Update tasks & benchmarks tables * fix: pin datasets version (#2892) fix datasets version * 1.38.34 Automatically generated by python-semantic-release * fix model implementations * fix tasks * add metrics --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com> Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com> Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com> Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: malteos <github@i.mieo.de> Co-authored-by: 
annamodels <annamodels@lgresearch.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com> Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Quan Yuhan <929888357@qq.com> Co-authored-by: Quan Yuhan <yuhan_quan@qq.com> Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com> Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com> Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com> Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com> Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com> Co-authored-by: xinshuohu <xinshuohu@tencent.com> Co-authored-by: lsz05 <lszgz0521@gmail.com> Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com> Co-authored-by: mehran <mehan.sarmadi16@gmail.com> Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: MattiaSangermano <43407984+MattiaSangermano@users.noreply.github.com> Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>
# Conflicts: # docs/create_tasks_table.py # docs/usage/usage.md # mteb/evaluation/evaluators/RetrievalEvaluator.py # mteb/models/instruct_wrapper.py # mteb/models/model_implementations/fa_models.py # mteb/models/model_implementations/jina_models.py # mteb/models/model_implementations/ru_sentence_models.py # mteb/models/overview.py # mteb/models/wrapper.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Classification/ita/DadoEvalCoarseClassification.py # mteb/tasks/Classification/ita/SardiStanceClassification.py # mteb/tasks/Clustering/nob/snl_clustering.py # mteb/tasks/MultiLabelClassification/__init__.py # mteb/tasks/MultiLabelClassification/ita/EmitClassification.py # mteb/tasks/PairClassification/__init__.py # mteb/tasks/PairClassification/ita/DisCoTexPairClassification.py # mteb/tasks/Reranking/__init__.py # mteb/tasks/Reranking/jpn/JQaRAReranking.py # mteb/tasks/Reranking/jpn/JaCWIRReranking.py # mteb/tasks/Reranking/multilingual/XGlueWPRReranking.py # mteb/tasks/Retrieval/__init__.py # mteb/tasks/Retrieval/eng/R2MEDRetrieval.py # mteb/tasks/Retrieval/jpn/JaCWIRRetrieval.py # mteb/tasks/Retrieval/jpn/NLPJournalAbsArticleRetrieval.py # mteb/tasks/Retrieval/jpn/NLPJournalAbsIntroRetrieval.py # mteb/tasks/Retrieval/jpn/NLPJournalTitleAbsRetrieval.py # mteb/tasks/Retrieval/jpn/NLPJournalTitleIntroRetrieval.py # mteb/tasks/Retrieval/multilingual/MKQARetrieval.py # pyproject.toml # tests/test_benchmark/mock_models.py # tests/test_benchmark/test_benchmark.py
* start adding * standardize statistics * remove irrelevant file * update retrieval calculation * update zeroshot statistics * fix random
* add debug print * add comment
* fix retrieval dataset upload * add readme repo type * fix adapted * add reupload flag * fix tasks uploading * add reupload datasets flag * reupload reuploaded MIRACLRetrieval.py * fix trust remote code * prepare miracl for reuploading * use mteb miracl * support qrels split * roll back miracl * remove reupload flag
* fix: Update ResultsCache - [x] Added tests - [x] Added utility interfaces for examining the cache - [x] Added load_results - [x] Updated docs to use ResultsCache instead We could also update the leaderboard to use ResultsCache, but I don't want to do that in this PR. When that is done I would probably deprecate `mteb.load_results` or convert it to a shorthand function for

```py
ResultsCache().load_results(**kwargs)
```

Deprecating leads to fewer breaking changes. Minor: - removed `results/` from .gitignore * fixed based on copilot feedback * fix issues in tests * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * fix tests * fix issues arising from multiple versions across remote and results folder --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
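As described above, the cache-based loading would sit next to the existing `mteb.load_results` helper; a hedged sketch, where the import path and exact method name follow the commit message rather than a confirmed API:

```py
import mteb
from mteb import ResultsCache  # assumed import path for the new cache class

# New cache-based interface for examining and loading results.
cache = ResultsCache()
results = cache.load_results()

# Existing helper, possibly kept as a shorthand for the call above.
results = mteb.load_results()
```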
* update training datasets * update training datasets * fix test * update test
* [v2] `mteb.evaluate` now saves `model_meta.json` Fixes #2847 * format * remove unused arg
* split batched input by subtypes * add text batched input * add type annotation for dataloaders * return array with Union * fix union typing
…leted unused files (#2961) * make stratification a private module * removed duplicate model implementations which were in the wrong folder * removed commented out model code * add unused task imports * update ScandiSentClassification to v2 I also ran a test and it runs without issue * remove task aggregation script as it is unused * add import for ClusTrecCovid and updated to v2 Also ran test - it runs just fine * add missing task imports * added missing task imports * rename model_classes to dense_retrieval_exact_search * rename utils.py to _download.py as it only contains the download function * format * fix evaluator import * ibid * remove test for unused code * fix: Compute missing data and create issue where not possible * computed missing task metadata * Ignore vscode debug file * Update pylate to be compatible with the latest version of sentence-transformers * ibid
* change corpus and queries to dataset * remove commented out code * add conversion for v1 datasets * fix descriptive stats * update reranking * format * fix tests * lint * change ids of mock dataset * change score for colbert * add type for corpus and queries datasets * fix reranking task * format * update push to hub * update statistics calculation * simplify `create_dataloader_for_retrieval_corpus` * remove check with queries id * add instruction dataset type * fully annotate retrieval types * remove irrelevant type annotation
* fix reranking stat calculation * remove from tests
* Refactor CLI to enable changes - [x] move cli, create_meta into module - [x] rename create_meta > generate_readme - [x] rename cli > build_cli - refactored build_cli() out of main() - move the main function into __main__ - [x] moved docstring into documentation - [x] made function with add parsers private * fix based on comments * format * rename cli.main to cli.build_cli
* change corpus and queries to dataset * remove commented out code * add convertion for v1 datasets * fix descriptive stats * update reranking * format * fix tests * lint * change ids of mock dataset * change score for colbert * add type for corpus and queries datasets * fix reranking task * format * update push to hub * update statistics calculation * simplify `create_dataloader_for_retrieval_corpus` * remove check with queries id * add instruction dataset type * fully annotate retrieval types * remove irrelevant type annotation * init * base search interface implementation * base search interface implementation * add todo comment * add link to todo * Update mteb/models/search/search_crossencoder.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update mteb/create_dataloaders.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * remove search folder * fix imports * fix tests * add support for cross encoder models * combine back encoder * add additional check for interface * resolve copilot comment * fix union type * roll back rename in validate_task_to_prompt_name * fix descriptive stats * [v2] Combine instructions with queries (#2984) * combine instructions with queries * fix old format ds * rename `MtebSupportedModelProtocols` and add `RetrievalEvaluationResult` tuple --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* standardize `mteb_model_meta` property * format
* init two stage * working 2 stage reranking * upd numpy meta * fix tests * fix python 3.9 * format * simplify * fix cross encoder meta * add meta to sentence transformers wrapper * fix model meta * create `RetrievalSaveResultsWrapper` * rename * save only model name and revision in `previous_results_model_meta` * add results save path * Update mteb/evaluate.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * rename results folder to prediction * add more info about save path --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
This is a work-in-progress branch that will become the MTEB v2.0.0 release!
Features:
@x-tabdeveloping, @orionw, @isaac-chung, @Samoed, @gowitheflow-1998, etc., please make PRs to this branch when relevant (MIEB still goes in its own branch, but we will try to merge it in here).