Add Training data annotations #2173

KennethEnevoldsen · 2025-02-26T15:03:46Z

fixes #2168
fixes #2164
fixes #2166
fixes #2162

Removed fiqa PL and ArxivClusteringS2S.v2, neither of which exists.

…nto training-data-anno

Samoed

Great!

mteb/models/misc_models.py

Samoed · 2025-02-26T16:36:23Z

Something is wrong with the Windows runner. This is the second time it has been running for more than an hour.

* test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. * fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * lint * fix code carbon * fix aggregated * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * 
add b16 model * add llm2clip_openai_l_14_224 (not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * lint * refactor * update BEIR-PL annotation * fix * update test --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil 
<munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

* redo to voyage to only training data * Add training data annotation for Kalm embeddings embeddings-benchmark#2168 * Add correct training data annotations to Stella embeddings-benchmark#2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding embeddings-benchmark#2166 * fix max tokens for kalm models embeddings-benchmark#2162 * remove eli 5

* misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see embeddings-benchmark/results#117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * make lint * fix validation for license * fix remaining validation errors --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> 
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com>

* misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see embeddings-benchmark/results#117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplifies the test. It also removes about 100 test cases that all made the same API call.
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotations for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed.
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Add `trust_remote_code` to MIRACLRetrieval (#2344) * 1.36.21 Automatically generated by python-semantic-release * fix: Correctly pass trust remote code to Miracl * fix: Ensure MIRACL pass trust_remote_code (#2346) * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Correctly pass trust remote code to Miracl * fix * 1.36.22 Automatically generated by python-semantic-release * add-Data Korean Clustering dataset (KLUE-modified) (#2283) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Rename dunzhang and Jasper models to NovaResearch (#2373) * Rename dunzhang and Jasper 
models to NovaResearch * rename model in tests * correct reference link * correct MIEB dataset stats (#2374) * correct stats * update Any2AnyMultiChoice qrels stats compute logic * final correction * Update tasks table * Correct -1 to No information in Zero shot (#2381) * fix leaderboard (#2385) * fix: Reduce logging and Warnings (#2349) * Reduce logging and Warnings * make lint * format license to lowercase * Address all comments * Update mteb/leaderboard/app.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.23 Automatically generated by python-semantic-release * fix: b1ade (#2386) * fix: added b1ade_models.py (#2340) * added b1ade_models.py * changing based on requested * Update mteb/models/b1ade_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: missing import and formatting --------- Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> * 1.36.24 Automatically generated by python-semantic-release * fix: pin gradio dependency to ensure leaderboards works (#2387) * 1.36.25 Automatically generated by python-semantic-release * fix: Ensure BrightRetrieval is valid to run (#2334) * fix: Ensure BrightRetrieval is valid to run Not sure this is the best way to fix this. Let me know if you can find a better fix. 
fixes #2327 * fix: convert brightretrieval to two tasks * fix collecting error * Update tasks table * 1.36.26 Automatically generated by python-semantic-release * Pass task name to all evaluators (#2389) * pass task name to all tasks * add test * fix loader * fix: renaming Zeroshot -> ZeroShot (#2395) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * rename 1 * rename 2 * format * fixed error * 1.36.27 Automatically generated by python-semantic-release * fix: Update AmazonPolarityClassification license (#2402) Update AmazonPolarityClassification.py * fix b1ade name (#2403) * 1.36.28 Automatically generated by python-semantic-release * Minor style changes (#2396) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * fix: minor style changes Addresses #2078 * rename 1 * rename 2 * format * fixed error --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid related scientific papers (#2302) * Clustrec covid new dataset and task * fix * fix * fix * fix * fix * descriptive stats * change all mentions of clustrec-covidp2p to clustrec-covid * change ' to " * Update tasks table * fix: Major updates to docs + make mieb dep optional (#2397) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * fix: minor style changes Addresses #2078 * fix: Major updates to documentation This PR does the following: - Introduces other modalities more clearly in the documentation and makes it easier to transition to a full documentation site later. - Adds minor code updates due to discovered inconsistencies between docs and code. 
- Added the MMTEB citation where applicable - makes the docs ready to move torchvision to an optional dependency * Moved VISTA example * rename 1 * rename 2 * format * fixed error * fix: make torchvision optional (#2399) * fix: make torchvision optional * format * add docs * minor fix * remove transform from Any2TextMultipleChoiceEvaluator --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * move Running SentenceTransformer model with prompts to usage --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.29 Automatically generated by python-semantic-release * remove Arabic_Triplet_Matryoshka_V2.py (#2405) * Min torchvision>0.2.1 (#2410) matching torch>1.0.0 * fix: Add validation to model_name in `ModelMeta` (#2404) * add test for name validation * upd docs * upd cohere name * fix tests * fix name for average_word_embeddings_komninos * fix name for average_word_embeddings_komninos * fix reranker test * fix reranker test * 1.36.30 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414) * refactor CV-Bench * reimplement CV Bench * remove abstask/evaluator/tests for Any2TextMultipleChoice * rerun descriptive stats * Update tasks table * fix: Add option to remove benchmark from leaderboard (#2417) fix: Add option to remove leaderboard from leaderboard fixes #2413 This only removed the benchmark from the leaderboard but keep it in MTEB. 
* 1.36.31 Automatically generated by python-semantic-release * fix: Add VDR Multilingual Dataset (#2408) * Added VDR Multilingual Dataset * address comments * make lint * Formated Dataset for retrieval * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * make lint * corrected date * fix dataset building * move to image folder --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * 1.36.32 Automatically generated by python-semantic-release * HOTFIX: pin setuptools (#2423) * pin setuptools * pin setuptools * pin setuptools in makefile * try ci * fix ci * remove speed from installs * add __init__.py Clustering > kor folder, And edit __init__.py in Clustering folder (#2422) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset * clustering & kor folder add __init.py * clustering & kor folder add __init__.py * task.py roll-back * correct text_creation to sample_creation & delete form in MetaData * correct task_subtype in TaskMetaData * delete space * edit metadata * edit task_subtypes --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update tasks table * Update speed dependencies with new setuptools release (#2429) * add richinfoai models (#2427) * add richinfoai models add richinfoai models * format codes by linter format codes by linter * Added Memory Usage column on leaderboard (#2428) * docs: typos; Standardize spacing; Chronological order (#2436) * Fix typos; add chrono order * Fix spacing * fix: Add model specific dependencies in pyproject.toml (#2424) * Add 
model specific dependencies in pyproject.toml * Update documentation * 1.36.33 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442) * MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats * modify benchmark list * fix citation * Update tasks table * Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445) Fixes #2444 * Feat/searchmap preview (#2420) * Added meta information about SearchMap_Preview model to the model_dir * Added meta information about SearchMap_Preview model to the model_dir * updated revision name * Device loading and cuda cache cleaning step left out * removed task instructions since it's not necessary * changed sentence transformer loader to mteb default loader and passed instructions s model prompts * Included searchmap to the models overview page * Included searchmap to the models overview page * added meta data information about where model was adpated from * Update mteb/models/searchmap_models.py * fix lint * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Add Background Gradients in Summary and Task Table (#2392) * Add Background Gradients in Summary and Task Table * Remove warnings and add light green cmap * Address comments * Separate styling function * address comments * added comments * add ops_moa_models (#2439) * add ops_moa_models * add custom implementations * Simplify custom implementation and format the code * support SentenceTransformers * add training datasets * Update mteb/models/ops_moa_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * update training_datasets --------- Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * leaderboard fix (#2456) * ci: cache `~/.cache/huggingface` (#2464) ci: cache 
~/.cache/huggingface Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468) * reimplement ImageCoDe with ImageTextPairClassification * add missing stats file * Update tasks table * fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443) * feat: added pubmedbert model2vec models * fix: attribute model_name * fix: fixed commit hash for pubmed_bert model2vec models * fix: changes requested in PR 2443 * fix: add nb_sbert model (#2339) * add_nb_sbert_model * Update nb_sbert.py added n_parameters and release_date * Update mteb/models/nb_sbert.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update nb_sbert.py fix: make lint * added nb_sbert to overview.py + ran make lint * Update nb_sbert.py Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12 --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.34 Automatically generated by python-semantic-release * fix test --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: 
Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com> Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com> Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com> Co-authored-by: richinfo-ai <richinfoai@163.com> Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com> Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com> Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com> Co-authored-by: theatollersrud <thea.tollersrud@nb.no>

* misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see embeddings-benchmark/results#117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test considerably. It also removed about 100 test cases, which were all making the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotations for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors were discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors were discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occurring in Python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information.
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Add `trust_remote_code` to MIRACLRetrieval (#2344) * 1.36.21 Automatically generated by python-semantic-release * fix: Correctly pass trust remote code to Miracl * fix: Ensure MIRACL pass trust_remote_code (#2346) * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Correctly pass trust remote code to Miracl * fix * 1.36.22 Automatically generated by python-semantic-release * add-Data Korean Clustering dataset (KLUE-modified) (#2283) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Rename dunzhang and Jasper models to NovaResearch (#2373) * Rename dunzhang and Jasper 
models to NovaResearch * rename model in tests * correct reference link * correct MIEB dataset stats (#2374) * correct stats * update Any2AnyMultiChoice qrels stats compute logic * final correction * Update tasks table * Correct -1 to No information in Zero shot (#2381) * fix leaderboard (#2385) * fix: Reduce logging and Warnings (#2349) * Reduce logging and Warnings * make lint * format license to lowercase * Address all comments * Update mteb/leaderboard/app.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.23 Automatically generated by python-semantic-release * fix: b1ade (#2386) * fix: added b1ade_models.py (#2340) * added b1ade_models.py * changing based on requested * Update mteb/models/b1ade_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: missing import and formatting --------- Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> * 1.36.24 Automatically generated by python-semantic-release * fix: pin gradio dependency to ensure leaderboards works (#2387) * 1.36.25 Automatically generated by python-semantic-release * fix: Ensure BrightRetrieval is valid to run (#2334) * fix: Ensure BrightRetrieval is valid to run Not sure this is the best way to fix this. Let me know if you can find a better fix. 
fixes #2327 * fix: convert brightretrieval to two tasks * fix collecting error * Update tasks table * 1.36.26 Automatically generated by python-semantic-release * Pass task name to all evaluators (#2389) * pass task name to all tasks * add test * fix loader * fix: renaming Zeroshot -> ZeroShot (#2395) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * rename 1 * rename 2 * format * fixed error * 1.36.27 Automatically generated by python-semantic-release * fix: Update AmazonPolarityClassification license (#2402) Update AmazonPolarityClassification.py * fix b1ade name (#2403) * 1.36.28 Automatically generated by python-semantic-release * Minor style changes (#2396) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * fix: minor style changes Addresses #2078 * rename 1 * rename 2 * format * fixed error --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid-related scientific papers (#2302) * Clustrec covid new dataset and task * fix * fix * fix * fix * fix * descriptive stats * change all mentions of clustrec-covidp2p to clustrec-covid * change ' to " * Update tasks table * fix: Major updates to docs + make mieb dep optional (#2397) * fix: renaming Zeroshot -> ZeroShot Addresses #2078 * fix: minor style changes Addresses #2078 * fix: Major updates to documentation This PR does the following: - Introduces other modalities more clearly in the documentation and makes it easier to transition to a full documentation site later. - Adds minor code updates due to discovered inconsistencies between docs and code.
- Added the MMTEB citation where applicable - makes the docs ready to move torchvision to an optional dependency * Moved VISTA example * rename 1 * rename 2 * format * fixed error * fix: make torchvision optional (#2399) * fix: make torchvision optional * format * add docs * minor fix * remove transform from Any2TextMultipleChoiceEvaluator --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * move Running SentenceTransformer model with prompts to usage --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.29 Automatically generated by python-semantic-release * remove Arabic_Triplet_Matryoshka_V2.py (#2405) * Min torchvision>0.2.1 (#2410) matching torch>1.0.0 * fix: Add validation to model_name in `ModelMeta` (#2404) * add test for name validation * upd docs * upd cohere name * fix tests * fix name for average_word_embeddings_komninos * fix name for average_word_embeddings_komninos * fix reranker test * fix reranker test * 1.36.30 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414) * refactor CV-Bench * reimplement CV Bench * remove abstask/evaluator/tests for Any2TextMultipleChoice * rerun descriptive stats * Update tasks table * fix: Add option to remove benchmark from leaderboard (#2417) fix: Add option to remove leaderboard from leaderboard fixes #2413 This only removed the benchmark from the leaderboard but keep it in MTEB. 
* 1.36.31 Automatically generated by python-semantic-release * fix: Add VDR Multilingual Dataset (#2408) * Added VDR Multilingual Dataset * address comments * make lint * Formated Dataset for retrieval * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * make lint * corrected date * fix dataset building * move to image folder --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * 1.36.32 Automatically generated by python-semantic-release * HOTFIX: pin setuptools (#2423) * pin setuptools * pin setuptools * pin setuptools in makefile * try ci * fix ci * remove speed from installs * add __init__.py Clustering > kor folder, And edit __init__.py in Clustering folder (#2422) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset * clustering & kor folder add __init.py * clustering & kor folder add __init__.py * task.py roll-back * correct text_creation to sample_creation & delete form in MetaData * correct task_subtype in TaskMetaData * delete space * edit metadata * edit task_subtypes --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update tasks table * Update speed dependencies with new setuptools release (#2429) * add richinfoai models (#2427) * add richinfoai models add richinfoai models * format codes by linter format codes by linter * Added Memory Usage column on leaderboard (#2428) * docs: typos; Standardize spacing; Chronological order (#2436) * Fix typos; add chrono order * Fix spacing * fix: Add model specific dependencies in pyproject.toml (#2424) * Add 
model specific dependencies in pyproject.toml * Update documentation * 1.36.33 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442) * MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats * modify benchmark list * fix citation * Update tasks table * Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445) Fixes #2444 * Feat/searchmap preview (#2420) * Added meta information about SearchMap_Preview model to the model_dir * Added meta information about SearchMap_Preview model to the model_dir * updated revision name * Device loading and cuda cache cleaning step left out * removed task instructions since it's not necessary * changed sentence transformer loader to mteb default loader and passed instructions s model prompts * Included searchmap to the models overview page * Included searchmap to the models overview page * added meta data information about where model was adpated from * Update mteb/models/searchmap_models.py * fix lint * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Add Background Gradients in Summary and Task Table (#2392) * Add Background Gradients in Summary and Task Table * Remove warnings and add light green cmap * Address comments * Separate styling function * address comments * added comments * add ops_moa_models (#2439) * add ops_moa_models * add custom implementations * Simplify custom implementation and format the code * support SentenceTransformers * add training datasets * Update mteb/models/ops_moa_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * update training_datasets --------- Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * leaderboard fix (#2456) * ci: cache `~/.cache/huggingface` (#2464) ci: cache 
~/.cache/huggingface Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468) * reimplement ImageCoDe with ImageTextPairClassification * add missing stats file * Update tasks table * fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443) * feat: added pubmedbert model2vec models * fix: attribute model_name * fix: fixed commit hash for pubmed_bert model2vec models * fix: changes requested in PR 2443 * fix: add nb_sbert model (#2339) * add_nb_sbert_model * Update nb_sbert.py added n_parameters and release_date * Update mteb/models/nb_sbert.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update nb_sbert.py fix: make lint * added nb_sbert to overview.py + ran make lint * Update nb_sbert.py Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12 --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.34 Automatically generated by python-semantic-release * suppress logging warnings on leaderboard (#2406) * supress logging warnings * remove loggers * return blocks * rename function * fix gme models * add server name * update after merge * fix ruff * fix: E5 instruct now listed as sbert compatible (#2475) Fixes #1442 * 1.36.35 Automatically generated by python-semantic-release * [MIEB] rename VisionCentric to VisionCentricQA (#2479) rename VisionCentric to VisionCentricQA * ci: Run dataset loading only when pushing to main (#2480) Update dataset_loading.yml * fix table in tasks.md (#2483) * Update tasks table * fix: add prompt to NanoDBPedia (#2486) * 1.36.36 Automatically generated by python-semantic-release * Fix Task Lang Table (#2487) * Fix Task Lang Table * added tasks.md * fix * fix: Ignore datasets not available in tests (#2484) * add back MockAudioEncoder --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] 
<github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. 
Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com> Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com> Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com> Co-authored-by: richinfo-ai <richinfoai@163.com> Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com> Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com> Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com> Co-authored-by: theatollersrud <thea.tollersrud@nb.no> Co-authored-by: hongst <76415500+seongtaehong@users.noreply.github.com>
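
The Voyage fix recorded in the log above (#2304) notes that "passage" is not a valid input for the Voyage API and was remapped to "document". A minimal sketch of that kind of remapping follows. The function name, the mapping table, and the error handling are hypothetical illustrations, not the code in `mteb/models/voyage_models.py`.

```python
from __future__ import annotations

# Hypothetical lookup table: internal prompt-type labels on the left,
# the values the remote API is assumed to accept on the right.
_INPUT_TYPE_MAP: dict[str, str] = {
    "query": "query",
    "passage": "document",  # "passage" is rejected by the API, so remap it
}


def to_api_input_type(prompt_type: str | None) -> str | None:
    """Translate an internal prompt type into an API input type.

    None is passed through unchanged so callers can omit the input type.
    """
    if prompt_type is None:
        return None
    try:
        return _INPUT_TYPE_MAP[prompt_type]
    except KeyError:
        # Fail loudly instead of forwarding an unsupported value.
        raise ValueError(f"Unsupported prompt type: {prompt_type!r}") from None
```

Keeping the translation in one table means an unsupported label fails locally with a clear message rather than as a remote API error.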

KennethEnevoldsen added 7 commits February 26, 2025 15:41

redo to voyage to only training data

6756fec

Add training data annotation for Kalm embeddings #2168

884fab2

Add correct training data annotations to Stella #2164

17c09ba

Merge branch 'main' of https://github.com/embeddings-benchmark/mteb into training-data-anno

35ebd53

removed fiqa PL as it does not exist

377864b

remove ArxivClusteringS2S.v2 as it does not exist

100089b

Add training data annotation for GIST embedding #2166

fc2c208
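
Two of the commits above remove annotations that point at tasks which do not exist. A check along the following lines could catch such entries before review. It is a sketch under assumptions: the annotation is taken to be a dict mapping task names to lists of splits, and `KNOWN_TASKS` is an illustrative stand-in for the real task registry, not mteb's API.

```python
from __future__ import annotations

# Illustrative stand-in for the task registry (assumption, not mteb's API).
KNOWN_TASKS: set[str] = {"ArxivClusteringS2S", "MSMARCO"}


def unknown_training_tasks(
    training_datasets: dict[str, list[str]],
    known_tasks: set[str],
) -> list[str]:
    """Return the annotated task names missing from the registry, sorted."""
    return sorted(name for name in training_datasets if name not in known_tasks)


# Hypothetical annotation containing one task name that does not exist.
annotation = {
    "MSMARCO": ["train"],
    "ArxivClusteringS2S.v2": ["train"],
}

missing = unknown_training_tasks(annotation, KNOWN_TASKS)
if missing:
    print(f"Annotations reference unknown tasks: {missing}")
```

Run as a unit test over every model's metadata, a check like this turns a silent typo in a task name into a failing test.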

KennethEnevoldsen requested review from x-tabdeveloping and Samoed February 26, 2025 15:03

Samoed approved these changes Feb 26, 2025

mteb/models/misc_models.py (outdated, resolved)

KennethEnevoldsen added 2 commits February 26, 2025 16:25

fix max tokens for kalm models #2162

e620f9b

remove eli 5

594ddb6

KennethEnevoldsen enabled auto-merge (squash) February 26, 2025 15:36

KennethEnevoldsen merged commit 6cc1822 into main Feb 26, 2025
9 checks passed

KennethEnevoldsen deleted the training-data-anno branch February 26, 2025 20:43

KennethEnevoldsen mentioned this pull request Feb 27, 2025

fix: Add more training data annotations #2178

Merged
