Merge main maeb 07 10 #2894

Samoed · 2025-07-10T13:10:43Z

Merge the main branch into maeb, including the fix that pins the datasets version.

* add custom instructions
* fixed
* lint
* fix last instruction
---------
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* add Seed-1.6-embedding model
* Update seed_1_6_embedding_models.py
* update model meta info
* support image encoder interface
* error fix
* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard
  fixes #2834
  This removed the lower bound selection, but generally I don't think people should care about the models being too small.
* fix 1M --> 1B
* format
* rename model_size -> max_model_size

Automatically generated by python-semantic-release

update seed1.6 model training data info

Automatically generated by python-semantic-release

* add model meta
* linting
* fix: add check for code lora
* fix: apply review comments

* fix prompt validation
* fix task name split correctly
* add docstring for test

Automatically generated by python-semantic-release

* Adding Hinvec Model's Meta data.
* Adding hinvec_model.py
* Update mteb/models/hinvec_models.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* formated code with Black and lint with Ruff
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

Bump gradio

* nvidia_llama_nemoretriever_colembed
* correct 3b reference
* lint fix
* add training data and license for nvidia/llama_nemoretriever_colembed
* lint
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix sbert `v5`
* add comment

* add listconranker modelmeta
* fix bugs
* use linter
* lint
---------
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* feat: add KaLM_Embedding_X_0605 in kalm_models
* Update kalm_models.py for lint format
---------
Co-authored-by: xinshuohu <xinshuohu@tencent.com>

comment kalm model

* Add JaCWIR and JQaRA for reranking
* Fix ANLP Journal datasets
* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval
* tackle test cases
* Remove _evaluate_subset usage
* Separate v1 and v2
* Update info for NLP Journal datasets

* add tooka v2s
* add mcinext models
* update mcinext.py
* Apply PR review suggestions
* Update mteb/models/mcinext_models.py
---------
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Added DadoEvalCoarseClassification
* Removed unnecessary columns from DadoEvalCoarseClassification
* Added EmitClassification task
* added SardiStanceClassification task
* Added GeoLingItClassification task
* Added DisCoTexPairClassification tasks
* Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits
* changed import in DisCoTexPairClassification
* removed GeoLingItClassification dataset
* fixed citation formatting, missing metadata parameters and lint formatting
* - Added XGlueWRPReranking task
  - Added missing __init__.py files
* fixed metadata in XGlueWRPReranking
* Added MKQARetrieval task
* fixed type in XGlueWRPReranking
* changed MKQARetrieval from cross-lingual to monolingual
* formatted MKQARetrieval file
* removed unused const
---------
Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>

fix datasets version

Automatically generated by python-semantic-release

# Conflicts:
# README.md
# docs/adding_a_model.md
# docs/mieb/readme.md
# mteb/abstasks/Audio/AbsTaskAudioZeroshotClassification.py
# mteb/abstasks/TaskMetadata.py
# mteb/benchmarks/benchmarks.py
# mteb/custom_validators.py
# mteb/descriptive_stats/BitextMining/WebFAQBitextMiningQAs.json
# mteb/descriptive_stats/BitextMining/WebFAQBitextMiningQuestions.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordEasyI2IRetrieval.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordHardI2IRetrieval.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordMediumI2IRetrieval.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisEasyI2IRetrieval.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisHardI2IRetrieval.json
# mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisMediumI2IRetrieval.json
# mteb/models/overview.py
# pyproject.toml
# scripts/mmteb_create_author_list.ipynb
# scripts/task_selection/europe_tasks.csv
# scripts/task_selection/indic_tasks.csv
# scripts/task_selection/mult_tasks.csv
# scripts/task_selection/task_selection_eng_lite.ipynb
# scripts/task_selection/task_selection_eu.ipynb
# scripts/task_selection/task_selection_example.ipynb
# scripts/task_selection/task_selection_indic.ipynb
# scripts/task_selection/task_selection_mult.ipynb
# tests/test_benchmark/mock_models.py

github-actions bot added 30 commits May 1, 2025 17:07

Update tasks & benchmarks tables

37f86e2

Update tasks & benchmarks tables

75db6fb

Update tasks & benchmarks tables

ad232aa

Update tasks & benchmarks tables

61c611f

Update tasks & benchmarks tables

f9b747f

Update tasks & benchmarks tables

8914793

Update tasks & benchmarks tables

0665cd2

Update tasks & benchmarks tables

5b34e6a

Update tasks & benchmarks tables

3703f11

Update tasks & benchmarks tables

c54e88f

Update tasks & benchmarks tables

72eea70

Update tasks & benchmarks tables

edb9c78

Update tasks & benchmarks tables

edbf218

Update tasks & benchmarks tables

296c1ee

Update tasks & benchmarks tables

f17902a

Update tasks & benchmarks tables

f4d72bc

Update tasks & benchmarks tables

607eb6f

Update tasks & benchmarks tables

bd9bb89

Update tasks & benchmarks tables

0eec584

Update tasks & benchmarks tables

90cd48a

Update tasks & benchmarks tables

046ecf0

Update tasks & benchmarks tables

2942557

Update tasks & benchmarks tables

9e5ce29

Update tasks & benchmarks tables

3fd7bec

Update tasks & benchmarks tables

b65f0ec

Update tasks & benchmarks tables

54b863e

Update tasks & benchmarks tables

cd83936

Update tasks & benchmarks tables

cd4670c

Update tasks & benchmarks tables

2c2ed55

Update tasks & benchmarks tables

86069c7

ekolodin and others added 26 commits June 20, 2025 19:51

model: Add custom instructions for GigaEmbeddings (#2836)

d7ff1ab

* add custom instructions
* fixed
* lint
* fix last instruction
---------
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

model: add Seed-1.6-embedding model (#2841)

8851bf0

* add Seed-1.6-embedding model
* Update seed_1_6_embedding_models.py
* update model meta info
* support image encoder interface
* error fix
* fix: format seed_1_6_embedding_models.py with Ruff

1.38.31

642898f

Automatically generated by python-semantic-release

fix: update training dataset info of Seed-1.6-embedding model (#2857)

a8214e2

update seed1.6 model training data info

1.38.32

82844eb

Automatically generated by python-semantic-release

add jinav4 model meta (#2858)

f1d560a

* add model meta
* linting
* fix: add check for code lora
* fix: apply review comments

fix: prompt validation for tasks with - (#2846)

430357c

* fix prompt validation
* fix task name split correctly
* add docstring for test

1.38.33

9fed3e5

Automatically generated by python-semantic-release

Bump gradio to fix leaderboard sorting (#2866)

a4388c2

Bump gradio

model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

4ff1413

* nvidia_llama_nemoretriever_colembed
* correct 3b reference
* lint fix
* add training data and license for nvidia/llama_nemoretriever_colembed
* lint
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

rename seed-1.6-embedding to seed1.6-embedding (#2870)

f27648b

fix tests to be compatible with SentenceTransformers v5 (#2875)

f346a37

* fix sbert `v5`
* add comment

model: add listconranker modelmeta (#2874)

5846f56

* add listconranker modelmeta
* fix bugs
* use linter
* lint
---------
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

model: add kalm_models ModelMeta (new PR) (#2853)

b67bd04

* feat: add KaLM_Embedding_X_0605 in kalm_models
* Update kalm_models.py for lint format
---------
Co-authored-by: xinshuohu <xinshuohu@tencent.com>

Comment kalm model (#2877)

a3ca95c

comment kalm model

Update tasks & benchmarks tables

5be02c1

Update tasks & benchmarks tables

5303fec

fix: pin datasets version (#2892)

00c95cf

fix datasets version

1.38.34

cfa27d7

Automatically generated by python-semantic-release

merge main

f969e36

Samoed requested a review from isaac-chung July 10, 2025 13:10

isaac-chung approved these changes Jul 10, 2025

Samoed merged commit c7b8542 into maeb Jul 10, 2025
9 checks passed

Samoed deleted the merge_main_maeb_07_10 branch July 10, 2025 17:18
