Merge main maeb 07 10 #2894

Merged
Samoed merged 565 commits into maeb from merge_main_maeb_07_10
Jul 10, 2025

@Samoed Samoed (Member) commented Jul 10, 2025

Merge the main branch and fix the pinned datasets version.

ekolodin and others added 26 commits June 20, 2025 19:51
* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff
* fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments
* fix prompt validation

* fix task name split correctly

* add docstring for test
Automatically generated by python-semantic-release
* Adding Hinvec Model's Meta data.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>
comment kalm model
* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets
* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* Added DadoEvalCoarseClassification

* Removed unnecessary columns from DadoEvalCoarseClassification

* Added EmitClassification task

* added SardiStanceClassification task

* Added GeoLingItClassification task

* Added DisCoTexPairClassification tasks

* Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits

* changed import in DisCoTexPairClassification

* removed GeoLingItClassification dataset

* fixed citation formatting, missing metadata parameters and lint formatting

* - Added XGlueWRPReranking task
- Added missing __init__.py files

* fixed metadata in XGlueWRPReranking

* Added MKQARetrieval task

* fixed type in XGlueWRPReranking

* changed MKQARetrieval from cross-lingual to monolingual

* formatted MKQARetrieval file

* removed unused const

---------

Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>
Automatically generated by python-semantic-release
# Conflicts:
#	README.md
#	docs/adding_a_model.md
#	docs/mieb/readme.md
#	mteb/abstasks/Audio/AbsTaskAudioZeroshotClassification.py
#	mteb/abstasks/TaskMetadata.py
#	mteb/benchmarks/benchmarks.py
#	mteb/custom_validators.py
#	mteb/descriptive_stats/BitextMining/WebFAQBitextMiningQAs.json
#	mteb/descriptive_stats/BitextMining/WebFAQBitextMiningQuestions.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordEasyI2IRetrieval.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordHardI2IRetrieval.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/ROxfordMediumI2IRetrieval.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisEasyI2IRetrieval.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisHardI2IRetrieval.json
#	mteb/descriptive_stats/Image/Any2AnyRetrieval/RParisMediumI2IRetrieval.json
#	mteb/models/overview.py
#	pyproject.toml
#	scripts/mmteb_create_author_list.ipynb
#	scripts/task_selection/europe_tasks.csv
#	scripts/task_selection/indic_tasks.csv
#	scripts/task_selection/mult_tasks.csv
#	scripts/task_selection/task_selection_eng_lite.ipynb
#	scripts/task_selection/task_selection_eu.ipynb
#	scripts/task_selection/task_selection_example.ipynb
#	scripts/task_selection/task_selection_indic.ipynb
#	scripts/task_selection/task_selection_mult.ipynb
#	tests/test_benchmark/mock_models.py
@Samoed Samoed requested a review from isaac-chung July 10, 2025 13:10
@Samoed Samoed merged commit c7b8542 into maeb Jul 10, 2025
9 checks passed
@Samoed Samoed deleted the merge_main_maeb_07_10 branch July 10, 2025 17:18