@Samoed commented on Apr 4, 2025

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models achieve near-perfect scores) nor random (both models achieve near-random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
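One way to make the "neither trivial nor random" check above concrete is a small helper over the main scores of the two reference models. This is a hypothetical sketch, not part of mteb: the function name `classify_scores`, the 0-1 score scale, and the default `margin` of 0.05 are all assumptions. The random baseline (for example 1/num_classes for a balanced classification task) has to be supplied per task.

```python
# Hypothetical helper for the "neither trivial nor random" checklist item.
# Not an mteb API: the name, the 0-1 scale and the 0.05 margin are assumptions.


def classify_scores(
    scores: dict[str, float],
    random_baseline: float,
    perfect: float = 1.0,
    margin: float = 0.05,
) -> str:
    """Return "trivial", "random", or "ok" for per-model main scores.

    scores: main score per model, on the same scale as `perfect`.
    random_baseline: expected score of a random model for this task.
    margin: how close a score must be to a bound to count as "close".
    """
    if not scores:
        raise ValueError("need at least one model score")
    values = list(scores.values())
    # Trivial: every model lands within `margin` of a perfect score.
    if all(perfect - v <= margin for v in values):
        return "trivial"
    # Random: every model lands within `margin` of the random baseline.
    if all(abs(v - random_baseline) <= margin for v in values):
        return "random"
    return "ok"


if __name__ == "__main__":
    # Made-up placeholder scores, not results from this PR.
    example = {"model_a": 0.61, "model_b": 0.68}
    print(classify_scores(example, random_baseline=0.25))
```

The check deliberately requires *all* models to be near a bound before flagging the task, mirroring the checklist wording ("both models").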

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

gowitheflow-1998 and others added 30 commits March 23, 2025 03:34
…mplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats
fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.
Automatically generated by python-semantic-release
* Added VDR Multilingual Dataset

* address comments

* make lint

* Formatted dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Automatically generated by python-semantic-release
* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs
…tering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* add richinfoai models

* format codes by linter
* Fix typos; add chrono order

* Fix spacing
* Add model specific dependencies in pyproject.toml

* Update documentation
Automatically generated by python-semantic-release
…mplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation
* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added meta data information about where model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments
* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
…mplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file
* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443
* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff
@Samoed requested a review from isaac-chung April 4, 2025 12:33
@Samoed changed the base branch from main to v2.0.0 April 4, 2025 12:33
@Samoed merged commit 36cf009 into v2.0.0 Apr 4, 2025
9 checks passed
@Samoed deleted the merge_main branch April 4, 2025 18:06