Skip to content

Conversation

isaac-chung
Copy link
Collaborator

@isaac-chung isaac-chung commented Apr 1, 2025

Periodically merge main into maeb. This covers ~ 6 weeks of main progress.

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

CC @KennethEnevoldsen @switchpiggy

isaac-chung and others added 30 commits February 13, 2025 23:35
* add ImageClassificationDescriptiveStatistics

* add MNIST descriptive stats

* use tuples instead

* add label count and update docstrings

* update MNIST example
* fix: Add column descriptions to leaderboard

* removed existing overlap
#2041)

* fix: Add BRIGHT Long

Fixes #1978

* fix: Add BRIGHT(long)

* fix bug in task results

* updated bright

* updated tests for TaskResults
Automatically generated by python-semantic-release
* add image clustering descirptive stats and run
* finish off last one
* remove script
Automatically generated by python-semantic-release
* add gigaembeddings

* use jasper

* fix name

* create sentence_transformer instruct wrapper

* apply instruction template

* fix jasper

* update meta
…plementation (#2059)

* add image clustering descirptive stats and run

* finish off last one

* remove script

* add ImageMultilabelClassificationDescriptiveStatistics

* add VOC2007

* add zeroshot and mnist example
* add visualsts stats

* add last dataset
* fix: Added gte models

* fix: Add mixbai models (#1540)

for #1515
* Updated ClimateFEVER dataset with new version

* Adds Fill in the empty metadata.

* Updates the date tuple

* Update class name

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update domains

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task_subtypes

* Update annotations_creators for the first version

* Update date

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task subtypes

* Update path

* Update description

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
* change reference revisions to align with paper

* Update author list

* Added code for main results table

* updated minor changes

* added external as a "no_revision_available" case

* revert unintended changes

* format
Automatically generated by python-semantic-release
#1911)

* adding clustering tasks (built-bench-clustering S2S & P2P)

* updated built-bench-clustering tasks

* Updated BuiltBenchClustering tasks

* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P

* updated metadata for clustering tasks

* Add/update BuiltBench tasks

- Add BuiltBenchRetrieval task
- Add BuiltBenchReranking task
- Update metadata for BuiltBenchClusterinP2P
- Update metadata for BuiltBenchClusterinS2S

* update BuiltBench benchmark

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Fix formatting via ruff

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* update model names to adjust for adding to results repo

* update model meta script
* add most image classification descr stats

* revert changes to encoder

* add stats

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* fix: rerun tests that fail - Networking

* update tests to use tmp_path

* set versions for dev dependencies

* add pytest options to pyproject.toml

* add rerun json.decoder.JSONDecodeError

* remove JSONDecodeError from pyproject.toml

* add huggingface_hub.errors.HfHubHTTPError

* add huggingface_hub.errors.LocalEntryNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044

* FileNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029

* add doc to pytest rerun

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* fix: generate metadata

* use logging not print for script

* lint

* add iso639 to dev pyproject

* fix import

* add memory_usage_mb

* set version for iso639

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
add missing training datasets
Automatically generated by python-semantic-release
Samoed and others added 24 commits March 24, 2025 17:39
* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs
…tering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter
* Fix typos; add chrono order

* Fix spacing
* Add model specific dependencies in pyproject.toml

* Update documentation
Automatically generated by python-semantic-release
…mplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation
* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions s model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added meta data information about where model was adpated from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments
* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
…mplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file
* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443
* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
@isaac-chung isaac-chung changed the title [MAEB] merge maeb main [MAEB] merge main -> maeb Apr 1, 2025
@isaac-chung isaac-chung requested a review from Samoed April 1, 2025 19:29
Copy link
Member

@Samoed Samoed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@isaac-chung isaac-chung merged commit f3a0403 into maeb Apr 1, 2025
8 of 10 checks passed
@isaac-chung isaac-chung deleted the isaac/merge-maeb-main branch April 1, 2025 20:19
@isaac-chung isaac-chung mentioned this pull request Apr 4, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.