[MAEB] main merge #2341

Merged
merged 171 commits into from
Mar 13, 2025
Conversation

isaac-chung
Collaborator

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

isaac-chung and others added 30 commits February 13, 2025 23:35
* add ImageClassificationDescriptiveStatistics

* add MNIST descriptive stats

* use tuples instead

* add label count and update docstrings

* update MNIST example
* fix: Add column descriptions to leaderboard

* removed existing overlap
#2041)

* fix: Add BRIGHT Long

Fixes #1978

* fix: Add BRIGHT(long)

* fix bug in task results

* updated bright

* updated tests for TaskResults
Automatically generated by python-semantic-release
* add image clustering descriptive stats and run
* finish off last one
* remove script
Automatically generated by python-semantic-release
* add gigaembeddings

* use jasper

* fix name

* create sentence_transformer instruct wrapper

* apply instruction template

* fix jasper

* update meta
…plementation (#2059)

* add image clustering descriptive stats and run

* finish off last one

* remove script

* add ImageMultilabelClassificationDescriptiveStatistics

* add VOC2007

* add zeroshot and mnist example
* add visualsts stats

* add last dataset
* fix: Added gte models

* fix: Add mixbai models (#1540)

for #1515
* Updated ClimateFEVER dataset with new version

* Fill in the empty metadata

* Updates the date tuple

* Update class name

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update domains

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task_subtypes

* Update annotations_creators for the first version

* Update date

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update task subtypes

* Update path

* Update description

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
* change reference revisions to align with paper

* Update author list

* Added code for main results table

* updated minor changes

* added external as a "no_revision_available" case

* revert unintended changes

* format
Automatically generated by python-semantic-release
#1911)

* adding clustering tasks (built-bench-clustering S2S & P2P)

* updated built-bench-clustering tasks

* Updated BuiltBenchClustering tasks

* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P

* updated metadata for clustering tasks

* Add/update BuiltBench tasks

- Add BuiltBenchRetrieval task
- Add BuiltBenchReranking task
- Update metadata for BuiltBenchClusteringP2P
- Update metadata for BuiltBenchClusteringS2S

* update BuiltBench benchmark

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/benchmarks/benchmarks.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Fix formatting via ruff

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* update model names to adjust for adding to results repo

* update model meta script
* add most image classification descr stats

* revert changes to encoder

* add stats

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* fix: rerun tests that fail - Networking

* update tests to use tmp_path

* set versions for dev dependencies

* add pytest options to pyproject.toml

* add rerun json.decoder.JSONDecodeError

* remove JSONDecodeError from pyproject.toml

* add huggingface_hub.errors.HfHubHTTPError

* add huggingface_hub.errors.LocalEntryNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044

* FileNotFoundError
https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029

* add doc to pytest rerun

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
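
The commits above rerun tests that fail on transient network errors (for example `HfHubHTTPError` or `LocalEntryNotFoundError`) and configure this through pytest options in `pyproject.toml`. The snippet below is not that configuration; it is a minimal sketch of the same idea in plain Python, assuming a hypothetical `rerun_on` decorator that retries only for a chosen set of exception types and lets everything else fail immediately.

```python
import functools
import time


def rerun_on(exceptions, reruns=2, delay=0.0):
    """Retry the wrapped callable when it raises one of `exceptions`.

    Hypothetical helper for illustration only; it is not part of mteb or pytest.
    `reruns` is the number of extra attempts after the first failure.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(reruns + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Give up once the rerun budget is spent.
                    if attempt == reruns:
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator


# Example: a call that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


@rerun_on((ConnectionError,), reruns=2)
def flaky_download():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"
```

Restricting the retry to named exception types keeps genuine test failures, such as assertion errors, visible on the first run.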
* fix: generate metadata

* use logging not print for script

* lint

* add iso639 to dev pyproject

* fix import

* add memory_usage_mb

* set version for iso639

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
add missing training datasets
Automatically generated by python-semantic-release
github-actions and others added 24 commits March 9, 2025 17:01
Automatically generated by python-semantic-release
* make dev life nicer - pre-commit hooks

* add pre-commit to install

* update precommit

* update ruff pre-commit

* lint

* lint

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* fix: Fix bug in voyage implementation

"passage" is not a valid input for the voyage API. Remapped to "document".

* Update mteb/models/voyage_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
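
The voyage fix above remaps the "passage" input type to "document" before calling the API. A minimal sketch of such a remapping follows; the function and dictionary names are hypothetical and are not taken from `mteb/models/voyage_models.py`.

```python
from typing import Optional

# Hypothetical lookup table: internal prompt type -> input type sent to the API.
# Per the commit message, "passage" is not accepted and is sent as "document".
VOYAGE_INPUT_TYPE = {
    "query": "query",
    "passage": "document",
}


def to_voyage_input_type(prompt_type: Optional[str]) -> Optional[str]:
    """Translate an internal prompt type into an API input type.

    Returns None when no prompt type is given, so the caller can omit the field.
    """
    if prompt_type is None:
        return None
    try:
        return VOYAGE_INPUT_TYPE[prompt_type]
    except KeyError:
        # Fail loudly instead of forwarding an unsupported value to the API.
        raise ValueError(f"Unsupported prompt type: {prompt_type!r}")
```

Raising on unknown values is a design choice in this sketch: it surfaces a bad input type locally rather than as a remote API error.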
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
* Added VDR Model

* change custom wrapper to SentenceTransformer Wrapper

* remove kwargs and add TODO for Image Modality

* Update mteb/models/vdr_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.
Automatically generated by python-semantic-release
* fix: Resolve conflicting dependencies

These errors were discovered when trying to install the package using `uv`.

We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.

* fix: Remove syntax warnings occurring in python 3.12

```
Python 3.12.0 (main, Oct  2 2023, 20:56:14) [Clang 16.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mteb # no syntax warnings
>>>
```
Automatically generated by python-semantic-release
* fix: add annotation models for stella zh

Additionally fixed a few annotation errors

* format

* Update mteb/models/stella_models.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Automatically generated by python-semantic-release
* Add rubert-mini-frida model meta

* Add BERTA model meta
Automatically generated by python-semantic-release
* Add WebFAQ bitext mining tasks

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

* Lower number of language pairs in WebFAQBitextMining

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>

---------

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Automatically generated by python-semantic-release
@KennethEnevoldsen
Contributor

Great! Thanks

Impossible to review with the diff, though. Should we just merge it in?

@KennethEnevoldsen
Contributor

Once this is merged in, I can try to see if we can do a merge to the main without introducing bugs

@isaac-chung
Collaborator Author

Yeah, really just trying to make the tests pass. They caught a few things already from the merge.

@isaac-chung isaac-chung merged commit cd07f24 into maeb Mar 13, 2025
9 checks passed
@isaac-chung isaac-chung deleted the maeb-main-merge-20250312 branch March 13, 2025 10:07
@Samoed Samoed mentioned this pull request Mar 13, 2025
17 tasks