feat: UI Overhaul #2549


Merged 19 commits into main on Apr 17, 2025

Conversation

x-tabdeveloping
Collaborator

I reworked the MTEB UI using the latest features in Gradio.
Here are some of the main changes:

  • Benchmark selection in a sidebar, with benchmark categories and colorful icons for easier navigation.
  • Faster model search, with table filtering, full-screen tables, and copying the table to the clipboard in CSV format.
  • Custom settings are hidden by default, but easier to use once expanded, since they now get more space.
  • A new look, including a new font and color scheme.
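The CSV copy mentioned above boils down to serializing the visible table to CSV text before it is placed on the clipboard. A minimal sketch (the column names and function are illustrative, not MTEB's actual table code):

```python
import csv
import io


def table_to_csv(headers: list, rows: list) -> str:
    """Serialize a leaderboard table to CSV text, ready for the clipboard."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(
    ["Model", "Mean (Task)"],
    [["model-a", 65.2], ["model-b", 63.9]],
)
```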

Here's a video of the UI after my changes:

mteb_ui_update-2025-04-15_19.30.13.mp4

@x-tabdeveloping
Collaborator Author

I'm also eager to hear your opinion @ayush1298

Contributor

@Muennighoff Muennighoff left a comment


It looks so cool 🤯

Collaborator

@isaac-chung isaac-chung left a comment


This looks very cool! Just got a few small, non-design-related questions.

Member

@Samoed Samoed left a comment


Awesome changes!

),
]
"""
BENCHMARK_ENTRIES = [
Member


Can we ensure that all benchmarks are included?

Collaborator Author


Do we want that? I would imagine that once somebody adds a benchmark to MTEB, we would prefer if they sorted it into one of the categories in the leaderboard instead of landing in the Misc category.

I would love to hear opinions about this, but I'm not 100% certain that it's what we need.
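One way to reconcile the two positions above is a check that flags benchmarks not sorted into any category, so a new benchmark cannot silently land in (or outside) Misc. A sketch with made-up names, not MTEB's actual API:

```python
def find_uncategorized(all_benchmarks: list, entries: list) -> list:
    """Return benchmark names not assigned to any sidebar category.

    `entries` is a list of (category_name, [benchmark_names]) pairs.
    """
    categorized = {name for _, names in entries for name in names}
    return [b for b in all_benchmarks if b not in categorized]


entries = [
    ("Multilingual", ["MTEB(Multilingual, v1)"]),
    ("Language-specific", ["MTEB(eng, v2)", "MTEB(cmn, v1)"]),
]
missing = find_uncategorized(
    ["MTEB(Multilingual, v1)", "MTEB(eng, v2)", "MTEB(cmn, v1)", "MTEB(Law, v1)"],
    entries,
)
# a CI check could then fail whenever `missing` is non-empty
```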

Contributor


We don't want all benchmarks displayed by default.

We should probably remove the current fix for this (I think this solution is better).

https://github.com/embeddings-benchmark/mteb/blob/3ff993d35d856c4ddf74532b2e156cf6f08852ba/mteb/benchmarks/benchmarks.py#L1133C5-L1133C34

Member

@Samoed Samoed Apr 16, 2025


Can we move these values to the Benchmark object? I don't like that we have multiple definitions of the benchmark object. We could change the definition like this:

(
    ("Multilingual", [MTEB_Multilingual]),
    ("Language-specific Benchmarks", mteb.get_benchmarks(["MTEB(cmn, v1)", "MTEB(eng, v1)"])),
)

class Benchmark:
    ...
    display_name: str
    icon: str

MTEB_JPN = Benchmark(
    ...
    display_name="Japanese",
    icon="https://github.com/lipis/flag-icons/raw/260c91531be024944c6514130c5defb2ebb02b7d/flags/4x3/jp.svg",
)

Collaborator Author


hmm I'm a bit unsure about this. I have thought about it, and since the icon and the display name are actually more connected to the leaderboard and the UI than the benchmark itself, I would keep it as is for now

Member

@Samoed Samoed Apr 16, 2025


Yes, but with this we will have multiple configs for the same benchmarks in different places and I don't think we should do that

Collaborator

@isaac-chung isaac-chung Apr 16, 2025


If we add an icon param for the benchmark (default None), and use the display_on_leaderboard param in the Benchmark class, we can programmatically create BENCHMARK_ENTRIES. That way, we won't have 2 definitions of the same objects. Easier to maintain.

I feel that regardless of the connection to the UI, keeping definitions unique is important for maintenance. We can totally do it in a separate PR tho.
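The suggestion above could look roughly like this: one Benchmark definition carrying `icon` (default None) and `display_on_leaderboard`, from which the sidebar entries are derived. A sketch under those assumptions; everything beyond the two field names taken from the comment is illustrative:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benchmark:
    name: str
    category: str
    display_name: Optional[str] = None
    icon: Optional[str] = None  # default None, as suggested
    display_on_leaderboard: bool = True


def build_entries(benchmarks: list) -> list:
    """Group leaderboard-visible benchmarks by category, from one source of truth."""
    groups = defaultdict(list)
    for b in benchmarks:
        if b.display_on_leaderboard:
            groups[b.category].append(b)
    return list(groups.items())


BENCHMARKS = [
    Benchmark("MTEB(eng, v2)", "Language-specific", display_name="English"),
    Benchmark("MTEB(eng, v1)", "Legacy", display_name="English, v1"),
    Benchmark("MTEB(code, v1)", "Domain-specific", display_on_leaderboard=False),
]
BENCHMARK_ENTRIES = build_entries(BENCHMARKS)
```

With this, hiding a benchmark from the leaderboard is a single flag flip on its one definition.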

@x-tabdeveloping
Collaborator Author

Looks even cooler in white if you ask me:
image

@x-tabdeveloping x-tabdeveloping changed the title UI Overhaul feat: UI Overhaul Apr 16, 2025
@KennethEnevoldsen
Contributor

A few notes:

  • Is benchmark selection folded out by default?
  • I think we can remove the reference to the old leaderboard
  • @Muennighoff should we remove the reference to MTEB Arena?
  • Delete “(soon)”
  • Is "share this benchmark" folded out by default? I would have it folded in
  • Do we want to display the radar chart by default? I think it is currently underused (i.e., move the description up and remove the plot selection?)
    • Btw. I suspect removing our model search prevents selecting specific models for the plots?
  • What happens to MTEB(eng, v1)?

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


A few minor comments on the code, but the new look is great.

How is performance looking? Anything we need to worry about?


@x-tabdeveloping
Collaborator Author

@KennethEnevoldsen

  1. Yes, it is folded out by default, I think it's good if it's immediately obvious to our users how to change between them.
  2. I'm not sure I would remove the reference to the old leaderboard, some people will still be looking for it. And frankly, until we fix some annoying bugs, I don't think people will be completely satisfied with the new leaderboard.
  3. I think leaving "share this benchmark" folded out is okay; it takes up little space and makes it more visually present. It would be a shame if people tried copying the URL from the browser bar and then discovered that it doesn't actually lead to the benchmark they want.
  4. The radar chart is not shown by default, you have to click on the tab to show it. I have no idea what you mean by moving the description up.
  5. MTEB(eng, v1) is listed in the language-specific benchmarks section as English, Legacy.
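The "share this benchmark" link discussed in point 3 amounts to encoding the current selection into the URL, rather than relying on the browser bar. A sketch of the idea; the query-parameter name and base URL are assumptions, not the leaderboard's confirmed scheme:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def share_url(base: str, benchmark_name: str) -> str:
    """Build a link that reopens the leaderboard on the selected benchmark."""
    return f"{base}?{urlencode({'benchmark_name': benchmark_name})}"


url = share_url("https://huggingface.co/spaces/mteb/leaderboard", "MTEB(eng, v2)")
# the leaderboard would read the parameter back on load:
selected = parse_qs(urlparse(url).query)["benchmark_name"][0]
```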

@x-tabdeveloping
Collaborator Author

How is performance looking? Anything we need to worry about?

Performance is slightly better because I removed the gradient from the per-task table. A fix is being worked on by the Gradio team and will soon be released.
The language selection dropdown is still not fixed; I will do my best to help Abubakar and the Gradio team fix it.

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
@x-tabdeveloping
Collaborator Author

Maybe we can move the reference to the legacy leaderboard to the bottom, that way it doesn't look like we are apologizing for having made a new one

@x-tabdeveloping
Collaborator Author

I might be able to fix the language selection thing too, I'm looking into it right now

@KennethEnevoldsen
Contributor

Yes, it is folded out by default, I think it's good if it's immediately obvious to our users how to change between them.

Agree

I'm not sure I would remove the reference to the old leaderboard, some people will still be looking for it. And frankly, until we fix some annoying bugs, I don't think people will be completely satisfied with the new leaderboard.

Maybe we can move the reference to the legacy leaderboard to the bottom, that way it doesn't look like we are apologizing for having made a new one

Bottom it is, then.

I think leaving share this benchmark folded out is okay, it takes up little space, and makes it more visually present. It would be a shame if people tried copying the URL from the browser bar and then discovered that it doesn't actually lead to the benchmark they want.

It's only a slight preference: it looks a bit jarring, so I would fold it down (but it's not a showstopper). I don't think people will miss it.

The radar chart is not shown by default, you have to click on the tab to show it. I have no idea what you mean by moving the description up.

Something like this:

Screenshot 2025-04-16 at 11 52 59

MTEB(eng, v1) is listed in the language-specific benchmarks section as English, Legacy.

Is it also linked in MTEB(eng, v2)? Is it possible to have it not on the list but still linkable? Otherwise, let us put it in a Legacy section and call it "English, v1".

@x-tabdeveloping
Collaborator Author

@KennethEnevoldsen What chart would you put to the right? I personally think the layout is fine as is to be honest.
I can make the "share this benchmark" not visible by default, and we can also hide MTEB(eng, v1) and make it linkable.
My issue is that if we do that, then more people would reach for the old leaderboard space to see that benchmark. I would prefer keeping it visible but making it very obvious that we do not want people to use it.

@KennethEnevoldsen
Contributor

My issue is that if we do that, then more people would reach for the old leaderboard space to see that benchmark. I would prefer keeping it visible but making it very obvious that we do not want people to use it.

Yep let us do that. We could:

put it in a "Legacy" section and call it "English, v1"

@x-tabdeveloping
Collaborator Author

x-tabdeveloping commented Apr 16, 2025

I might be able to fix the language selection thing too, I'm looking into it right now

I can't. Not even a submit button fixes it, or removing callbacks altogether...
I will try to recreate the problem in an isolated context for the Gradio team.

@KennethEnevoldsen
Contributor

Sounds like a fix that is not required for this PR to go through.

Should we fix the rest and get this merged?

@x-tabdeveloping
Collaborator Author

Yep, will do tomorrow.

Member

@tomaarsen tomaarsen left a comment


I'm seeing a lot of welcome improvements on top of an already solid interface, nice work here!

@x-tabdeveloping x-tabdeveloping enabled auto-merge (squash) April 17, 2025 13:26
@x-tabdeveloping x-tabdeveloping merged commit 0ab947b into main Apr 17, 2025
8 checks passed
@x-tabdeveloping x-tabdeveloping deleted the ui_overhaul branch April 17, 2025 13:45
Samoed added a commit that referenced this pull request May 3, 2025
* SpeedTask add deprecated warning (#2493)

* Docs: Update README.md (#2494)

Update README.md

* fix transformers version for now (#2504)

* Fix typos (#2509)

* ci: refactor TaskMetadata eval langs test (#2501)

* refactor eval langs test

* function returns None

* add hard negaties tasks in _HISTORIC_DATASETS

* rename to ImageClustering folder (#2516)

rename folder

* Clean up trailing spaces citation (#2518)

* rename folder

* trailing spaces

* missed one

* [mieb] Memotion preprocessing code made more robust and readable (#2519)

* fix: validate lang code in ModelMeta (#2499)

* Update pyproject.toml (#2522)

* 1.36.38

Automatically generated by python-semantic-release

* Fix leaderboard version (#2524)

* fix gradio leaderboard run

* update docs

* Fix gte-multilingual-base embed_dim (#2526)

* [MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

specify only the multilingual AggTask

* [mieb] fix hatefulmemes (#2531)

* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Model conan (#2534)

* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <shyuli@tencent.com>

* fix: Update mteb.get_tasks with an exclude_aggregate parameter to exclude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks

* 1.36.39

Automatically generated by python-semantic-release

* docs: Add MIEB citation in benchmarks (#2544)

Add MIEB citation in benchmarks

* Add 2 new Vietnamese Retrieval Datasets (#2393)

* [ADD] 2 new Datasets

* [UPDATE] Change bibtext_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtext_citation for ZacLegalTextRetrieval as TODO

* Update tasks table

* fix: CacheWrapper per task (#2467)

* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>

* 1.36.40

Automatically generated by python-semantic-release

* misc: move MMTEB scripts and notebooks to separate repo (#2546)

move mmteb scripts and notebooks to separate repo

* fix: Update requirements in JinaWrapper (#2548)

fix: Update package requirements in JinaWrapper for einops and flash_attn

* 1.36.41

Automatically generated by python-semantic-release

* Docs: Add MIEB to README (#2550)

Add MIEB to README

* Add xlm_roberta_ua_distilled (#2547)

* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix me5 trainind data config to include xquad dataset (#2552)

* fix: me5 trainind data config to include xquad dataset

* Update mteb/models/e5_models.py

upddate: xquad key name

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* feat: Added dataframe utilities to BenchmarkResults (#2542)

* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added a tests for ModelResults and BenchmarksResults
- Added a few utility functions where needed
- Added docstring throughout ModelResults and BenchmarksResults
- Added todo comment for missing aspects - mostly v2 - but we join_revisions seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this give an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache

* 1.37.0

Automatically generated by python-semantic-release

* fix e5_R_mistral_7b (#2490)

* fix e5_R_mistral_7b

* change wrapper

* address comments

* Added kwargs for pad_token

* correct lang format

* address comments

* add revision

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix unintentional working of filters on leaderboard (#2535)

* fix unintentional working of filters on leaderboard

* address comments

* make lint

* address comments

* rollback unnecessary changes

* feat: UI Overhaul (#2549)

* Bumped gradio version to latest

* Added new Gradio table functionality to leaderboard

* Removed search bar

* Changed color scheme in plot to match the table

* Added new benchmark selector in sidebar

* Changed not activated button type to secondary

* Short-circuited callbacks that are based on language selection

* Re-added column width calculation since it got messed up

* Commented out gradient for per-task table as it slowed things down substantially

* Styling and layout updates

* Adjusted comments according to reviews

* Converted all print statements to logger.debug

* Removed pydantic version fix

* Ran linting

* Remove commented out code

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Moved English,v1 to Legacy section

* Closed the benchmark sharing accordion by default

* Adjusted markdown blocks according to suggestions

* Ran linter

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.0

Automatically generated by python-semantic-release

* add USER2 (#2560)

* add user2

* add training code

* update prompts

* Fix leaderboard entry for BuiltBench (#2563)

Fix leaderboard entry for BuiltBench (#2562)

Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>

* fix: jasper models embeddings having nan values (#2481)

* 1.38.1

Automatically generated by python-semantic-release

* fix frida datasets (#2565)

* Add relle (#2564)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Backfill task metadata for metadata for GermanDPR and GermanQuAD (#2566)

* Add metadata for GermanDPR and GermanQuAD

* PR improvements

* Update tasks table

* Add  ModelMeta for CodeSearch-ModernBERT-Crow-Plus (#2570)

* Add files via upload

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update overview.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update mteb/models/shuu_model.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Docs: Improve MIEB docs (#2569)

* Add missing annotations (#2498)

* Update tasks table

* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>


* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* fixes

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchamrk is runable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix retrieval loader

* add descriptive stats

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations

* update imports

* fix citations

* fix citations

* format citation

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: E. Tolga Ayan <33233561+tolgayan@users.noreply.github.com>
Co-authored-by: lllsy12138 <50816213+lllsy12138@users.noreply.github.com>
Co-authored-by: shyuli <shyuli@tencent.com>
Co-authored-by: Siddharth M. Bhatia <siddharth@sidmb.com>
Co-authored-by: Bao Loc Pham <67360122+BaoLocPham@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Flo <FlorianRottach@aol.com>
Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Olesksii Horchynskyi <121444758+panalexeu@users.noreply.github.com>
Co-authored-by: Pandaswag <110003154+torchtorchkimtorch@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>
Co-authored-by: Youngjoon Jang <82500463+yjoonjang@users.noreply.github.com>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: Jan Karaś <90987511+KTFish@users.noreply.github.com>
Co-authored-by: Shuu <136542198+Shun0212@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
isaac-chung added a commit that referenced this pull request May 3, 2025
* Update tasks table

* 1.36.26

Automatically generated by python-semantic-release

* Pass task name to all evaluators (#2389)

* pass task name to all tasks

* add test

* fix loader

* fix: renaming Zeroshot -> ZeroShot (#2395)

* fix: renaming Zeroshot -> ZeroShot

Adresses #2078

* rename 1

* rename 2

* format

* fixed error

* 1.36.27

Automatically generated by python-semantic-release

* fix: Update AmazonPolarityClassification license (#2402)

Update AmazonPolarityClassification.py

* fix b1ade name (#2403)

* 1.36.28

Automatically generated by python-semantic-release

* Minor style changes (#2396)

* fix: renaming Zeroshot -> ZeroShot

Adresses #2078

* fix: minor style changes

Adresses #2078

* rename 1

* rename 2

* format

* fixed error

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Added new dataset and tasks - ClusTREC-covid , clustering of thematic covid related scientific papers  (#2302)

* Clustrec covid new dataset and task

* fix

* fix

* fix

* fix

* fix

* descriptive stats

* change all mentions of clustrec-covidp2p to clustrec-covid

* change ' to "

* Update tasks table

* fix: Major updates to docs + make mieb dep optional (#2397)

* fix: renaming Zeroshot -> ZeroShot

Adresses #2078

* fix: minor style changes

Adresses #2078

* fix: Major updates to documentation

This PR does the following:
- This introduced other modalities more clearly in the documentation as well as make it easier to transition to a full on documentation site later.
- added minor code updates due to discovered inconsistencies in docs and code.
- Added the MMTEB citation where applicable
- makes the docs ready to move torchvision to an optional dependency

* Moved VISTA example

* rename 1

* rename 2

* format

* fixed error

* fix: make torchvision optional (#2399)

* fix: make torchvision optional

* format

* add docs

* minor fix

* remove transform from Any2TextMultipleChoiceEvaluator

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* move Running SentenceTransformer model with prompts to usage

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.36.29

Automatically generated by python-semantic-release

* remove Arabic_Triplet_Matryoshka_V2.py (#2405)

* Min torchvision>0.2.1 (#2410)

matching torch>1.0.0

* fix: Add validation to model_name in `ModelMeta` (#2404)

* add test for name validation

* upd docs

* upd cohere name

* fix tests

* fix name for average_word_embeddings_komninos

* fix name for average_word_embeddings_komninos

* fix reranker test

* fix reranker test

* 1.36.30

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove leaderboard from leaderboard

fixes #2413

This only removed the benchmark from the leaderboard but keep it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formated Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py to Clustering > kor folder, and edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions as model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added metadata information about where the model was adapted from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update training_datasets

---------

Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.36.34

Automatically generated by python-semantic-release

* suppress logging warnings on leaderboard (#2406)

* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff

* fix: E5 instruct now listed as sbert compatible (#2475)

Fixes #1442

* 1.36.35

Automatically generated by python-semantic-release

* [MIEB] rename VisionCentric to VisionCentricQA (#2479)

rename VisionCentric to VisionCentricQA

* ci: Run dataset loading only when pushing to main (#2480)

Update dataset_loading.yml

* fix table in tasks.md (#2483)

* Update tasks table

* fix: add prompt to NanoDBPedia (#2486)

* 1.36.36

Automatically generated by python-semantic-release

* Fix Task Lang Table (#2487)

* Fix Task Lang Table

* added tasks.md

* fix

* fix: Ignore datasets not available in tests (#2484)

* 1.36.37

Automatically generated by python-semantic-release

* [MIEB] align main metrics with leaderboard (#2489)

align main metrics with leaderboard

* typo in model name (#2491)

* SpeedTask add deprecated warning (#2493)

* Docs: Update README.md (#2494)

Update README.md

* fix transformers version for now (#2504)

* Fix typos (#2509)

* ci: refactor TaskMetadata eval langs test (#2501)

* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS

* rename to ImageClustering folder (#2516)

rename folder

* Clean up trailing spaces citation (#2518)

* rename folder

* trailing spaces

* missed one

* [mieb] Memotion preprocessing code made more robust and readable (#2519)

* fix: validate lang code in ModelMeta (#2499)

* Update pyproject.toml (#2522)

* 1.36.38

Automatically generated by python-semantic-release

* Fix leaderboard version (#2524)

* fix gradio leaderboard run

* update docs

* Fix gte-multilingual-base embed_dim (#2526)

* [MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

specify only the multilingual AggTask

* [mieb] fix hatefulmemes (#2531)

* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Model conan (#2534)

* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <shyuli@tencent.com>

* fix: Update mteb.get_tasks with an exclude_aggregate parameter to exclude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks
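
The `exclude_aggregate` behaviour listed above can be sketched as follows. The `Task` dataclass and `get_tasks` body here are simplified stand-ins for illustration, not mteb's actual code; the grounded details from the commits are the `is_aggregate` check, the `exclude_aggregate` parameter name, and its default of `False`.

```python
from dataclasses import dataclass

# Illustrative stand-in for mteb's task objects.
@dataclass
class Task:
    name: str
    is_aggregate: bool = False  # aggregate tasks wrap several sub-tasks

def get_tasks(tasks: list, exclude_aggregate: bool = False) -> list:
    # Aggregates are kept by default and dropped only when explicitly excluded.
    if exclude_aggregate:
        return [t for t in tasks if not t.is_aggregate]
    return list(tasks)

all_tasks = [Task("STS12"), Task("CQADupstackRetrieval", is_aggregate=True)]
print([t.name for t in get_tasks(all_tasks, exclude_aggregate=True)])  # ['STS12']
```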

* 1.36.39

Automatically generated by python-semantic-release

* docs: Add MIEB citation in benchmarks (#2544)

Add MIEB citation in benchmarks

* Add 2 new Vietnamese Retrieval Datasets (#2393)

* [ADD] 2 new Datasets

* [UPDATE] Change bibtex_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtex_citation for ZacLegalTextRetrieval as TODO

* Update tasks table

* fix: CacheWrapper per task (#2467)

* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
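
The per-task caching fix above can be sketched like this: embeddings are stored under a key derived from the task, so switching tasks cannot return stale vectors from a previous task's cache. All names here (`CacheWrapper`, `encode`, `encode_one`) are illustrative assumptions, not mteb's exact API.

```python
import hashlib

class CacheWrapper:
    """Hypothetical sketch: one embedding cache per task name."""

    def __init__(self, model):
        self.model = model
        self._caches = {}  # task_name -> {sentence_hash: embedding}

    def encode(self, sentences, task_name):
        cache = self._caches.setdefault(task_name, {})
        out = []
        for s in sentences:
            key = hashlib.sha256(s.encode()).hexdigest()
            if key not in cache:
                cache[key] = self.model.encode_one(s)  # only on cache miss
            out.append(cache[key])
        return out

class ToyModel:
    """Counts encode calls so cache hits are observable."""
    def __init__(self):
        self.calls = 0
    def encode_one(self, s):
        self.calls += 1
        return [float(len(s))]

m = ToyModel()
w = CacheWrapper(m)
w.encode(["hello"], task_name="STS12")
w.encode(["hello"], task_name="STS12")     # same task: cache hit, no new call
w.encode(["hello"], task_name="NFCorpus")  # different task: separate cache
print(m.calls)  # 2
```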

* 1.36.40

Automatically generated by python-semantic-release

* misc: move MMTEB scripts and notebooks to separate repo (#2546)

move mmteb scripts and notebooks to separate repo

* fix: Update requirements in JinaWrapper (#2548)

fix: Update package requirements in JinaWrapper for einops and flash_attn

* 1.36.41

Automatically generated by python-semantic-release

* Docs: Add MIEB to README (#2550)

Add MIEB to README

* Add xlm_roberta_ua_distilled (#2547)

* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix me5 training data config to include xquad dataset (#2552)

* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* feat: Added dataframe utilities to BenchmarkResults (#2542)

* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarksResults
- Added a few utility functions where needed
- Added docstrings throughout ModelResults and BenchmarksResults
- Added todo comments for missing aspects - mostly v2 - but `join_revisions` seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache
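
A `to_dataframe`-style utility of the kind described above can be sketched as follows. The nested dict shape and column names are assumptions for illustration; they are not the exact structure of mteb's `BenchmarkResults`.

```python
import pandas as pd

# Assumed result shape for the sketch: model -> {task -> main score}.
results = {
    "model-a": {"STS12": 0.78, "NFCorpus": 0.35},
    "model-b": {"STS12": 0.81, "NFCorpus": 0.33},
}

def to_dataframe(results: dict) -> pd.DataFrame:
    """Flatten nested per-model scores into a long-format dataframe."""
    rows = [
        {"model_name": model, "task_name": task, "score": score}
        for model, scores in results.items()
        for task, score in scores.items()
    ]
    return pd.DataFrame(rows)

df = to_dataframe(results)
# A wide model-by-task view falls out of the long format with one pivot:
print(df.pivot(index="model_name", columns="task_name", values="score"))
```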

* 1.37.0

Automatically generated by python-semantic-release

* fix e5_R_mistral_7b (#2490)

* fix e5_R_mistral_7b

* change wrapper

* address comments

* Added kwargs for pad_token

* correct lang format

* address comments

* add revision

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix unintentional working of filters on leaderboard (#2535)

* fix unintentional working of filters on leaderboard

* address comments

* make lint

* address comments

* rollback unnecessary changes

* feat: UI Overhaul (#2549)

* Bumped gradio version to latest

* Added new Gradio table functionality to leaderboard

* Removed search bar

* Changed color scheme in plot to match the table

* Added new benchmark selector in sidebar

* Changed not activated button type to secondary

* Short-circuited callbacks that are based on language selection

* Re-added column width calculation since it got messed up

* Commented out gradient for per-task table as it slowed things down substantially

* Styling and layout updates

* Adjusted comments according to reviews

* Converted all print statements to logger.debug

* Removed pydantic version fix

* Ran linting

* Remove commented out code

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Moved English,v1 to Legacy section

* Closed the benchmark sharing accordion by default

* Adjusted markdown blocks according to suggestions

* Ran linter

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.0

Automatically generated by python-semantic-release

* add USER2 (#2560)

* add user2

* add training code

* update prompts

* Fix leaderboard entry for BuiltBench (#2563)

Fix leaderboard entry for BuiltBench (#2562)

Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>

* fix: jasper model embeddings having NaN values (#2481)

* 1.38.1

Automatically generated by python-semantic-release

* fix frida datasets (#2565)

* Add relle (#2564)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Backfill task metadata for GermanDPR and GermanQuAD (#2566)

* Add metadata for GermanDPR and GermanQuAD

* PR improvements

* Update tasks table

* Add  ModelMeta for CodeSearch-ModernBERT-Crow-Plus (#2570)

* Add files via upload

* Update shuu_model.py

* Update overview.py

* Update mteb/models/shuu_model.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Docs: Improve MIEB docs (#2569)

* Add missing annotations (#2498)

* Update tasks table

* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* fix citations

* fix citations

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com>
Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com>
Co-authored-by: richinfo-ai <richinfoai@163.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com>
Co-authored-by: kunka.xgw <kunka.xgw@taobao.com>
Co-authored-by: Sam Heymann <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com>
Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
Co-authored-by: hongst <76415500+seongtaehong@users.noreply.github.com>
Co-authored-by: E. Tolga Ayan <33233561+tolgayan@users.noreply.github.com>
Co-authored-by: lllsy12138 <50816213+lllsy12138@users.noreply.github.com>
Co-authored-by: shyuli <shyuli@tencent.com>
Co-authored-by: Siddharth M. Bhatia <siddharth@sidmb.com>
Co-authored-by: Bao Loc Pham <67360122+BaoLocPham@users.noreply.github.com>
Co-authored-by: Flo <FlorianRottach@aol.com>
Co-authored-by: Florian Rottach <florianrottach@boehringer-ingelheim.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Olesksii Horchynskyi <121444758+panalexeu@users.noreply.github.com>
Co-authored-by: Pandaswag <110003154+torchtorchkimtorch@users.noreply.github.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Mehrzad Shahin-Moghadam <mehr@Mehrzads-MacBook-Pro.local>
Co-authored-by: Youngjoon Jang <82500463+yjoonjang@users.noreply.github.com>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: Jan Karaś <90987511+KTFish@users.noreply.github.com>
Co-authored-by: Shuu <136542198+Shun0212@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>