
Ara and ben classification dataset cleaning #2632


Conversation

AlexeyVatolin
Contributor

@AlexeyVatolin AlexeyVatolin commented May 2, 2025

Fixes comment from #1049

I applied only the necessary cleaning functions for each task.

There are still some data issues in the classification tasks; if my approach is acceptable, I will apply it to the other tasks as well.

Scores comparison

As expected, scores slightly drop after my changes for most tasks.
multilingual-e5-small

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.7455 | 0.727835 | -2.37 |
| BengaliDocumentClassification | 0.506934 | 0.496289 | -2.1 |
| BengaliHateSpeechClassification | 0.492376 | 0.497559 | 1.05 |
| BengaliSentimentAnalysis | 0.831888 | 0.805349 | -3.19 |
| CSFDCZMovieReviewSentimentClassification | 0.275586 | 0.277197 | 0.58 |
| CzechProductReviewSentimentClassification | 0.507227 | 0.504395 | -0.56 |
| HotelReviewSentimentClassification | 0.494775 | 0.492434 | -0.47 |
| OnlineStoreReviewSentimentClassification | 0.362256 | 0.311644 | -13.97 |
| RestaurantReviewSentimentClassification | 0.635986 | 0.631377 | -0.72 |
| TweetEmotionClassification | 0.511279 | 0.491574 | -3.85 |
| TweetSarcasmClassification | 0.609289 | 0.564206 | -7.4 |
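The delta_percent column appears to be the relative change between the old and new scores, rounded to two decimals; a minimal sketch (the function name is mine, not from the codebase):

```python
def delta_percent(old: float, new: float) -> float:
    """Relative score change in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)

# Reproducing the AJGT row above: 0.7455 -> 0.727835
print(delta_percent(0.7455, 0.727835))  # -2.37
```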

paraphrase-multilingual-MiniLM-L12-v2

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.699278 | 0.693557 | -0.82 |
| BengaliDocumentClassification | 0.335498 | 0.300439 | -10.45 |
| BengaliHateSpeechClassification | 0.340113 | 0.323974 | -4.75 |
| BengaliSentimentAnalysis | 0.606135 | 0.562747 | -7.16 |
| CSFDCZMovieReviewSentimentClassification | 0.268652 | 0.256982 | -4.34 |
| CzechProductReviewSentimentClassification | 0.513428 | 0.493799 | -3.82 |
| HotelReviewSentimentClassification | 0.42041 | 0.421355 | 0.22 |
| OnlineStoreReviewSentimentClassification | 0.32168 | 0.282877 | -12.06 |
| RestaurantReviewSentimentClassification | 0.604102 | 0.63818 | 5.64 |
| TweetEmotionClassification | 0.385498 | 0.368708 | -4.36 |
| TweetSarcasmClassification | 0.540664 | 0.467872 | -13.46 |

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Member

I think you should create v2 versions of these tasks to not change existing results

Contributor

Yeah, so to ensure backward compatibility of the benchmarks, we need to keep the original version and create a v2 of the task.

"STS22.v2" is a good example of this (though I probably wouldn't duplicate the metadata).

Contributor

I would probably update the description of the task, though. Something like:

"This task version fixes discovered [quality issues](link to PR) by removing empty strings (XX%), duplicates (XX%) and leakage in the test set from the training set (XX%)."

If we downsample, I would add:

"We additionally downsample the dataset to N samples."

If we add a test set then I would reformulate to:

"This task version fixes discovered [quality issues](link to PR) by removing empty strings (XX%), duplicates (XX%), we additionally also introduce a test set from the original dataset consisting of N samples."

Contributor Author

Do you mean that I should create a new issue describing the changes for this pull request? I thought that issue #1049 was like an epic where I would link all tasks related to data cleaning.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

This looks good!

I suspect it might be ideal to move this to v2 due to the dataset upload (see below). @Samoed what do you think?

Would it be relevant to remake this into something like:

task = mteb.get_task(...)

from mteb._quality_control import deduplicate, remove_leakage
# the leading _ denotes that these aren't intended as public functions

task = deduplicate(task)
task = remove_leakage(task)
task.metadata.description += NEW_DESC  # see below
task.metadata.name += ".v2"  # this has to be smarter: if the task is already v2, it should of course become v3
task.push_to_hub()

Once the PR is done, we of course also need to add a test to ensure that new dataset additions don't have these problems; we can test this from the dataset statistics (computed for all tasks in v2).


@AlexeyVatolin AlexeyVatolin force-pushed the classification_dataset_cleaning branch from 9f63f00 to 439febc Compare May 4, 2025 15:15
@AlexeyVatolin
Copy link
Contributor Author

I have refactored the code as you suggested. For each modified dataset, I created a new class with the .v2 suffix and uploaded the revised datasets to the mteb team space.

For datasets that did not originally have a test split - only a training set - I kept 2048 examples in the training set (from which 8 examples per class are sampled during classifier training), and used the remainder as the test set.

Below is a report summarizing the number of lines removed for each issue. The cleaning functions were applied in the order listed in the table:

  • create_test_split - splitting the training set into train and test subsets
  • filter_controversial - removing examples with identical text (after lowercasing and stripping leading spaces) but different labels
  • filter_empty - removing empty texts
  • deduplicate - removing duplicate texts
  • filter_leakage - removing examples that appear in both the training and test sets
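For reference, here is a minimal sketch of what these filters do; the function names match the list above, but the implementations are my simplified illustration over lists of dicts (the actual code operates on Hugging Face datasets):

```python
from collections import defaultdict

def _key(text: str) -> str:
    # Normalization used for comparisons: lowercase and strip leading spaces
    return text.lower().lstrip()

def filter_empty(rows):
    """Remove examples whose text is empty (after stripping whitespace)."""
    return [r for r in rows if r["text"].strip()]

def deduplicate(rows):
    """Keep only the first occurrence of each normalized text."""
    seen, kept = set(), []
    for r in rows:
        k = _key(r["text"])
        if k not in seen:
            seen.add(k)
            kept.append(r)
    return kept

def filter_controversial(rows):
    """Remove texts that occur with more than one distinct label."""
    labels = defaultdict(set)
    for r in rows:
        labels[_key(r["text"])].add(r["label"])
    return [r for r in rows if len(labels[_key(r["text"])]) == 1]

def filter_leakage(train_rows, test_rows):
    """Remove test examples whose text also appears in the training set."""
    train_keys = {_key(r["text"]) for r in train_rows}
    return [r for r in test_rows if _key(r["text"]) not in train_keys]
```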

AJGT

| split | original size |
| --- | --- |
| train | 1800 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 900 |

AJGT is unchanged.

OnlineStoreReviewSentimentClassification

| split | original size |
| --- | --- |
| train | 2415 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 1207 |
| filter_controversial | train | 79 |
| filter_controversial | test | 64 |
| deduplicate | train | 38 |
| deduplicate | test | 29 |
| filter_leakage | test | 40 |

HotelReviewSentimentClassification

| split | original size |
| --- | --- |
| train | 105698 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 2048 |
| filter_controversial | train | 130 |
| filter_controversial | test | 1 |
| deduplicate | train | 1533 |
| deduplicate | test | 4 |
| filter_leakage | test | 40 |

RestaurantReviewSentimentClassification

| split | original size |
| --- | --- |
| train | 8364 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 2048 |
| filter_controversial | train | 2 |
| deduplicate | train | 35 |
| deduplicate | test | 5 |
| filter_leakage | test | 23 |

TweetEmotionClassification

| split | original size |
| --- | --- |
| train | 10065 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 2048 |
| filter_controversial | train | 15 |
| filter_controversial | test | 1 |
| filter_empty | train | 1 |
| deduplicate | train | 28 |
| deduplicate | test | 1 |
| filter_leakage | test | 7 |

TweetSarcasmClassification

| split | original size |
| --- | --- |
| train | 8437 |
| test | 2110 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| filter_controversial | train | 96 |
| filter_controversial | test | 18 |
| deduplicate | train | 216 |
| deduplicate | test | 13 |
| filter_leakage | test | 118 |

BengaliHateSpeechClassification

| split | original size |
| --- | --- |
| train | 3418 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 1709 |
| filter_controversial | train | 111 |
| filter_controversial | test | 111 |
| deduplicate | train | 31 |
| deduplicate | test | 28 |
| filter_leakage | test | 55 |

BengaliSentimentAnalysis

| split | original size |
| --- | --- |
| train | 11807 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| create_test_split | train_to_test | 2048 |
| filter_controversial | test | 2 |
| deduplicate | train | 829 |
| deduplicate | test | 112 |
| filter_leakage | test | 111 |

BengaliDocumentClassification

| split | original size |
| --- | --- |
| train | 220574 |
| validation | 4994 |
| test | 15012 |

Cleaning report:

| filter | split | removed |
| --- | --- | --- |
| filter_controversial | train | 332 |
| filter_controversial | validation | 2 |
| filter_controversial | test | 19 |
| deduplicate | train | 112 |
| filter_leakage | test | 5 |

@AlexeyVatolin
Copy link
Contributor Author

Here is a comparison of the results. The scores have changed slightly because I updated the train/test split.

multilingual-e5-small

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.745500 | 0.740667 | -0.65 |
| BengaliDocumentClassification | 0.506934 | 0.511182 | 0.84 |
| BengaliHateSpeechClassification | 0.492376 | 0.496289 | 0.79 |
| BengaliSentimentAnalysis | 0.831888 | 0.815144 | -2.01 |
| HotelReviewSentimentClassification | 0.494775 | 0.495557 | 0.16 |
| OnlineStoreReviewSentimentClassification | 0.362256 | 0.317225 | -12.43 |
| RestaurantReviewSentimentClassification | 0.635986 | 0.642426 | 1.01 |
| TweetEmotionClassification | 0.511279 | 0.477538 | -6.60 |
| TweetSarcasmClassification | 0.609289 | 0.567364 | -6.88 |

paraphrase-multilingual-MiniLM-L12-v2

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.699278 | 0.714000 | 2.11 |
| BengaliDocumentClassification | 0.335498 | 0.334521 | -0.29 |
| BengaliHateSpeechClassification | 0.340113 | 0.331887 | -2.42 |
| BengaliSentimentAnalysis | 0.606135 | 0.588922 | -2.84 |
| HotelReviewSentimentClassification | 0.420410 | 0.429406 | 2.14 |
| OnlineStoreReviewSentimentClassification | 0.321680 | 0.291713 | -9.32 |
| RestaurantReviewSentimentClassification | 0.604102 | 0.630842 | 4.43 |
| TweetEmotionClassification | 0.385498 | 0.365081 | -5.30 |
| TweetSarcasmClassification | 0.540664 | 0.485110 | -10.28 |

@AlexeyVatolin AlexeyVatolin changed the title Classification dataset cleaning Ara and ben classification dataset cleaning May 4, 2025
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Thanks for the detailed description!

I would probably deduplicate before creating the train-test split to prevent the leakage, but it comes out the same, I suppose.

I kept 2048 examples in the training set

What if it has less than 2k? Seems like this is the case for AJGT?

Seems like AJGT didn't have any issues, any reason to fix it?

@AlexeyVatolin
Contributor Author

AlexeyVatolin commented May 5, 2025

Thanks for the detailed description!

I would probably deduplicate before creating the train-test split to prevent the leakage, but it comes out the same, I suppose.

@KennethEnevoldsen, agreed. Since this doesn't affect the final result, I'll leave it as is for now, but I'll make sure to deduplicate before splitting in my future pull requests.

I kept 2048 examples in the training set

What if it has less than 2k? Seems like this is the case for AJGT?

If it has less than 2k, I split it into two equal halves.

Seems like AJGT didn't have any issues, any reason to fix it?

AJGT has only train split, which means that we are training and testing on the same data.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

I think this is essentially there. It would be great if the description made it clear why we updated the task (I know that I have had issues with version changes where this wasn't clear).

If we want to automate the fixes, then it might be possible to simply link to an issue/PR.

@AlexeyVatolin
Contributor Author

AlexeyVatolin commented May 9, 2025

@KennethEnevoldsen, Thanks for the feedback!

My aim is to completely automate the fixing of data errors. I've already developed a script that handles this. For this current PR, I've only applied the changes to 9 datasets to gather initial feedback. Once we're confident, I plan to apply this script across all tasks.

To ensure clarity and provide a clear audit trail, I propose that for each subsequent update, we include a link in the dataset's description pointing to the relevant Pull Request. Within that PR, I will provide detailed tables summarizing the changes made, similar to the format shown in this comment: #2632 (comment)

Additionally, for this PR, I have manually reviewed and corrected all numerical values mentioned in the descriptions. I intend to continue this practice for future PRs as well. While this part cannot be fully automated, I don't anticipate it will add significant overhead.

@KennethEnevoldsen
Contributor

Sounds great!

Added a comment in the other PR on the phrasing of the message, but otherwise I think we are good.

A thing that might be nice to do is also check for:

  1. Empty strings
  2. Minimal length, e.g. a minimum of 3 words (this one might be too hard to generalize across datasets and languages)

(sorry for adding it now, if it is too much of a hassle feel free to ignore it)

@AlexeyVatolin
Contributor Author

@KennethEnevoldsen, I've already added filtering for empty strings; you can see this in the results for the TweetEmotionClassification task. I'll think about how to add filtering based on a minimum number of words and will update here. It probably makes sense to have separate filters for languages written with ideograms and for languages in Arabic script. Perhaps you have ideas on what minimum-length conditions would be appropriate for each writing system?



class AJGTV2(AbsTaskClassification):
    metadata = TaskMetadata(
Member

Can you also add adapted_from for all new datasets?

Member

@AlexeyVatolin Can you add this to your tasks?

@KennethEnevoldsen
Contributor

Perhaps you have ideas on what minimum length conditions would be appropriate for each writing system?

I think for everything in Latin script it is probably reasonable to require at least 3 whitespace-separated words, at least for classification.

For retrieval it can be a bit different as we often search in single words.

@AlexeyVatolin
Contributor Author

@KennethEnevoldsen, I added filter_short for all languages except "zho", "jpn", "tha", and "mya". I also changed the filter order slightly. The new order is: filter_empty, deduplicate, filter_short, create_test_split, filter_leakage, and filter_controversial.
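A minimal sketch of how filter_short works (the 3-word threshold follows the suggestion above; the implementation details are my simplified illustration, and the excluded codes are scripts that don't delimit words with whitespace):

```python
# ISO 639-3 codes for languages where whitespace word counts are unreliable
NO_WHITESPACE_LANGS = {"zho", "jpn", "tha", "mya"}

def filter_short(rows, lang: str, min_words: int = 3):
    """Drop texts with fewer than `min_words` whitespace-separated words."""
    if lang in NO_WHITESPACE_LANGS:
        # Proper word counts would require language-specific segmentation,
        # so the filter is skipped for these languages
        return list(rows)
    return [r for r in rows if len(r["text"].split()) >= min_words]
```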

Here are the updated results:

Original Sizes

| Task | Split | Original Size |
| --- | --- | --- |
| AJGT | train | 1800 |
| HotelReviewSentimentClassification | train | 2048 |
| OnlineStoreReviewSentimentClassification | train | 2415 |
| RestaurantReviewSentimentClassification | train | 8364 |
| TweetEmotionClassification | train | 2048 |
| TweetSarcasmClassification | train | 8437 |
| TweetSarcasmClassification | test | 2110 |
| BengaliDocumentClassification | train | 220574 |
| BengaliDocumentClassification | validation | 4994 |
| BengaliDocumentClassification | test | 15012 |
| BengaliHateSpeechClassification | train | 3418 |
| BengaliSentimentAnalysis | train | 11807 |

Cleaning Report

| Task | Filter | Split | Removed |
| --- | --- | --- | --- |
| AJGT | filter_short | train | 116 |
| AJGT | create_test_split | train_to_test | 842 |
| HotelReviewSentimentClassification | deduplicate | train | 4 |
| HotelReviewSentimentClassification | filter_short | train | 4 |
| HotelReviewSentimentClassification | create_test_split | train_to_test | 1020 |
| OnlineStoreReviewSentimentClassification | deduplicate | train | 223 |
| OnlineStoreReviewSentimentClassification | filter_short | train | 238 |
| OnlineStoreReviewSentimentClassification | create_test_split | train_to_test | 977 |
| RestaurantReviewSentimentClassification | deduplicate | train | 64 |
| RestaurantReviewSentimentClassification | filter_short | train | 144 |
| RestaurantReviewSentimentClassification | create_test_split | train_to_test | 2048 |
| TweetEmotionClassification | deduplicate | train | 1 |
| TweetEmotionClassification | filter_short | train | 30 |
| TweetEmotionClassification | create_test_split | train_to_test | 1008 |
| TweetSarcasmClassification | deduplicate | train | 259 |
| TweetSarcasmClassification | filter_short | train | 45 |
| TweetSarcasmClassification | deduplicate | test | 16 |
| TweetSarcasmClassification | filter_short | test | 15 |
| TweetSarcasmClassification | filter_leakage | test | 130 |
| BengaliDocumentClassification | filter_empty | train | 21 |
| BengaliDocumentClassification | deduplicate | train | 260 |
| BengaliDocumentClassification | filter_short | train | 3 |
| BengaliDocumentClassification | filter_empty | test | 1 |
| BengaliDocumentClassification | filter_short | test | 1 |
| BengaliDocumentClassification | filter_leakage | test | 23 |
| BengaliDocumentClassification | filter_controversial | train | 1 |
| BengaliDocumentClassification | filter_controversial | validation | 1 |
| BengaliHateSpeechClassification | deduplicate | train | 238 |
| BengaliHateSpeechClassification | filter_short | train | 14 |
| BengaliHateSpeechClassification | create_test_split | train_to_test | 1583 |
| BengaliSentimentAnalysis | deduplicate | train | 1053 |
| BengaliSentimentAnalysis | filter_short | train | 662 |
| BengaliSentimentAnalysis | create_test_split | train_to_test | 2048 |

Results

intfloat/multilingual-e5-small

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.7455 | 0.754157 | 1.16 |
| BengaliDocumentClassification | 0.506934 | 0.464876 | -8.3 |
| BengaliHateSpeechClassification | 0.492376 | 0.486812 | -1.13 |
| BengaliSentimentAnalysis | 0.831888 | 0.793829 | -4.58 |
| HotelReviewSentimentClassification | 0.494775 | 0.488627 | -1.24 |
| OnlineStoreReviewSentimentClassification | 0.362256 | 0.365609 | 0.93 |
| RestaurantReviewSentimentClassification | 0.635986 | 0.629834 | -0.97 |
| TweetEmotionClassification | 0.511279 | 0.504365 | -1.35 |
| TweetSarcasmClassification | 0.609289 | 0.586147 | -3.8 |

sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

| task_name | main_score_old | main_score_new | delta_percent |
| --- | --- | --- | --- |
| AJGT | 0.699278 | 0.705107 | 0.83 |
| BengaliDocumentClassification | 0.335498 | 0.299153 | -10.83 |
| BengaliHateSpeechClassification | 0.340113 | 0.323596 | -4.86 |
| BengaliSentimentAnalysis | 0.606135 | 0.549482 | -9.35 |
| HotelReviewSentimentClassification | 0.42041 | 0.43598 | 3.7 |
| OnlineStoreReviewSentimentClassification | 0.32168 | 0.290686 | -9.64 |
| RestaurantReviewSentimentClassification | 0.604102 | 0.622998 | 3.13 |
| TweetEmotionClassification | 0.385498 | 0.399702 | 3.68 |
| TweetSarcasmClassification | 0.540664 | 0.540534 | -0.02 |

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

I would like to make the PR link a hyperlink instead, as we will likely include it in the docs.

Otherwise nothing to add.

@KennethEnevoldsen
Contributor

Wonderful!

@KennethEnevoldsen KennethEnevoldsen merged commit 4093099 into embeddings-benchmark:main May 26, 2025
9 checks passed
isaac-chung added a commit that referenced this pull request Jun 22, 2025
* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchamrk is runable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets wich can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets wich can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to improve the simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of the task using `nomic-ai/nomic-embed-text-v1.5`, and the scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```
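A quick way to confirm that two result payloads like the ones above agree within tolerance (pure-stdlib sketch; the tolerance value is an arbitrary choice, not an mteb convention):

```python
def scores_match(old: dict, new: dict, tol: float = 1e-3) -> bool:
    """Compare the test-split accuracy of two result payloads."""
    old_acc = old["scores"]["test"][0]["accuracy"]
    new_acc = new["scores"]["test"][0]["accuracy"]
    return abs(old_acc - new_acc) <= tol


# Accuracies from the two Caltech101 runs shown above.
old = {"scores": {"test": [{"accuracy": 0.897863}]}}
new = {"scores": {"test": [{"accuracy": 0.897929}]}}
print(scores_match(old, new))  # True: the revisions differ by ~7e-5
```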

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
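A simple guard would have caught the bad constant early. The set below is an assumption based on the task types listed in the linked Google doc at the time of writing; verify against the current doc before relying on it:

```python
# Task types accepted by the Vertex AI embeddings API (assumed from the
# linked documentation; the authoritative list is the doc itself).
SUPPORTED_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
}


def validate_task_type(task_type: str) -> str:
    """Raise early instead of sending an invalid task type to the API."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"Unsupported Vertex AI task type: {task_type!r}")
    return task_type


validate_task_type("SEMANTIC_SIMILARITY")  # ok
# validate_task_type("SIMILARITY")         # would raise ValueError
```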

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all (With Visual document retrieval then images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

Fixes #2770, but only in v1

```
# tested using:

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772
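For context, the idea behind an embedding cache wrapper is to memoize encode calls so repeated texts are embedded only once. A simplified in-memory illustration (this is not mteb's actual `CachedEmbeddingWrapper`, which persists embeddings to disk):

```python
class SimpleEmbeddingCache:
    """Wrap a model so each distinct text is encoded at most once (sketch)."""

    def __init__(self, model):
        self._model = model
        self._cache: dict[str, list[float]] = {}

    def encode(self, texts: list[str]) -> list[list[float]]:
        # Only send texts we have not embedded before to the model.
        missing = [t for t in texts if t not in self._cache]
        if missing:
            for text, vec in zip(missing, self._model.encode(missing)):
                self._cache[text] = vec
        return [self._cache[t] for t in texts]
```

A dummy model whose `encode` counts invocations makes the memoization easy to verify: two identical calls should hit the model only once.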

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
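The pattern here is plain dependency injection: accept an optional pre-built client (e.g. an `AzureOpenAI` instance) and fall back to constructing a default one. A schematic of that shape (class and method names are illustrative, not mteb's exact signature):

```python
class OpenAIEmbedder:
    """Schematic wrapper: an injected client overrides the default one."""

    def __init__(self, model_name: str, client=None):
        if client is None:
            # In the real wrapper this would build openai.OpenAI();
            # stubbed out here so the sketch stays self-contained.
            client = self._default_client()
        self._client = client
        self._model_name = model_name

    @staticmethod
    def _default_client():
        raise RuntimeError("no default client configured in this sketch")
```

With this shape, `OpenAIEmbedder("text-embedding-3-large", client=AzureOpenAI(...))` routes all calls through Azure while the default path is untouched.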

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* try adding init

* add init in audio pc task eng

* all audio tasks init

* remove script test

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
isaac-chung added a commit that referenced this pull request Jul 6, 2025
* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0` (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of one of the tasks using `nomic-ai/nomic-embed-text-v1.5`; both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The task type `SIMILARITY` is invalid; the correct one is `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
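A small sketch of the fix: validating the task type against the supported set up front fails fast instead of at request time. The list below is reproduced for illustration from the Vertex AI docs linked above; check the docs for the current set.

```python
# Supported task types per the Vertex AI embeddings docs
# (illustrative copy; the authoritative list is in the linked docs).
SUPPORTED_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
}


def validate_task_type(task_type: str) -> str:
    """Raise early on an unsupported task type instead of at request time."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"Invalid task type {task_type!r}; did you mean 'SEMANTIC_SIMILARITY'?"
        )
    return task_type
```

With this guard, the old `SIMILARITY` value raises a `ValueError` immediately rather than producing an API error.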

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all (With Visual document retrieval then images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section? (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

 fixes: #2770
Fixes the issue, but only in v1

```
# tested using:

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix for `form=None`, which is no longer valid but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for an already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
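A minimal sketch of the pattern described above: the wrapper accepts an optional pre-initialized client (e.g. an `AzureOpenAI` instance) and only builds a default `OpenAI` client when none is given. The names here (`OpenAIWrapper`, `embed`) are illustrative, not mteb's actual API.

```python
class OpenAIWrapper:
    """Illustrative wrapper: accepts a pre-built client or constructs a default one."""

    def __init__(self, model_name: str, client=None, **kwargs):
        if client is None:
            # Default path: build a standard OpenAI client.
            from openai import OpenAI

            client = OpenAI(**kwargs)
        self._client = client
        self._model_name = model_name

    def embed(self, texts):
        # Same code path regardless of which client was supplied.
        response = self._client.embeddings.create(model=self._model_name, input=texts)
        return [item.embedding for item in response.data]
```

Passing a client built with your Azure endpoint and credentials then reuses the same embedding code path unchanged.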

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removes the lower-bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model  (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* fix task name split correctly

* add docstring for test

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec model's metadata.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Samoed added a commit that referenced this pull request Jul 10, 2025
* Update tasks & benchmarks tables


* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0` (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* ix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Run both versions of one of the task using `nomic-ai/nomic-embed-text-v1.5` and both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded the "all" sections (with visual document retrieval, images start to take up a lot of space)
3) Removed Legacy and instead added "Other" under Language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section (see dummy screenshot)?

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

Fixes #2770, but only in v1.

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removes the lower-bound selection; generally, I don't think people care about models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model  (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* fix task name split correctly

* add docstring for test

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec model's metadata.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* dataset: Evalita dataset integration (#2859)

* Added DadoEvalCoarseClassification

* Removed unnecessary columns from DadoEvalCoarseClassification

* Added EmitClassification task

* added SardiStanceClassification task

* Added GeoLingItClassification task

* Added DisCoTexPairClassification tasks

* Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification to the `__init__` files

* changed import in DisCoTexPairClassification

* removed GeoLingItClassification dataset

* fixed citation formatting, missing metadata parameters and lint formatting

* - Added XGlueWRPReranking task
- Added missing __init__.py files

* fixed metadata in XGlueWRPReranking

* Added MKQARetrieval task

* fixed type in XGlueWRPReranking

* changed MKQARetrieval from cross-lingual to monolingual

* formatted MKQARetrieval file

* removed unused const

---------

Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>

* Update tasks & benchmarks tables

* fix: pin datasets version (#2892)

fix datasets version

* 1.38.34

Automatically generated by python-semantic-release

* merge main

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: MattiaSangermano <43407984+MattiaSangermano@users.noreply.github.com>
Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>