
model: Added 3 HIT-TMG's KaLM-embedding models #2478


Merged
merged 18 commits into main from KaLM_instruct_v1 on Jun 15, 2025

Conversation

ayush1298
Collaborator

@ayush1298 ayush1298 commented Apr 2, 2025

fixes #1445 #2482
Added 3 models:

  1. HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper
  2. HIT-TMG/KaLM-embedding-multilingual-mini-v1
  3. HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5 with instruct wrapper

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

@ayush1298
Collaborator Author

@Samoed I will not be able to run the models on all tasks and add the results to the results repo. Can you do that, if possible?

Member

@Samoed Samoed left a comment


@ayush1298
Collaborator Author

ayush1298 commented Apr 3, 2025

Detailed Analysis of Results Comparison:
M1: HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1
M2: HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5
M3: HIT-TMG/KaLM-embedding-multilingual-mini-v1

Task Type - "Classification": Task - "EmotionClassification" (significant differences in results)

| Metric | M1-New | M1-Old | M2-New | M2-Old | M3-New | M3-Old |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.604 | 0.85565 | 0.6017 | 0.869 | 0.5118 | 0.53945 |
| F1 | 0.5475 | 0.81123 | 0.5469 | 0.82434 | 0.4573 | 0.46749 |
| F1 Weighted | 0.6202 | 0.85983 | 0.6184 | 0.87211 | 0.5321 | 0.55445 |
| Main Score | 0.604 | 0.85565 | 0.6017 | 0.869 | 0.5118 | 0.53945 |
Task Type - "MultilabelClassification": Task - "CEDRClassification"

| Metric | M1-New | M1-Old | M2-New | M2-Old | M3-New | M3-Old |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.3972 | 0.4330 | 0.3908 | 0.4376 | 0.4015 | 0.4216 |
| F1 | 0.2724 | 0.4111 | 0.2719 | 0.4247 | 0.3118 | 0.3909 |
| LRAP | 0.6429 | 0.7206 | 0.6409 | 0.7363 | 0.6595 | 0.7107 |
| Main Score | 0.3972 | 0.4330 | 0.3908 | 0.4376 | 0.4015 | 0.4216 |
Task Type - "Clustering": Task - "GeoreviewClusteringP2P"

| Metric | M1-New | M1-Old | M2-New | M2-Old | M3-New | M3-Old |
| --- | --- | --- | --- | --- | --- | --- |
| Main Score | 0.6329 | 0.6028 | 0.6324 | 0.6076 | 0.6211 | 0.6340 |
| V-Measure | 0.6329 | 0.6028 | 0.6324 | 0.6076 | 0.6211 | 0.6340 |
| V-Measure Std | 0.0088 | 0.0103 | 0.0091 | 0.0066 | 0.0098 | 0.0044 |
Task Type - "PairClassification": Task - "Ocnli"

| Metric | M1-Old | M1-New | M2-Old | M2-New | M3-Old | M3-New |
| --- | --- | --- | --- | --- | --- | --- |
| Cosine Accuracy | 0.6687 | 0.6665 | 0.6622 | 0.6622 | 0.6703 | 0.6703 |
| Cosine Accuracy Threshold | 0.8435 | 0.8587 | 0.8553 | 0.8553 | 0.6514 | 0.6514 |
| Cosine AP | 0.6983 | 0.6968 | 0.6921 | 0.6921 | 0.6949 | 0.6949 |
| Cosine F1 | 0.7134 | 0.7075 | 0.7045 | 0.7045 | 0.7128 | 0.7128 |
| Cosine F1 Threshold | 0.8386 | 0.8333 | 0.8385 | 0.8385 | 0.5884 | 0.5884 |
| Cosine Precision | 0.6393 | 0.6043 | 0.6136 | 0.6136 | 0.6234 | 0.6234 |
| Cosine Recall | 0.8068 | 0.8532 | 0.8268 | 0.8268 | 0.8321 | 0.8321 |
| Dot Accuracy | 0.6687 | 0.6665 | 0.6622 | 0.6622 | 0.6703 | 0.6703 |
| Dot Accuracy Threshold | 0.8435 | 0.8587 | 0.8553 | 0.8553 | 0.6514 | 0.6514 |
| Dot AP | 0.6983 | 0.6968 | 0.6921 | 0.6921 | 0.6949 | 0.6949 |
| Dot F1 | 0.7134 | 0.7075 | 0.7045 | 0.7045 | 0.7128 | 0.7128 |
| Dot F1 Threshold | 0.8386 | 0.8333 | 0.8385 | 0.8385 | 0.5884 | 0.5884 |
| Dot Precision | 0.6393 | 0.6043 | 0.6136 | 0.6136 | 0.6234 | 0.6234 |
| Dot Recall | 0.8068 | 0.8532 | 0.8268 | 0.8268 | 0.8321 | 0.8321 |
| Euclidean Accuracy | 0.6687 | 0.6665 | 0.6622 | 0.6622 | 0.6703 | 0.6703 |
| Euclidean Accuracy Threshold | 0.5595 | 0.5316 | 0.5379 | 0.5379 | 0.8349 | 0.8349 |
| Euclidean AP | 0.6983 | 0.6968 | 0.6921 | 0.6921 | 0.6949 | 0.6949 |
| Euclidean F1 | 0.7134 | 0.7075 | 0.7045 | 0.7045 | 0.7128 | 0.7128 |
| Euclidean F1 Threshold | 0.5682 | 0.5775 | 0.5682 | 0.5682 | 0.9074 | 0.9074 |
| Euclidean Precision | 0.6393 | 0.6043 | 0.6136 | 0.6136 | 0.6234 | 0.6234 |
| Euclidean Recall | 0.8068 | 0.8532 | 0.8268 | 0.8268 | 0.8321 | 0.8321 |
| Manhattan Accuracy | 0.6605 | 0.6611 | 0.6589 | 0.6589 | 0.6692 | 0.6692 |
| Manhattan Accuracy Threshold | 12.1546 | 12.0459 | 12.5711 | 12.5711 | 19.5184 | 19.5184 |
| Manhattan AP | 0.6951 | 0.6940 | 0.6893 | 0.6893 | 0.6902 | 0.6902 |
| Manhattan F1 | 0.7056 | 0.7068 | 0.7002 | 0.7002 | 0.7090 | 0.7090 |
| Manhattan F1 Threshold | 13.4479 | 13.3175 | 13.2169 | 13.2169 | 22.0142 | 22.0142 |
| Manhattan Precision | 0.6178 | 0.6184 | 0.6172 | 0.6172 | 0.5891 | 0.5891 |
| Manhattan Recall | 0.8226 | 0.8247 | 0.8089 | 0.8089 | 0.8902 | 0.8902 |
| Max AP | 0.6983 | 0.6968 | 0.6921 | 0.6921 | 0.6949 | 0.6949 |
| Max F1 | 0.7134 | 0.7075 | 0.7045 | 0.7045 | 0.7128 | 0.7128 |
| Max Precision | 0.6393 | 0.6184 | 0.6172 | 0.6172 | 0.6234 | 0.6234 |
| Max Recall | 0.8226 | 0.8532 | 0.8268 | 0.8268 | 0.8902 | 0.8902 |
| Similarity Accuracy | 0.6687 | 0.6665 | 0.6622 | 0.6622 | 0.6703 | 0.6703 |
| Similarity Accuracy Threshold | 0.8435 | 0.8587 | 0.8553 | 0.8553 | 0.6514 | 0.6514 |
| Similarity AP | 0.6983 | 0.6968 | 0.6921 | 0.6921 | 0.6949 | 0.6949 |
| Similarity F1 | 0.7134 | 0.7075 | 0.7045 | 0.7045 | 0.7128 | 0.7128 |
| Similarity F1 Threshold | 0.8386 | 0.8333 | 0.8385 | 0.8385 | 0.5884 | 0.5884 |
| Similarity Precision | 0.6393 | 0.6043 | 0.6136 | 0.6136 | 0.6234 | 0.6234 |
| Similarity Recall | 0.8068 | 0.8532 | 0.8268 | 0.8268 | 0.8321 | 0.8321 |
| Main Score | 0.6983 | 0.6665 | 0.6921 | 0.6622 | 0.6949 | 0.6703 |

@Samoed
Member

Samoed commented Apr 3, 2025

Hm. It seems they've reported different prompts in the paper from what we're using. Can you update your implementation with their prompts? You could change the model to use the sentence transformer wrapper, but that's a hack, and it's not clear how to integrate their results properly. At the least, can you try changing the prompt for 2-3 tasks directly to test whether our implementation matches?

@ayush1298
Collaborator Author

ayush1298 commented Apr 3, 2025

Hm. It seems they've reported different prompts in the paper from what we're using. Can you update your implementation with their prompts? You could change the model to use the sentence transformer wrapper, but that's a hack, and it's not clear how to integrate their results properly. At the least, can you try changing the prompt for 2-3 tasks directly to test whether our implementation matches?

I think only the Classification and MultilabelClassification results have some differences. For retrieval, reranking, and STS tasks (whose results I was going to share shortly), there are no differences.

Update:
I looked at their paper; they give different task instructions, each specific to a task. Should we support task-specific instructions in MTEB?

@Samoed
Member

Samoed commented Apr 3, 2025

I looked at their paper; they give different task instructions, each specific to a task. Should we support task-specific instructions in MTEB?

I think you can create an issue to discuss it. After that we can decide what to do with this model.

@Samoed
Member

Samoed commented Apr 7, 2025

@ayush1298 You can change get_instruction

def get_instruction(task_name: str, prompt_type: PromptType | None) -> str:

similarly to get_prompt_name
def get_prompt_name(

@ayush1298
Collaborator Author

ayush1298 commented Apr 8, 2025

@Samoed I have modified get_instruction similarly to get_prompt_name, but I don't know exactly how to incorporate this in model_meta for each model.

One more thing: I think what I missed is that the prompts given at the end of the paper all share the same format:
HIT_TMG_INSTRUCTION = "Instruct: {instruction}\nQuery: "

They just give these as examples, with a task-specific instruction and query for each task.
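Concretely, applying that shared template to a task instruction and a query would look roughly like this (a minimal sketch; the instruction and query strings are made-up examples):

```python
# The shared prompt format quoted above; only queries get the prefix,
# passages are assumed to be encoded as-is.
HIT_TMG_INSTRUCTION = "Instruct: {instruction}\nQuery: "


def build_query(instruction: str, query: str) -> str:
    """Prefix a query with the instruction-formatted template."""
    return HIT_TMG_INSTRUCTION.format(instruction=instruction) + query


print(build_query("Given a question, retrieve relevant passages", "what is mteb?"))
# Instruct: Given a question, retrieve relevant passages
# Query: what is mteb?
```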

@ayush1298
Collaborator Author

@Samoed Can you check this? I have added the suggested changes.
Sorry for the late reply.

@Samoed
Member

Samoed commented May 21, 2025

I remembered that you haven't added prompts to PairClassification tasks, but mteb adds its default instruction for those tasks. This explains why this branch gets 0.63 for pair classification while main gets 0.66.

I tried to run your script and got these results. They match your reported scores, except for the main score for Ocnli. It seems we had a bug for pair classification tasks: there is no max_accuracy, which is the main metric, and because of that max_ap was reported as the main score instead of max_accuracy.

{'similarity_accuracy': 0.6621548456957228,
 'similarity_accuracy_threshold': 0.8553438186645508,
 'similarity_f1': 0.7044534412955467,
 'similarity_f1_threshold': 0.8385477662086487,
 'similarity_precision': 0.6136363636363636,
 'similarity_recall': 0.8268215417106652,
 'similarity_ap': 0.6921438509764521,
 'cosine_accuracy': 0.6621548456957228,
 'cosine_accuracy_threshold': 0.8553438186645508,
 'cosine_f1': 0.7044534412955467,
 'cosine_f1_threshold': 0.8385477662086487,
 'cosine_precision': 0.6136363636363636,
 'cosine_recall': 0.8268215417106652,
 'cosine_ap': 0.6921438509764521,
 'manhattan_accuracy': 0.6589063345966432,
 'manhattan_accuracy_threshold': 12.571066856384277,
 'manhattan_f1': 0.70018281535649,
 'manhattan_f1_threshold': 13.216854095458984,
 'manhattan_precision': 0.6172441579371475,
 'manhattan_recall': 0.808870116156283,
 'manhattan_ap': 0.689316563080095,
 'euclidean_accuracy': 0.6621548456957228,
 'euclidean_accuracy_threshold': 0.5378776788711548,
 'euclidean_f1': 0.7044534412955467,
 'euclidean_f1_threshold': 0.5682468414306641,
 'euclidean_precision': 0.6136363636363636,
 'euclidean_recall': 0.8268215417106652,
 'euclidean_ap': 0.6921438509764521,
 'dot_accuracy': 0.6621548456957228,
 'dot_accuracy_threshold': 0.8553438186645508,
 'dot_f1': 0.7044534412955467,
 'dot_f1_threshold': 0.8385478258132935,
 'dot_precision': 0.6136363636363636,
 'dot_recall': 0.8268215417106652,
 'dot_ap': 0.6921438509764521,
 'max_f1': 0.7044534412955467,
 'max_precision': 0.6172441579371475,
 'max_recall': 0.8268215417106652,
 'max_ap': 0.6921438509764521}
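For reference, the missing max_accuracy can be recomputed from the per-similarity accuracies in that output (a quick sketch using the values above):

```python
# Accuracy values copied from the result dict above.
accuracies = {
    "similarity_accuracy": 0.6621548456957228,
    "cosine_accuracy": 0.6621548456957228,
    "manhattan_accuracy": 0.6589063345966432,
    "euclidean_accuracy": 0.6621548456957228,
    "dot_accuracy": 0.6621548456957228,
}

# max_accuracy is the best accuracy across similarity functions; this is
# the value that should have been reported as the main score, not max_ap.
max_accuracy = max(accuracies.values())
print(max_accuracy)  # 0.6621548456957228
```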

@YanshekWoo
Contributor

HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5, Ocnli:

| Setting | Main score |
| --- | --- |
| w/ InstructSentenceTransformerWrapper | 0.624256 |
| w/o InstructSentenceTransformerWrapper | 0.662155 |

I found an unexpected difference with the InstructSentenceTransformerWrapper loader, since Ocnli (PairClassification) should not apply any instruction.

The score 0.662155 WITHOUT InstructSentenceTransformerWrapper is consistent with the results above (#2478 (comment)); I evaluated via my own encoder class.
Is there a proper way to set no instruction for STS and PairClassification under the InstructSentenceTransformerWrapper class?

@Samoed
Member

Samoed commented May 21, 2025

Is there a proper way to set no instruction for STS and PairClassification under the InstructSentenceTransformerWrapper class?

I think yes, we can change that.

@ayush1298 can you evaluate these models on retrieval and reranking tasks (and preferably run all task types) to see whether you need to apply a prompt for passages? It seems we've found the main differences for those task types now.

@YanshekWoo
Contributor

I tried to run your script and got these results. They match your reported scores, except for the main score. It seems we had a bug for pair classification tasks: there is no max_accuracy, which is the main metric, and because of that max_ap was reported as the main score instead of max_accuracy.

I see. This may be due to a bug in the statistical methods used in previous versions of MTEB.

I have the JSON files of all previous evaluation results (mteb==1.14.5), and I can provide them if needed. Alternatively, they should also be directly available in the model's README.

@ayush1298
Collaborator Author

ayush1298 commented May 24, 2025

Sorry for the delay. Below are the tables for other tasks:

M1: HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1
M2: HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5
M3: HIT-TMG/KaLM-embedding-multilingual-mini-v1

Task Type - "STS": Task - "STS12"

| Metric | M1-Old | M1-New | M2-Old | M2-New | M3-Old | M3-New |
| --- | --- | --- | --- | --- | --- | --- |
| Cosine Pearson | 0.87941 | 0.82288 | 0.87306 | 0.80997 | 0.85341 | 0.85342 |
| Cosine Spearman | 0.81495 | 0.74878 | 0.80167 | 0.73782 | 0.76668 | 0.76669 |
| Euclidean Pearson | 0.85671 | 0.78194 | 0.84244 | 0.76561 | 0.81434 | 0.81434 |
| Euclidean Spearman | 0.81495 | 0.74879 | 0.80166 | 0.73783 | 0.76668 | 0.76671 |
| Manhattan Pearson | 0.85794 | 0.78359 | 0.84416 | 0.76715 | 0.81698 | 0.81698 |
| Manhattan Spearman | 0.81688 | 0.75153 | 0.80380 | 0.74036 | 0.77010 | 0.77012 |
| Pearson | 0.87941 | 0.82288 | 0.87306 | 0.80997 | 0.85341 | 0.85342 |
| Spearman | 0.81495 | 0.74878 | 0.80167 | 0.73782 | 0.76668 | 0.76669 |
| Main Score | 0.81495 | 0.74878 | 0.80167 | 0.73782 | 0.76668 | 0.76669 |
Task Type - "Reranking": Task - "SyntecReranking"

| Metric | M1-Old | M1-New | M2-Old | M2-New | M3-Old | M3-New |
| --- | --- | --- | --- | --- | --- | --- |
| MAP | 0.866 | 0.86694 | 0.85968 | 0.83760 | 0.87267 | 0.87267 |
| MRR | 0.866 | 0.86694 | 0.85968 | 0.83760 | 0.87267 | 0.87267 |
| Main Score | 0.866 | 0.86694 | 0.85968 | 0.83760 | 0.87267 | 0.87267 |
| nAUC MAP Diff@1 | 0.59662 | 0.52770 | 0.62894 | 0.65819 | 0.61789 | 0.61789 |
| nAUC MAP Max | 0.19769 | 0.06864 | 0.17784 | 0.07104 | -0.02201 | -0.02201 |
| nAUC MAP Std | 0.44670 | 0.31199 | 0.42069 | 0.42196 | 0.37306 | 0.37306 |
| nAUC MRR Diff@1 | 0.59662 | 0.52770 | 0.62894 | 0.65819 | 0.61789 | 0.61789 |
| nAUC MRR Max | 0.19769 | 0.06864 | 0.17784 | 0.07104 | -0.02201 | -0.02201 |
| nAUC MRR Std | 0.44670 | 0.31199 | 0.42069 | 0.42196 | 0.37306 | 0.37306 |
Task Type - "Retrieval": Task - "SyntecRetrieval"

| Metric | Model 1 Old | Model 1 New | Model 2 Old | Model 2 New | Model 3 Old | Model 3 New |
| --- | --- | --- | --- | --- | --- | --- |
| main_score | 0.81899 | 0.80667 | 0.81745 | 0.80527 | 0.82436 | 0.82436 |
| map_at_1 | 0.64 | 0.62 | 0.63 | 0.62 | 0.64 | 0.64 |
| map_at_3 | 0.74833 | 0.73333 | 0.74833 | 0.72833 | 0.75333 | 0.75333 |
| map_at_5 | 0.76183 | 0.74983 | 0.76033 | 0.74233 | 0.76283 | 0.76283 |
| map_at_10 | 0.76594 | 0.7525 | 0.76325 | 0.74803 | 0.76996 | 0.76996 |
| map_at_20 | 0.76644 | 0.75306 | 0.76392 | 0.74919 | 0.76996 | 0.76996 |
| map_at_100 | 0.76662 | 0.75351 | 0.76413 | 0.74919 | 0.77013 | 0.77013 |
| map_at_1000 | 0.76662 | 0.75351 | 0.76413 | 0.74919 | 0.77013 | 0.77013 |
| mrr_at_1 | 0.64 | 0.62 | 0.63 | 0.62 | 0.64 | 0.64 |
| mrr_at_3 | 0.74833 | 0.73333 | 0.74833 | 0.72833 | 0.75333 | 0.75333 |
| mrr_at_5 | 0.76183 | 0.74983 | 0.76033 | 0.74233 | 0.76283 | 0.76283 |
| mrr_at_10 | 0.76594 | 0.7525 | 0.76325 | 0.74803 | 0.76996 | 0.76996 |
| mrr_at_20 | 0.76644 | 0.75306 | 0.76392 | 0.74919 | 0.76996 | 0.76996 |
| mrr_at_100 | 0.76662 | 0.75351 | 0.76413 | 0.74919 | 0.77013 | 0.77013 |
| mrr_at_1000 | 0.76662 | 0.75351 | 0.76413 | 0.74919 | 0.77013 | 0.77013 |
| ndcg_at_1 | 0.64 | 0.62 | 0.63 | 0.62 | 0.64 | 0.64 |
| ndcg_at_3 | 0.78464 | 0.77095 | 0.78964 | 0.76702 | 0.79095 | 0.79095 |
| ndcg_at_5 | 0.80917 | 0.80022 | 0.81074 | 0.79198 | 0.80774 | 0.80774 |
| ndcg_at_10 | 0.81899 | 0.80667 | 0.81745 | 0.80527 | 0.82436 | 0.82436 |
| ndcg_at_20 | 0.82126 | 0.80903 | 0.81995 | 0.81005 | 0.82436 | 0.82436 |
| ndcg_at_100 | 0.82297 | 0.81266 | 0.82175 | 0.81005 | 0.82607 | 0.82607 |
| ndcg_at_1000 | 0.82297 | 0.81266 | 0.82175 | 0.81005 | 0.82607 | 0.82607 |
| precision_at_1 | 0.64 | 0.62 | 0.63 | 0.62 | 0.64 | 0.64 |
| precision_at_3 | 0.29667 | 0.29333 | 0.30333 | 0.29333 | 0.3 | 0.3 |
| precision_at_5 | 0.19 | 0.19 | 0.192 | 0.188 | 0.188 | 0.188 |
| precision_at_10 | 0.098 | 0.097 | 0.098 | 0.098 | 0.099 | 0.099 |
| precision_at_20 | 0.0495 | 0.049 | 0.0495 | 0.05 | 0.0495 | 0.0495 |
| precision_at_100 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| precision_at_1000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| recall_at_1 | 0.64 | 0.62 | 0.63 | 0.62 | 0.64 | 0.64 |
| recall_at_3 | 0.89 | 0.88 | 0.91 | 0.88 | 0.9 | 0.9 |
| recall_at_5 | 0.95 | 0.95 | 0.96 | 0.94 | 0.94 | 0.94 |
| recall_at_10 | 0.98 | 0.97 | 0.98 | 0.98 | 0.99 | 0.99 |
| recall_at_20 | 0.99 | 0.98 | 0.99 | 1.0 | 0.99 | 0.99 |
| recall_at_100 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| recall_at_1000 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |

@Samoed do you want me to run and compare evaluations on any other tasks?

@Samoed
Member

Samoed commented May 24, 2025

No, I don't think we need to run tasks for now. We need to fix some problems with prompts:

  • Don't apply prompts on passages
  • Don't apply prompts on PairClassification, STS and Summarization
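That gating could be sketched roughly as follows (names here are illustrative, not mteb's actual API):

```python
from typing import Optional

# Task types that should be encoded without any instruction.
NO_INSTRUCTION_TASK_TYPES = {"PairClassification", "STS", "Summarization"}


def instruction_for(task_type: str, prompt_type: Optional[str], instruction: str) -> str:
    """Return the instruction to apply, or '' when none should be used."""
    if task_type in NO_INSTRUCTION_TASK_TYPES:
        return ""
    if prompt_type == "document":
        # Don't apply prompts on passages, only on queries.
        return ""
    return instruction
```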

@ayush1298
Collaborator Author

ayush1298 commented May 24, 2025

No, I don't think we need to run tasks for now. We need to fix some problems with prompts

You asked in this comment to evaluate on other tasks, so I ran them.

  • Don't apply prompts on passages
  • Don't apply prompts on PairClassification, STS and Summarization

How can I ensure this?

@Samoed
Member

Samoed commented May 24, 2025

How can I ensure this?

You can see in the logs which prompt is applied.
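One way to surface that is to raise the log level before running the evaluation (assuming mteb emits prompt-selection messages through the standard logging module; the exact log lines may differ):

```python
import logging

# Enable INFO-level output so messages about which prompt/instruction is
# applied per task become visible during evaluation.
logging.basicConfig(level=logging.INFO)
logging.getLogger("mteb").setLevel(logging.INFO)
```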

@Samoed Samoed mentioned this pull request Jun 5, 2025
@Samoed
Member

Samoed commented Jun 7, 2025

@YanshekWoo I've updated the implementation of your model and got these results for HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5:

| Task | PR | Reported |
| --- | --- | --- |
| SyntecRetrieval | 0.81497 | 81.745 |
| Ocnli | 0.6621548456957228 | 69.2143 |

@YanshekWoo
Contributor

@Samoed Got it. Thank you for your efforts.

@Samoed
Member

Samoed commented Jun 8, 2025

I've run HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5 on more tasks and got these results:

| task_name | main_score | reported |
| --- | --- | --- |
| EmotionClassification | 0.86885 | 86.9 |
| SciFact | 0.71359 | 72.887 |
| SciDocsRR | 0.816441 | 80.99314 |
| SCIDOCS | 0.17654 | 19.967 |
| STS16 | 0.848328 | 83.5753 |
| SprintDuplicateQuestions | 0.930568 | 93.0568 |
| ToxicConversationsClassification | 0.892725 | 89.277 |
| AskUbuntuDupQuestions | 0.60441 | 60.3467 |
| SummEval | 0.252182 | 25.2273 |
| TwitterSemEval2015 | 0.714391 | 71.4391 |

The largest differences are in the retrieval tasks (SciFact and SCIDOCS), with small differences in reranking (AskUbuntuDupQuestions and SciDocsRR).
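To put those gaps in numbers, here are the absolute differences computed from the table above (reported scores divided by 100; all values copied from the table):

```python
# (PR main_score, reported score / 100) per task, from the table above.
results = {
    "SciFact": (0.71359, 0.72887),
    "SCIDOCS": (0.17654, 0.19967),
    "SciDocsRR": (0.816441, 0.8099314),
    "AskUbuntuDupQuestions": (0.60441, 0.603467),
}

# Absolute gap between this PR's score and the reported score.
diffs = {task: round(abs(pr - reported), 5) for task, (pr, reported) in results.items()}
print(diffs)
```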

@YanshekWoo

@YanshekWoo
Contributor

I've run HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5 on more tasks and got these results:

OK, I think these results basically match, and the deviations are very small.

@Samoed
Member

Samoed commented Jun 13, 2025

Great! I think we'll wait for @KennethEnevoldsen's review and after that we can merge it. Thank you for helping, @YanshekWoo!

@Samoed Samoed requested a review from KennethEnevoldsen June 13, 2025 12:41
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Generally things look good here. I don't think I have a lot to add.

instruction = self.get_task_instruction(
    task_name, prompt_type, self.prompts_dict
)
from mteb import get_task
Contributor


Is it here due to a circular import?

Member

@Samoed Samoed Jun 15, 2025


As I remember, yes

Contributor


Alright - should we add a comment on this? (Ideally we should move it out in v2.)

Member


This problem won't occur in v2, because we're passing TaskMetadata directly

Contributor


Perfect - I think we are good to merge then

@KennethEnevoldsen KennethEnevoldsen changed the title Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper model: Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper Jun 15, 2025
@KennethEnevoldsen KennethEnevoldsen changed the title model: Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper model: Updated HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 Jun 15, 2025
@KennethEnevoldsen KennethEnevoldsen changed the title model: Updated HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 model: Added 3 HIT-TMG's KaLM-embedding models Jun 15, 2025
@Samoed Samoed merged commit 03e084b into embeddings-benchmark:main Jun 15, 2025
10 checks passed
@ayush1298 ayush1298 deleted the KaLM_instruct_v1 branch June 15, 2025 19:11
isaac-chung added a commit that referenced this pull request Jun 22, 2025
* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
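Judging from the commit messages above ("namespace false", "add star", "fix setuptools find"), a typical setuptools configuration that restricts package discovery to the `mteb` namespace might look like this (a sketch, not necessarily the exact change):

```toml
[tool.setuptools.packages.find]
include = ["mteb*"]   # the trailing star keeps subpackages included
namespaces = false    # don't treat bare directories as namespace packages
```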

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of the task using `nomic-ai/nomic-embed-text-v1.5` and both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid; the correct one is `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
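For reference, Vertex AI's supported embedding task types include `SEMANTIC_SIMILARITY` but not `SIMILARITY`; a mapping along these lines (illustrative, not mteb's actual code) avoids sending the invalid value:

```python
# Map MTEB task categories to Vertex AI embedding task types.
# The MTEB-side keys are assumptions for illustration; the Vertex AI
# values are from the supported_task_types documentation.
MTEB_TO_VERTEX_TASK_TYPE = {
    "STS": "SEMANTIC_SIMILARITY",
    "Classification": "CLASSIFICATION",
    "Clustering": "CLUSTERING",
    "Retrieval": "RETRIEVAL_DOCUMENT",
}
```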

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all (With Visual document retrieval then images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

 fixes: #2770
Fixes the issue, but only in v1

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772
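For context, the caching idea behind a wrapper like `CachedEmbeddingWrapper` can be sketched as follows (an illustration of the pattern, not mteb's implementation): embeddings are computed once per unique text and served from an in-memory dict on repeat calls.

```python
class CachedEncoder:
    """Minimal sketch of an embedding cache around an encode function."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn
        self._cache = {}  # text -> embedding

    def encode(self, texts):
        # Only run the underlying encoder on texts not seen before.
        missing = [t for t in texts if t not in self._cache]
        if missing:
            for text, emb in zip(missing, self._encode_fn(missing)):
                self._cache[text] = emb
        # Serve everything (old and new) from the cache, preserving order.
        return [self._cache[t] for t in texts]
```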

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
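The client-injection pattern described above can be sketched like this (names and defaults are illustrative, not the actual mteb signature): the wrapper accepts an optional pre-built client, so a pre-configured AzureOpenAI client can be passed in instead of the default one.

```python
class OpenAIWrapper:
    """Sketch of an embedding wrapper that accepts an injected client."""

    def __init__(self, model_name, client=None):
        # Fall back to a default client only when none is supplied.
        self._client = client if client is not None else self._default_client()
        self._model_name = model_name

    @staticmethod
    def _default_client():
        # Stand-in for constructing openai.OpenAI() in the real wrapper.
        return "default-openai-client"
```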

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
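The prompt-dict support mentioned in the commits above can be sketched like this (illustrative names, not the exact mteb implementation): an instruction may be a plain string, or a dict keyed by prompt type, since some instruct models need different prefixes for queries and documents.

```python
def get_instruction(instruction, prompt_type=None):
    """Resolve an instruction that is either a plain string or a dict
    keyed by prompt type (e.g. "query" / "document")."""
    if isinstance(instruction, dict):
        # Fall back to an empty instruction when the role has no entry.
        return instruction.get(prompt_type or "", "")
    return instruction
```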

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* try adding init

* add init in audio pc task eng

* all audio tasks init

* remove script test

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
isaac-chung added a commit that referenced this pull request Jul 6, 2025
* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0`  (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to improve the simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements doesn't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I remove both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to add a model that not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update referenfe link to the vertexAI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* ix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of one of the tasks using `nomic-ai/nomic-embed-text-v1.5`; both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all (With Visual document retrieval then images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

fixes: #2770
Fixes the issue, but only in v1

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772
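
For readers unfamiliar with the wrapper, the core idea can be sketched roughly as follows (an illustrative sketch only — the actual `CachedEmbeddingWrapper` in mteb differs in details such as on-disk persistence):

```python
import hashlib


class CachedEmbeddingWrapper:
    """Illustrative sketch (not the actual mteb class): memoize encode()
    results per sentence so repeated texts are only embedded once."""

    def __init__(self, model):
        self.model = model
        self._cache = {}

    def encode(self, sentences, **kwargs):
        # Embed only the sentences we have not seen before.
        missing = [s for s in sentences if self._key(s) not in self._cache]
        if missing:
            vectors = self.model.encode(missing, **kwargs)
            for s, v in zip(missing, vectors):
                self._cache[self._key(s)] = v
        return [self._cache[self._key(s)] for s in sentences]

    @staticmethod
    def _key(sentence):
        return hashlib.sha256(sentence.encode("utf-8")).hexdigest()
```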

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
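
The shape of the change can be sketched as follows (a simplified illustration, not the exact code in `mteb/models/openai_models.py`; `OpenAIWrapper` here stands in for the real wrapper class):

```python
class OpenAIWrapper:
    """Sketch: accept an optional pre-initialized client (e.g. AzureOpenAI)."""

    def __init__(self, model_name, client=None, **kwargs):
        self.model_name = model_name
        if client is None:
            # Default path: build a vanilla OpenAI client from kwargs.
            from openai import OpenAI

            client = OpenAI(**kwargs)
        self._client = client

    def embed(self, texts):
        resp = self._client.embeddings.create(model=self.model_name, input=texts)
        return [d.embedding for d in resp.data]
```

With this, an Azure deployment can be used by constructing an `AzureOpenAI(...)` client yourself and passing it in via `client=`.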

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
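
The `get_instruction`/prompt-dict behaviour described in the commits above can be sketched like this (names and fallback order are assumptions, not the exact mteb implementation):

```python
def get_instruction(task_name, prompt_type, prompts):
    """Resolve an instruction from either a plain string or a dict of prompts.

    `prompts` may be a single instruction string, or a dict keyed by task
    name, prompt type ("query"/"passage"), or a "task-type" combination.
    """
    if prompts is None:
        return ""
    if isinstance(prompts, str):
        return prompts
    # Try the most specific key first, then fall back to coarser ones.
    for key in (f"{task_name}-{prompt_type}", task_name, prompt_type):
        if key in prompts:
            return prompts[key]
    return ""
```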

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model  (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* fix task name split correctly

* add docstring for test
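
The fix concerns splitting a combined prompt name such as `TaskName-query` back into task name and prompt type; because task names may themselves contain `-`, splitting from the right is the safe direction. A rough sketch (the actual validation in mteb may differ):

```python
VALID_PROMPT_TYPES = {"query", "passage"}


def split_prompt_name(name):
    """Split 'Task-Name-query' -> ('Task-Name', 'query').

    Uses rpartition so hyphens inside the task name itself are preserved.
    Returns (name, None) when no known prompt-type suffix is present.
    """
    task_name, sep, suffix = name.rpartition("-")
    if sep and suffix in VALID_PROMPT_TYPES:
        return task_name, suffix
    return name, None
```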

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec Model's Meta data.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Samoed added a commit that referenced this pull request Jul 10, 2025
* Update tasks & benchmarks tables


* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0` (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <munotayush6@kgpian.iitkgp.ac.in>
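
The workaround for empty inputs can be sketched as a small preprocessing step (an illustration of the idea; the exact substitution used in mteb may differ):

```python
def sanitize_openai_inputs(texts):
    """OpenAI's embeddings endpoint rejects empty strings, so replace any
    empty (or whitespace-only) input with a single space before the call."""
    return [t if t.strip() else " " for t in texts]
```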

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored texts segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* ix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Run both versions of one of the task using `nomic-ai/nomic-embed-text-v1.5` and both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all (With Visual document retrieval then images start to take up a lot of space)
3) Removed legacy and instead added "Other" in language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a seperate section (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

 fixes: #2770
Fixes the issue, but only in v1

```
# tested using:
import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
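The pattern is plain dependency injection: accept an optional pre-built client and only construct the default one when none is given, so an already-configured `AzureOpenAI` instance can be passed in. A minimal sketch (the class and method names are illustrative, not the actual mteb wrapper API):

```python
# Illustrative sketch of client injection; not the actual mteb OpenAI wrapper.
class EmbeddingWrapper:
    def __init__(self, model_name: str, client=None):
        # Use the caller's client (e.g. an already-initialized AzureOpenAI
        # instance) when provided; otherwise lazily build the default client,
        # which reads OPENAI_API_KEY from the environment.
        if client is None:
            from openai import OpenAI

            client = OpenAI()
        self._client = client
        self._model_name = model_name

    def encode(self, sentences: list[str]) -> list[list[float]]:
        resp = self._client.embeddings.create(
            model=self._model_name, input=sentences
        )
        return [d.embedding for d in resp.data]
```

Usage with Azure would then be `EmbeddingWrapper("text-embedding-3-small", client=AzureOpenAI(...))`; the wrapper never needs to know which backend it is talking to.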

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add Seed-1.6-embedding model (#2841)

* add Seed-1.6-embedding model

* Update seed_1_6_embedding_models.py

* update model meta info

* support image encoder interface

* error fix

* fix: format seed_1_6_embedding_models.py with Ruff

* fix: Update model selection for the leaderboard (#2855)

* fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

* fix 1M --> 1B

* format

* rename model_size -> max_model_size

* 1.38.31

Automatically generated by python-semantic-release

* fix: update training dataset info of Seed-1.6-embedding model  (#2857)

update seed1.6 model training data info

* 1.38.32

Automatically generated by python-semantic-release

* add jinav4 model meta (#2858)

* add model meta

* linting

* fix: add check for code lora

* fix: apply review comments

* fix: prompt validation for tasks with `-` (#2846)

* fix prompt validation

* fix task name split correctly

* add docstring for test

* 1.38.33

Automatically generated by python-semantic-release

* model: Adding Sailesh97/Hinvec (#2842)

* Adding Hinvec Model's Meta data.

* Adding hinvec_model.py

* Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* formatted code with Black and linted with Ruff

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Bump gradio to fix leaderboard sorting (#2866)

Bump gradio

* model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

* nvidia_llama_nemoretriever_colembed

* correct 3b reference

* lint fix

* add training data and license for nvidia/llama_nemoretriever_colembed

* lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* rename seed-1.6-embedding to seed1.6-embedding (#2870)

* fix tests to be compatible with `SentenceTransformers` `v5` (#2875)

* fix sbert `v5`

* add comment

* model: add listconranker modelmeta (#2874)

* add listconranker modelmeta

* fix bugs

* use linter

* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>

* model: add kalm_models ModelMeta (new PR) (#2853)

* feat: add KaLM_Embedding_X_0605 in kalm_models

* Update kalm_models.py for lint format

---------

Co-authored-by: xinshuohu <xinshuohu@tencent.com>

* Comment kalm model (#2877)

comment kalm model

* Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

* Add JaCWIR and JQaRA for reranking

* Fix ANLP Journal datasets

* Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

* tackle test cases

* Remove _evaluate_subset usage

* Separate v1 and v2

* Update info for NLP Journal datasets

* Update tasks & benchmarks tables

* model: add Hakim and TookaSBERTV2 models (#2826)

* add tooka v2s

* add mcinext models

* update mcinext.py

* Apply PR review suggestions

* Update mteb/models/mcinext_models.py

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* dataset: Evalita dataset integration (#2859)

* Added DadoEvalCoarseClassification

* Removed unnecessary columns from DadoEvalCoarseClassification

* Added EmitClassification task

* added SardiStanceClassification task

* Added GeoLingItClassification task

* Added DisCoTexPairClassification tasks

* Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits

* changed import in DisCoTexPairClassification

* removed GeoLingItClassification dataset

* fixed citation formatting, missing metadata parameters and lint formatting

* - Added XGlueWRPReranking task
- Added missing __init__.py files

* fixed metadata in XGlueWRPReranking

* Added MKQARetrieval task

* fixed type in XGlueWRPReranking

* changed MKQARetrieval from  cross-lingual to monolingual

* formatted MKQARetrieval file

* removed unused const

---------

Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>

* Update tasks & benchmarks tables

* fix: pin datasets version (#2892)

fix datasets version

* 1.38.34

Automatically generated by python-semantic-release

* merge main

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Ömer Veysel Çağatan <72755761+asparius@users.noreply.github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: 24September <puritysarah@naver.com>
Co-authored-by: namespace-Pt <61188463+namespace-Pt@users.noreply.github.com>
Co-authored-by: zhangpeitian <zhangpeitian@bytedance.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Feiyang <feiyangc@google.com>
Co-authored-by: Thomas van Dongen <thomas123@live.nl>
Co-authored-by: Paul Teiletche <73120933+paultltc@users.noreply.github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Dawid Koterwas <73834399+Kiwinicki@users.noreply.github.com>
Co-authored-by: Wentao Wu <wuwentao137@gmail.com>
Co-authored-by: Manveer Tamber <manveertamber@gmail.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Manuel Faysse <43467008+ManuelFay@users.noreply.github.com>
Co-authored-by: Xin Zhang <izhx404@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: MattiaSangermano <43407984+MattiaSangermano@users.noreply.github.com>
Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>
Samoed added a commit that referenced this pull request Jul 10, 2025
* fix model implementations

* fix tasks

* add metrics

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Hypothesis-Z <44766273+Hypothesis-Z@users.noreply.github.com>
Co-authored-by: zhangzeqing <zhangzeqing@zhejianglab.com>
Co-authored-by: fangxiaoquan <44112102+fangxiaoquan@users.noreply.github.com>
Co-authored-by: Li Lei <34205771+ll0ruc@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: malteos <github@i.mieo.de>
Co-authored-by: annamodels <annamodels@lgresearch.ai>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com>
Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru>
Co-authored-by: Quan Yuhan <929888357@qq.com>
Co-authored-by: Quan Yuhan <yuhan_quan@qq.com>
Co-authored-by: Mohammad Kalim Akram <kalimakram@gmail.com>
Co-authored-by: Sailesh Panda <sailesh.panda1997@gmail.com>
Co-authored-by: bschifferer <benedikt.d.schifferer@gmail.com>
Co-authored-by: tutuDoki <53423655+tutuDoki@users.noreply.github.com>
Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
Co-authored-by: xinshuohu <xinshuohu@tencent.com>
Co-authored-by: lsz05 <lszgz0521@gmail.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: MattiaSangermano <43407984+MattiaSangermano@users.noreply.github.com>
Co-authored-by: Mattia Sangermano <MattiaSangermano@users.noreply.huggingface.co>

Successfully merging this pull request may close these issues.

How to Support Task Specific Instructions in HIT-TMG/KaLM-embedding Models "KeyError: 'document' not found and no similar keys were found.
4 participants