
Conversation

BaoLocPham
Contributor

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be as an API. The new models are added in mteb#2994 (model: Add GreenNode Vietnamese Embedding models)
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g. Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

{
"dataset_revision": "b48bc27d383cfca5b6a47135a52390fa5f66b253",
"mteb_dataset_name": "AmazonCounterfactualVNClassification",
"mteb_version": null,
Member

It is strange that the revision is not specified.
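
For reference, the revision mteb expects for a task can be read from the task metadata; a minimal sketch, assuming mteb.get_task and the metadata.dataset attribute path of a recent mteb release:

import mteb

# Look up the dataset revision mteb expects for this task.
# The exact attribute path (metadata.dataset["revision"]) is assumed from
# recent mteb releases and may differ in older ones.
task = mteb.get_task("AmazonCounterfactualVNClassification")
print(task.metadata.dataset["revision"])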

Comment on lines 9 to 20
"accuracy": 0.7143776824034335,
"accuracy_stderr": 0.0267070410034655,
"ap": 0.31941698387983497,
"ap_stderr": 0.021838724464613213,
"evaluation_time": 4.12,
"f1": 0.6456962245155445,
"f1_stderr": 0.02263704572513824,
"main_score": 0.7143776824034335,
"hf_subset": "default",
"languages": [
"vie-Latn"
]
Member

Where are the scores for each experiment?

Contributor Author

Because when I was working on the VN-MTEB project, I used the mteb version from 3-4 months ago, so the scores for each experiment are not available.
If this is necessary, I will have to rerun all my classification benchmarks once again.
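
Rerunning with a current mteb release would also write the per-experiment scores into the result files. A minimal sketch using the standard mteb evaluation API; the task and model names are taken from this PR, and the output folder name is illustrative:

import mteb
from sentence_transformers import SentenceTransformer

# Re-evaluate one classification task with an up-to-date mteb so the
# resulting JSON includes the scores for each experiment.
model = SentenceTransformer("GreenNode/GreenNode-Embedding-Large-VN-V1")
tasks = mteb.get_tasks(tasks=["VieStudentFeedbackClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")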

Member

@Samoed Samoed Aug 18, 2025

I've looked at your fork https://github.com/BaoLocPham/vn-mteb/ and it was updated 3 weeks ago, which is fine, but you've submitted results with a very old mteb version (a year or more old).

},
"task_name": "VieStudentFeedbackClassification"
"mteb_dataset_name": "VieStudentFeedbackClassification",
"mteb_version": "1.7.52",
Member

This is very old. Can you update it?
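
A quick way to confirm which release will be recorded in new results is to check the installed package version; a minimal sketch, assuming mteb exposes __version__ (the value written into the mteb_version field of the result files):

import mteb

# Should print a current release rather than 1.7.52.
print(mteb.__version__)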

@BaoLocPham
Contributor Author

Okay, since all my results come from a deprecated version of mteb, I think I'd better rerun it all over again. This PR will be closed and reopened later when I'm done. @Samoed

@BaoLocPham BaoLocPham closed this Aug 18, 2025
@BaoLocPham BaoLocPham reopened this Aug 20, 2025
@Samoed
Member

Samoed commented Aug 20, 2025

There is an error in CI:
KeyError: 'RedditClustering-VN.old' not found. Did you mean: RedditClustering-VN?
Can you remove the .old tasks?

@BaoLocPham
Contributor Author

There is an error in CI: KeyError: 'RedditClustering-VN.old' not found. Did you mean: RedditClustering-VN? Can you remove the .old tasks?

I really don't know why this failed; I can't find any .old task in my code. Can you specify where it is located?
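
One way to narrow this down is to scan the submitted result files for task names containing ".old"; a minimal sketch, assuming a hypothetical layout of results/<model>/<revision>/<TaskName>.json:

from pathlib import Path

# List any result files whose task name still carries the ".old" suffix,
# e.g. RedditClustering-VN.old.json.
for path in Path("results").rglob("*.json"):
    if ".old" in path.stem:
        print(path)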


Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: AITeamVN/Vietnamese_Embedding, Alibaba-NLP/gte-Qwen2-1.5B-instruct, Alibaba-NLP/gte-Qwen2-7B-instruct, Alibaba-NLP/gte-multilingual-base, BAAI/bge-m3, BAAI/bge-multilingual-gemma2, GreenNode/GreenNode-Embedding-Large-VN-Mixed-V1, GreenNode/GreenNode-Embedding-Large-VN-V1, VoVanPhuc/sup-SimCSE-VietNamese-phobert-base, bkai-foundation-models/vietnamese-bi-encoder, hiieu/halong_embedding, intfloat/e5-mistral-7b-instruct, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/all-MiniLM-L6-v2
Tasks: AmazonCounterfactualVNClassification, AmazonPolarityVNClassification, AmazonReviewsVNClassification, ArguAna-VN, AskUbuntuDupQuestions-VN, BIOSSES-VN, Banking77VNClassification, CQADupstackAndroid-VN, CQADupstackGis-VN, CQADupstackMathematica-VN, CQADupstackPhysics-VN, CQADupstackProgrammers-VN, CQADupstackStats-VN, CQADupstackTex-VN, CQADupstackUnix-VN, CQADupstackWebmasters-VN, CQADupstackWordpress-VN, ClimateFEVER-VN, DBPedia-VN, EmotionVNClassification, FEVER-VN, FiQA2018-VN, GreenNodeTableMarkdownRetrieval, HotpotQA-VN, ImdbVNClassification, MSMARCO-VN, MTOPDomainVNClassification, MTOPIntentVNClassification, MassiveIntentVNClassification, MassiveScenarioVNClassification, NFCorpus-VN, NQ-VN, Quora-VN, RedditClustering-VN, RedditClusteringP2P-VN, SCIDOCS-VN, SICK-R-VN, STSBenchmark-VN, SciDocsRR-VN, SciFact-VN, SprintDuplicateQuestions-VN, StackExchangeClustering-VN, StackExchangeClusteringP2P-VN, StackOverflowDupQuestions-VN, TRECCOVID-VN, Touche2020-VN, ToxicConversationsVNClassification, TweetSentimentExtractionVNClassification, TwentyNewsgroupsClustering-VN, TwitterSemEval2015-VN, TwitterURLCorpus-VN, VieQuADRetrieval, VieStudentFeedbackClassification
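
As a rough illustration of how the per-model averages in the tables below could be assembled from the submitted result files, here is a minimal sketch; the directory layout (results/<model>/<TaskName>.json) and the placement of main_score under a "test" split are assumptions about the file schema, not a description of the actual comparison script:

import json
from collections import defaultdict
from pathlib import Path

scores: dict[str, dict[str, float]] = defaultdict(dict)  # model -> task -> main_score

for path in Path("results").glob("*/*.json"):
    model, task = path.parent.name, path.stem
    data = json.loads(path.read_text())
    # "main_score" matches the field shown in the result snippet above;
    # the "test" split key is an assumption about the file layout.
    scores[model][task] = data["test"]["main_score"]

for model, per_task in sorted(scores.items()):
    avg = sum(per_task.values()) / len(per_task)
    print(f"{model}\t{avg:.4f} over {len(per_task)} tasks")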

Results for AITeamVN/Vietnamese_Embedding

task_name AITeamVN/Vietnamese_Embedding intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.6197 0.6071
AmazonPolarityVNClassification 0.8878 0.7642
AmazonReviewsVNClassification 0.4448 0.3968
ArguAna-VN 0.3793 0.4788
AskUbuntuDupQuestions-VN 0.6187 0.6132
BIOSSES-VN 0.7814 0.8169
Banking77VNClassification 0.7921 0.7374
CQADupstackAndroid-VN 0.4193 0.4228
CQADupstackGis-VN 0.3191 0.3128
CQADupstackMathematica-VN 0.2144 0.2406
CQADupstackPhysics-VN 0.3552 0.3653
CQADupstackProgrammers-VN 0.3271 0.3453
CQADupstackStats-VN 0.2686 0.2781
CQADupstackTex-VN 0.2478 0.2268
CQADupstackUnix-VN 0.3452 0.3362
CQADupstackWebmasters-VN 0.3167 0.3307
CQADupstackWordpress-VN 0.2474 0.2556
ClimateFEVER-VN 0.1325 0.1543
DBPedia-VN 0.3420 0.3158
EmotionVNClassification 0.4453 0.4155
FEVER-VN 0.4881 0.5830
FiQA2018-VN 0.2994 0.3151
GreenNodeTableMarkdownRetrieval 0.3972 0.4263
HotpotQA-VN 0.7007 0.6511
ImdbVNClassification 0.8306 0.7268
MSMARCO-VN 0.3050 0.3908
MTOPDomainVNClassification 0.8553 0.8475
MTOPIntentVNClassification 0.5801 0.5733
MassiveIntentVNClassification 0.6774 0.6573
MassiveScenarioVNClassification 0.7285 0.6832
NFCorpus-VN 0.2539 0.3150
NQ-VN 0.4261 0.5232
Quora-VN 0.6100 0.6649
RedditClustering-VN 0.4389 0.4208
RedditClusteringP2P-VN 0.5616 0.6063
SCIDOCS-VN 0.1303 0.1374
SICK-R-VN 0.7711 0.7822
STSBenchmark-VN 0.7719 0.8203
SciDocsRR-VN 0.8000 0.8199
SciFact-VN 0.5512 0.6850
SprintDuplicateQuestions-VN 0.9568 0.9431
StackExchangeClustering-VN 0.5724 0.5639
StackExchangeClusteringP2P-VN 0.3163 0.3203
StackOverflowDupQuestions-VN 0.5030 0.4888
TRECCOVID-VN 0.2732 0.5471
Touche2020-VN 0.1198 0.1601
ToxicConversationsVNClassification 0.6667 0.6426
TweetSentimentExtractionVNClassification 0.5576 0.5229
TwentyNewsgroupsClustering-VN 0.3933 0.3769
TwitterSemEval2015-VN 0.6824 0.7178
TwitterURLCorpus-VN 0.8499 0.8575
VieQuADRetrieval 0.5564 0.6112 0.5459
VieStudentFeedbackClassification 0.7556 0.7761 0.7728
Average 0.5073 0.5202 0.6593

Results for Alibaba-NLP/gte-Qwen2-1.5B-instruct

task_name Alibaba-NLP/gte-Qwen2-1.5B-instruct intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5543 0.6071
AmazonPolarityVNClassification 0.9069 0.7642
AmazonReviewsVNClassification 0.4432 0.3968
ArguAna-VN 0.5199 0.4788
AskUbuntuDupQuestions-VN 0.6115 0.6132
BIOSSES-VN 0.7939 0.8169
Banking77VNClassification 0.7024 0.7374
CQADupstackAndroid-VN 0.4233 0.4228
CQADupstackGis-VN 0.2813 0.3128
CQADupstackMathematica-VN 0.2446 0.2406
CQADupstackPhysics-VN 0.3718 0.3653
CQADupstackProgrammers-VN 0.3567 0.3453
CQADupstackStats-VN 0.2677 0.2781
CQADupstackTex-VN 0.2375 0.2268
CQADupstackUnix-VN 0.3388 0.3362
CQADupstackWebmasters-VN 0.3230 0.3307
CQADupstackWordpress-VN 0.2534 0.2556
ClimateFEVER-VN 0.2347 0.1543
DBPedia-VN 0.3951 0.3158
EmotionVNClassification 0.4833 0.4155
FEVER-VN 0.8353 0.5830
FiQA2018-VN 0.3427 0.3151
GreenNodeTableMarkdownRetrieval 0.3518 0.4263
HotpotQA-VN 0.6186 0.6511
ImdbVNClassification 0.8493 0.7268
MSMARCO-VN 0.6649 nan
MTOPDomainVNClassification 0.8645 0.8475
MTOPIntentVNClassification 0.6219 0.5733
MassiveIntentVNClassification 0.6927 0.6573
MassiveScenarioVNClassification 0.7257 0.6832
NFCorpus-VN 0.3321 0.3150
NQ-VN 0.5489 0.5232
Quora-VN 0.5211 0.6649
RedditClustering-VN 0.4216 0.4208
RedditClusteringP2P-VN 0.5782 0.6063
SCIDOCS-VN 0.1804 0.1374
SICK-R-VN 0.7707 0.7822
STSBenchmark-VN 0.7946 0.8203
SciDocsRR-VN 0.8464 0.8199
SciFact-VN 0.6967 0.6850
SprintDuplicateQuestions-VN 0.9160 0.9358
StackExchangeClustering-VN 0.6032 0.5639
StackExchangeClusteringP2P-VN 0.4346 0.3203
StackOverflowDupQuestions-VN 0.4839 0.4888
TRECCOVID-VN 0.7846 0.5471
Touche2020-VN 0.3099 0.1601
ToxicConversationsVNClassification 0.6423 0.6426
TweetSentimentExtractionVNClassification 0.5997 0.5229
TwentyNewsgroupsClustering-VN 0.4556 0.3769
TwitterSemEval2015-VN 0.6757 0.7178
TwitterURLCorpus-VN 0.8533 0.8575
VieQuADRetrieval 0.5603 0.6112 0.5459
VieStudentFeedbackClassification 0.7472 0.7761 0.7728
Average 0.5484 0.5226 0.6593

Results for Alibaba-NLP/gte-Qwen2-7B-instruct

task_name Alibaba-NLP/gte-Qwen2-7B-instruct intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5601 0.6071
AmazonPolarityVNClassification 0.9307 0.7642
AmazonReviewsVNClassification 0.4552 0.3968
ArguAna-VN 0.5277 0.4788
AskUbuntuDupQuestions-VN 0.6375 0.6132
BIOSSES-VN 0.7914 0.8169
Banking77VNClassification 0.7356 0.7374
CQADupstackAndroid-VN 0.4836 0.4228
CQADupstackGis-VN 0.3606 0.3128
CQADupstackMathematica-VN 0.2941 0.2406
CQADupstackPhysics-VN 0.4815 0.3653
CQADupstackProgrammers-VN 0.3886 0.3453
CQADupstackStats-VN 0.3459 0.2781
CQADupstackTex-VN 0.2674 0.2268
CQADupstackUnix-VN 0.3926 0.3362
CQADupstackWebmasters-VN 0.3871 0.3307
CQADupstackWordpress-VN 0.3114 0.2556
ClimateFEVER-VN 0.2149 0.1543
DBPedia-VN 0.4189 0.3158
EmotionVNClassification 0.4988 0.4155
FEVER-VN 0.8281 0.5830
FiQA2018-VN 0.4692 0.3151
GreenNodeTableMarkdownRetrieval 0.3659 0.4263
HotpotQA-VN 0.6799 0.6511
ImdbVNClassification 0.9213 0.7268
MSMARCO-VN 0.6899 nan
MTOPDomainVNClassification 0.8854 0.8475
MTOPIntentVNClassification 0.6502 0.5733
MassiveIntentVNClassification 0.7048 0.6573
MassiveScenarioVNClassification 0.7629 0.6832
NFCorpus-VN 0.3827 0.3150
NQ-VN 0.5991 0.5232
Quora-VN 0.5223 0.6649
RedditClustering-VN 0.4547 0.4208
RedditClusteringP2P-VN 0.6182 0.6063
SCIDOCS-VN 0.2095 0.1374
SICK-R-VN 0.7915 0.7822
STSBenchmark-VN 0.8373 0.8203
SciDocsRR-VN 0.8662 0.8199
SciFact-VN 0.7380 0.6850
SprintDuplicateQuestions-VN 0.8389 0.9358
StackExchangeClustering-VN 0.6502 0.5639
StackExchangeClusteringP2P-VN 0.4669 0.3203
StackOverflowDupQuestions-VN 0.5045 0.4888
TRECCOVID-VN 0.7730 0.5471
Touche2020-VN 0.2864 0.1601
ToxicConversationsVNClassification 0.6432 0.6426
TweetSentimentExtractionVNClassification 0.5498 0.5229
TwentyNewsgroupsClustering-VN 0.4469 0.3769
TwitterSemEval2015-VN 0.6551 0.7178
TwitterURLCorpus-VN 0.8526 0.8575
VieQuADRetrieval 0.6133 0.6112 0.5459
VieStudentFeedbackClassification 0.6701 0.7761 0.7728
Average 0.5738 0.5226 0.6593

Results for Alibaba-NLP/gte-multilingual-base

task_name Alibaba-NLP/gte-multilingual-base intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5577 0.6071
AmazonPolarityVNClassification 0.8006 0.7642
AmazonReviewsVNClassification 0.4236 0.3968
ArguAna-VN 0.5275 0.4788
AskUbuntuDupQuestions-VN 0.6150 0.6132
BIOSSES-VN 0.8445 0.8169
Banking77VNClassification 0.7471 0.7374
CQADupstackAndroid-VN 0.3966 0.4228
CQADupstackGis-VN 0.2912 0.3128
CQADupstackMathematica-VN 0.2080 0.2406
CQADupstackPhysics-VN 0.3908 0.3653
CQADupstackProgrammers-VN 0.3419 0.3453
CQADupstackStats-VN 0.2779 0.2781
CQADupstackTex-VN 0.2137 0.2268
CQADupstackUnix-VN 0.3061 0.3362
CQADupstackWebmasters-VN 0.2851 0.3307
CQADupstackWordpress-VN 0.2339 0.2556
ClimateFEVER-VN 0.2105 0.1543
DBPedia-VN 0.3746 0.3158
EmotionVNClassification 0.3798 0.4155
FEVER-VN 0.8624 0.5830
FiQA2018-VN 0.3288 0.3151
HotpotQA-VN 0.5860 0.6511
ImdbVNClassification 0.7531 0.7268
MSMARCO-VN 0.3516 0.3908
MTOPDomainVNClassification 0.8282 0.8475
MTOPIntentVNClassification 0.5094 0.5733
MassiveIntentVNClassification 0.6496 0.6573
MassiveScenarioVNClassification 0.6937 0.6832
NFCorpus-VN 0.3148 0.3150
NQ-VN 0.5065 0.5232
Quora-VN 0.5668 0.6649
RedditClustering-VN 0.4991 0.4208
RedditClusteringP2P-VN 0.5975 0.6063
SCIDOCS-VN 0.1449 0.1374
SICK-R-VN 0.7750 0.7822
STSBenchmark-VN 0.8258 0.8203
SciDocsRR-VN 0.8418 0.8199
SciFact-VN 0.6562 0.6850
SprintDuplicateQuestions-VN 0.9734 0.9431
StackExchangeClustering-VN 0.6080 0.5639
StackExchangeClusteringP2P-VN 0.3523 0.3203
StackOverflowDupQuestions-VN 0.4967 0.4888
TRECCOVID-VN 0.6082 0.5471
Touche2020-VN 0.2269 0.1601
ToxicConversationsVNClassification 0.6618 0.6426
TweetSentimentExtractionVNClassification 0.5216 0.5229
TwentyNewsgroupsClustering-VN 0.4555 0.3769
TwitterSemEval2015-VN 0.7021 0.7178
TwitterURLCorpus-VN 0.8599 0.8575
Average 0.5237 0.5152

Results for BAAI/bge-m3

task_name BAAI/bge-m3 intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5878 0.6071
AmazonPolarityVNClassification 0.8754 0.7642
AmazonReviewsVNClassification 0.4433 0.3968
ArguAna-VN 0.5068 0.4788
AskUbuntuDupQuestions-VN 0.6207 0.6132
BIOSSES-VN 0.7750 0.8169
Banking77VNClassification 0.7810 0.7374
CQADupstackAndroid-VN 0.4404 0.4228
CQADupstackGis-VN 0.3313 0.3128
CQADupstackMathematica-VN 0.2364 0.2406
CQADupstackPhysics-VN 0.3799 0.3653
CQADupstackProgrammers-VN 0.3412 0.3453
CQADupstackStats-VN 0.3013 0.2781
CQADupstackTex-VN 0.2611 0.2268
CQADupstackUnix-VN 0.3567 0.3362
CQADupstackWebmasters-VN 0.3447 0.3307
CQADupstackWordpress-VN 0.2818 0.2556
ClimateFEVER-VN 0.2127 0.1543
DBPedia-VN 0.3670 0.3158
EmotionVNClassification 0.4670 0.4155
FEVER-VN 0.7014 0.5830
FiQA2018-VN 0.3438 0.3151
GreenNodeTableMarkdownRetrieval 0.4037 0.4263
HotpotQA-VN 0.6376 0.6511
ImdbVNClassification 0.8270 0.7268
MSMARCO-VN 0.3622 0.3908
MTOPDomainVNClassification 0.8656 0.8475
MTOPIntentVNClassification 0.5701 0.5733
MassiveIntentVNClassification 0.6818 0.6573
MassiveScenarioVNClassification 0.7275 0.6832
NFCorpus-VN 0.3096 0.3150
NQ-VN 0.5498 0.5232
Quora-VN 0.6457 0.6649
RedditClustering-VN 0.4325 0.4208
RedditClusteringP2P-VN 0.5738 0.6063
SCIDOCS-VN 0.1501 0.1374
SICK-R-VN 0.7788 0.7822
STSBenchmark-VN 0.8115 0.8203
SciDocsRR-VN 0.8174 0.8199
SciFact-VN 0.6231 0.6850
SprintDuplicateQuestions-VN 0.9689 0.9431
StackExchangeClustering-VN 0.5842 0.5639
StackExchangeClusteringP2P-VN 0.3262 0.3203
StackOverflowDupQuestions-VN 0.5029 0.4888
TRECCOVID-VN 0.6622 0.5471
Touche2020-VN 0.2153 0.1601
ToxicConversationsVNClassification 0.6869 0.6426
TweetSentimentExtractionVNClassification 0.5777 0.5229
TwentyNewsgroupsClustering-VN 0.3783 0.3769
TwitterSemEval2015-VN 0.7099 0.7178
TwitterURLCorpus-VN 0.8579 0.8575
VieQuADRetrieval 0.5684 0.6112 0.5459
VieStudentFeedbackClassification 0.8032 0.7761 0.7728
Average 0.5390 0.5202 0.6593

Results for BAAI/bge-multilingual-gemma2

task_name BAAI/bge-multilingual-gemma2 intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.6878 0.6071
AmazonPolarityVNClassification 0.8414 0.7642
AmazonReviewsVNClassification 0.4203 0.3968
ArguAna-VN 0.5061 0.4788
AskUbuntuDupQuestions-VN 0.5475 0.6132
BIOSSES-VN 0.6685 0.8169
Banking77VNClassification 0.8929 0.7374
CQADupstackAndroid-VN 0.3454 0.4228
CQADupstackGis-VN 0.1515 0.3128
CQADupstackMathematica-VN 0.1222 0.2406
CQADupstackPhysics-VN 0.2400 0.3653
CQADupstackProgrammers-VN 0.1915 0.3453
CQADupstackStats-VN 0.1096 0.2781
CQADupstackTex-VN 0.0866 0.2268
CQADupstackUnix-VN 0.2001 0.3362
CQADupstackWebmasters-VN 0.2035 0.3307
CQADupstackWordpress-VN 0.1145 0.2556
ClimateFEVER-VN 0.1652 0.1543
DBPedia-VN 0.0696 0.3158
EmotionVNClassification 0.5023 0.4153
FEVER-VN 0.4523 0.5830
FiQA2018-VN 0.1176 0.3151
HotpotQA-VN 0.2972 0.6511
ImdbVNClassification 0.8151 0.7268
MSMARCO-VN 0.1030 0.3908
MTOPDomainVNClassification 0.9166 0.8475
MTOPIntentVNClassification 0.7572 0.5733
MassiveIntentVNClassification 0.7259 0.6573
MassiveScenarioVNClassification 0.7648 0.6832
NFCorpus-VN 0.1025 0.3150
NQ-VN 0.0971 0.5232
Quora-VN 0.2130 0.6649
RedditClustering-VN 0.2991 0.4208
RedditClusteringP2P-VN 0.5650 0.6063
SCIDOCS-VN 0.0812 0.1374
SICK-R-VN 0.6650 0.7822
STSBenchmark-VN 0.6497 0.8203
SciDocsRR-VN 0.7289 0.8199
SciFact-VN 0.4529 0.6850
SprintDuplicateQuestions-VN 0.9460 0.9431
StackExchangeClustering-VN 0.4883 0.5639
StackExchangeClusteringP2P-VN 0.3299 0.3203
StackOverflowDupQuestions-VN 0.4062 0.4888
TRECCOVID-VN 0.3920 0.5471
Touche2020-VN 0.1105 0.1601
ToxicConversationsVNClassification 0.7319 0.6426
TweetSentimentExtractionVNClassification 0.6113 0.5229
TwentyNewsgroupsClustering-VN 0.3242 0.3769
TwitterSemEval2015-VN 0.6760 0.7178
TwitterURLCorpus-VN 0.8538 0.8575
VieQuADRetrieval 0.1788 0.6112 0.5459
Average 0.4298 0.5170 0.5459

Results for GreenNode/GreenNode-Embedding-Large-VN-Mixed-V1

task_name GreenNode/GreenNode-Embedding-Large-VN-Mixed-V1 intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5897 0.6071
AmazonPolarityVNClassification 0.8704 0.7642
AmazonReviewsVNClassification 0.4434 0.3968
ArguAna-VN 0.3791 0.4788
AskUbuntuDupQuestions-VN 0.6184 0.6132
BIOSSES-VN 0.7615 0.8169
Banking77VNClassification 0.7780 0.7374
CQADupstackAndroid-VN 0.4406 0.4228
CQADupstackGis-VN 0.3289 0.3128
CQADupstackMathematica-VN 0.2337 0.2406
CQADupstackPhysics-VN 0.3831 0.3653
CQADupstackProgrammers-VN 0.3377 0.3453
CQADupstackStats-VN 0.2991 0.2781
CQADupstackTex-VN 0.2576 0.2268
CQADupstackUnix-VN 0.3574 0.3362
CQADupstackWebmasters-VN 0.3484 0.3307
CQADupstackWordpress-VN 0.2723 0.2556
ClimateFEVER-VN 0.2059 0.1543
DBPedia-VN 0.3612 0.3158
EmotionVNClassification 0.4599 0.4155
FEVER-VN 0.6801 0.5830
FiQA2018-VN 0.3327 0.3151
GreenNodeTableMarkdownRetrieval 0.4621 0.4263
HotpotQA-VN 0.6275 0.6511
ImdbVNClassification 0.8237 0.7268
MSMARCO-VN 0.3565 0.3908
MTOPDomainVNClassification 0.8647 0.8475
MTOPIntentVNClassification 0.5560 0.5733
MassiveIntentVNClassification 0.6753 0.6573
MassiveScenarioVNClassification 0.7262 0.6832
NFCorpus-VN 0.3046 0.3150
NQ-VN 0.5390 0.5232
Quora-VN 0.6503 0.6649
RedditClustering-VN 0.4312 0.4208
RedditClusteringP2P-VN 0.5702 0.6063
SCIDOCS-VN 0.1444 0.1374
SICK-R-VN 0.7740 0.7822
STSBenchmark-VN 0.8060 0.8203
SciDocsRR-VN 0.8102 0.8199
SciFact-VN 0.6188 0.6850
SprintDuplicateQuestions-VN 0.9655 0.9431
StackExchangeClustering-VN 0.5910 0.5639
StackExchangeClusteringP2P-VN 0.3229 0.3203
StackOverflowDupQuestions-VN 0.4963 0.4888
TRECCOVID-VN 0.6457 0.5471
Touche2020-VN 0.2106 0.1601
ToxicConversationsVNClassification 0.6821 0.6426
TweetSentimentExtractionVNClassification 0.5828 0.5229
TwentyNewsgroupsClustering-VN 0.3698 0.3769
TwitterSemEval2015-VN 0.7036 0.7178
TwitterURLCorpus-VN 0.8559 0.8575
VieQuADRetrieval 0.5689 0.6112 0.5459
VieStudentFeedbackClassification 0.8002 0.7761 0.7728
Average 0.5335 0.5202 0.6593

Results for GreenNode/GreenNode-Embedding-Large-VN-V1

task_name GreenNode/GreenNode-Embedding-Large-VN-V1 intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5882 0.6071
AmazonPolarityVNClassification 0.8451 0.7642
AmazonReviewsVNClassification 0.4354 0.3968
ArguAna-VN 0.3654 0.4788
AskUbuntuDupQuestions-VN 0.5995 0.6132
BIOSSES-VN 0.7332 0.8169
Banking77VNClassification 0.7593 0.7374
CQADupstackAndroid-VN 0.4155 0.4228
CQADupstackGis-VN 0.3131 0.3128
CQADupstackMathematica-VN 0.2230 0.2406
CQADupstackPhysics-VN 0.3592 0.3653
CQADupstackProgrammers-VN 0.3199 0.3453
CQADupstackStats-VN 0.2833 0.2781
CQADupstackTex-VN 0.2498 0.2268
CQADupstackUnix-VN 0.3337 0.3362
CQADupstackWebmasters-VN 0.3390 0.3307
CQADupstackWordpress-VN 0.2649 0.2556
ClimateFEVER-VN 0.1814 0.1543
DBPedia-VN 0.3227 0.3158
EmotionVNClassification 0.4362 0.4155
FEVER-VN 0.5920 0.5830
FiQA2018-VN 0.2917 0.3151
GreenNodeTableMarkdownRetrieval 0.4669 0.4263
HotpotQA-VN 0.5898 0.6511
ImdbVNClassification 0.7898 0.7268
MSMARCO-VN 0.3298 0.3908
MTOPDomainVNClassification 0.8415 0.8475
MTOPIntentVNClassification 0.5148 0.5733
MassiveIntentVNClassification 0.6555 0.6573
MassiveScenarioVNClassification 0.7026 0.6832
NFCorpus-VN 0.2877 0.3150
NQ-VN 0.4792 0.5232
Quora-VN 0.6409 0.6649
RedditClustering-VN 0.3855 0.4208
RedditClusteringP2P-VN 0.5535 0.6063
SCIDOCS-VN 0.1298 0.1374
SICK-R-VN 0.7558 0.7822
STSBenchmark-VN 0.7809 0.8203
SciDocsRR-VN 0.7832 0.8199
SciFact-VN 0.5992 0.6850
SprintDuplicateQuestions-VN 0.9481 0.9431
StackExchangeClustering-VN 0.5289 0.5639
StackExchangeClusteringP2P-VN 0.3099 0.3203
StackOverflowDupQuestions-VN 0.4856 0.4888
TRECCOVID-VN 0.6174 0.5471
Touche2020-VN 0.1809 0.1601
ToxicConversationsVNClassification 0.6760 0.6426
TweetSentimentExtractionVNClassification 0.5713 0.5229
TwentyNewsgroupsClustering-VN 0.3400 0.3769
TwitterSemEval2015-VN 0.6905 0.7178
TwitterURLCorpus-VN 0.8487 0.8575
VieQuADRetrieval 0.5541 0.6112 0.5459
VieStudentFeedbackClassification 0.7688 0.7761 0.7728
Average 0.5105 0.5202 0.6593

Results for VoVanPhuc/sup-SimCSE-VietNamese-phobert-base

task_name VoVanPhuc/sup-SimCSE-VietNamese-phobert-base intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.6208 0.6071
AmazonPolarityVNClassification 0.7905 0.7642
AmazonReviewsVNClassification 0.3768 0.3968
ArguAna-VN 0.1981 0.4788
AskUbuntuDupQuestions-VN 0.5037 0.6132
BIOSSES-VN 0.5513 0.8169
Banking77VNClassification 0.6905 0.7374
CQADupstackAndroid-VN 0.1656 0.4228
CQADupstackGis-VN 0.0828 0.3128
CQADupstackMathematica-VN 0.0454 0.2406
CQADupstackPhysics-VN 0.1416 0.3653
CQADupstackProgrammers-VN 0.1074 0.3453
CQADupstackStats-VN 0.0730 0.2781
CQADupstackTex-VN 0.0559 0.2268
CQADupstackUnix-VN 0.0882 0.3362
CQADupstackWebmasters-VN 0.1141 0.3307
CQADupstackWordpress-VN 0.0645 0.2556
ClimateFEVER-VN 0.0700 0.1543
DBPedia-VN 0.1116 0.3158
EmotionVNClassification 0.3384 0.4155
FEVER-VN 0.1189 0.5830
FiQA2018-VN 0.0662 0.3151
GreenNodeTableMarkdownRetrieval 0.1322 0.4263
HotpotQA-VN 0.1365 0.6511
ImdbVNClassification 0.6993 0.7268
MSMARCO-VN 0.0499 0.3908
MTOPDomainVNClassification 0.7058 0.8475
MTOPIntentVNClassification 0.4821 0.5733
MassiveIntentVNClassification 0.5794 0.6573
MassiveScenarioVNClassification 0.6076 0.6832
NFCorpus-VN 0.1582 0.3150
NQ-VN 0.0708 0.5232
Quora-VN 0.3233 0.6649
RedditClustering-VN 0.2908 0.4208
RedditClusteringP2P-VN 0.4366 0.6063
SCIDOCS-VN 0.0493 0.1374
SICK-R-VN 0.7446 0.7822
STSBenchmark-VN 0.7624 0.8203
SciDocsRR-VN 0.6865 0.8199
SciFact-VN 0.1967 0.6850
SprintDuplicateQuestions-VN 0.7575 0.9431
StackExchangeClustering-VN 0.3863 0.5639
StackExchangeClusteringP2P-VN 0.2768 0.3203
StackOverflowDupQuestions-VN 0.3555 0.4888
TRECCOVID-VN 0.2123 0.5471
Touche2020-VN 0.1215 0.1601
ToxicConversationsVNClassification 0.6105 0.6426
TweetSentimentExtractionVNClassification 0.5347 0.5229
TwentyNewsgroupsClustering-VN 0.2620 0.3769
TwitterSemEval2015-VN 0.6359 0.7178
TwitterURLCorpus-VN 0.8131 0.8575
VieQuADRetrieval 0.3009 0.6112 0.5459
VieStudentFeedbackClassification 0.7076 0.7761 0.7728
Average 0.3483 0.5202 0.6593

Results for bkai-foundation-models/vietnamese-bi-encoder

task_name bkai-foundation-models/vietnamese-bi-encoder intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5695 0.6071
AmazonPolarityVNClassification 0.6652 0.7642
AmazonReviewsVNClassification 0.3279 0.3968
ArguAna-VN 0.2865 0.4788
AskUbuntuDupQuestions-VN 0.5616 0.6132
BIOSSES-VN 0.6613 0.8169
Banking77VNClassification 0.7551 0.7374
CQADupstackAndroid-VN 0.2673 0.4228
CQADupstackGis-VN 0.1780 0.3128
CQADupstackMathematica-VN 0.1319 0.2406
CQADupstackPhysics-VN 0.2619 0.3653
CQADupstackProgrammers-VN 0.2042 0.3453
CQADupstackStats-VN 0.1864 0.2781
CQADupstackTex-VN 0.1099 0.2268
CQADupstackUnix-VN 0.1948 0.3362
CQADupstackWebmasters-VN 0.2139 0.3307
CQADupstackWordpress-VN 0.1621 0.2556
ClimateFEVER-VN 0.1114 0.1543
DBPedia-VN 0.2022 0.3158
EmotionVNClassification 0.3153 0.4155
FEVER-VN 0.5311 0.5830
FiQA2018-VN 0.1729 0.3151
GreenNodeTableMarkdownRetrieval 0.1562 0.4263
HotpotQA-VN 0.3448 0.6511
ImdbVNClassification 0.5901 0.7268
MSMARCO-VN 0.3013 0.3908
MTOPDomainVNClassification 0.7935 0.8475
MTOPIntentVNClassification 0.5584 0.5733
MassiveIntentVNClassification 0.6229 0.6573
MassiveScenarioVNClassification 0.6536 0.6832
NFCorpus-VN 0.2338 0.3150
NQ-VN 0.3063 0.5232
Quora-VN 0.3755 0.6649
RedditClustering-VN 0.2860 0.4208
RedditClusteringP2P-VN 0.4382 0.6063
SCIDOCS-VN 0.0917 0.1374
SICK-R-VN 0.6965 0.7822
STSBenchmark-VN 0.6997 0.8203
SciDocsRR-VN 0.7187 0.8199
SciFact-VN 0.3929 0.6850
SprintDuplicateQuestions-VN 0.9177 0.9431
StackExchangeClustering-VN 0.4431 0.5639
StackExchangeClusteringP2P-VN 0.2779 0.3203
StackOverflowDupQuestions-VN 0.4370 0.4888
TRECCOVID-VN 0.5457 0.5471
Touche2020-VN 0.1880 0.1601
ToxicConversationsVNClassification 0.6190 0.6426
TweetSentimentExtractionVNClassification 0.4696 0.5229
TwentyNewsgroupsClustering-VN 0.2612 0.3769
TwitterSemEval2015-VN 0.6435 0.7178
TwitterURLCorpus-VN 0.8278 0.8575
VieQuADRetrieval 0.4347 0.6112 0.5459
VieStudentFeedbackClassification 0.6772 0.7761 0.7728
Average 0.4165 0.5202 0.6593

Results for hiieu/halong_embedding

task_name hiieu/halong_embedding intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5603 0.6071
AmazonPolarityVNClassification 0.6999 0.7642
AmazonReviewsVNClassification 0.3636 0.3968
ArguAna-VN 0.3858 0.4788
AskUbuntuDupQuestions-VN 0.5925 0.6132
BIOSSES-VN 0.8020 0.8169
Banking77VNClassification 0.7502 0.7374
CQADupstackAndroid-VN 0.4212 0.4228
CQADupstackGis-VN 0.3076 0.3128
CQADupstackMathematica-VN 0.2185 0.2406
CQADupstackPhysics-VN 0.3689 0.3653
CQADupstackProgrammers-VN 0.3285 0.3453
CQADupstackStats-VN 0.2857 0.2781
CQADupstackTex-VN 0.2397 0.2268
CQADupstackUnix-VN 0.3265 0.3362
CQADupstackWebmasters-VN 0.3260 0.3307
CQADupstackWordpress-VN 0.2478 0.2556
ClimateFEVER-VN 0.1448 0.1543
DBPedia-VN 0.2381 0.3158
EmotionVNClassification 0.4241 0.4155
FEVER-VN 0.5287 0.5830
FiQA2018-VN 0.2623 0.3151
GreenNodeTableMarkdownRetrieval 0.3597 0.4263
HotpotQA-VN 0.5336 0.6511
ImdbVNClassification 0.6520 0.7268
MSMARCO-VN 0.2975 0.3908
MTOPDomainVNClassification 0.8429 0.8475
MTOPIntentVNClassification 0.5399 0.5733
MassiveIntentVNClassification 0.6540 0.6573
MassiveScenarioVNClassification 0.7088 0.6832
NFCorpus-VN 0.2703 0.3150
NQ-VN 0.3615 0.5232
Quora-VN 0.5879 0.6649
RedditClustering-VN 0.3831 0.4208
RedditClusteringP2P-VN 0.5567 0.6063
SCIDOCS-VN 0.1335 0.1374
SICK-R-VN 0.7400 0.7822
STSBenchmark-VN 0.7796 0.8203
SciDocsRR-VN 0.8054 0.8199
SciFact-VN 0.6087 0.6850
SprintDuplicateQuestions-VN 0.9577 0.9431
StackExchangeClustering-VN 0.5526 0.5639
StackExchangeClusteringP2P-VN 0.3195 0.3203
StackOverflowDupQuestions-VN 0.4953 0.4888
TRECCOVID-VN 0.5473 0.5471
Touche2020-VN 0.1588 0.1601
ToxicConversationsVNClassification 0.6432 0.6426
TweetSentimentExtractionVNClassification 0.5172 0.5229
TwentyNewsgroupsClustering-VN 0.3592 0.3769
TwitterSemEval2015-VN 0.6402 0.7178
TwitterURLCorpus-VN 0.8438 0.8575
VieQuADRetrieval 0.5201 0.6112 0.5459
VieStudentFeedbackClassification 0.6965 0.7761 0.7728
Average 0.4885 0.5202 0.6593

Results for intfloat/e5-mistral-7b-instruct

task_name intfloat/e5-mistral-7b-instruct intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5270 0.6071
AmazonPolarityVNClassification 0.9057 0.7642
AmazonReviewsVNClassification 0.4508 0.3968
ArguAna-VN 0.5036 0.4788
AskUbuntuDupQuestions-VN 0.6265 0.6132
BIOSSES-VN 0.8372 0.8169
Banking77VNClassification 0.5792 0.7374
CQADupstackAndroid-VN 0.4682 0.4228
CQADupstackGis-VN 0.3518 0.3128
CQADupstackMathematica-VN 0.2526 0.2406
CQADupstackPhysics-VN 0.3817 0.3653
CQADupstackProgrammers-VN 0.4042 0.3453
CQADupstackStats-VN 0.2955 0.2781
CQADupstackTex-VN 0.2810 0.2268
CQADupstackUnix-VN 0.3994 0.3362
CQADupstackWebmasters-VN 0.3859 0.3307
CQADupstackWordpress-VN 0.3162 0.2556
ClimateFEVER-VN 0.2477 0.1543
DBPedia-VN 0.4279 0.3158
EmotionVNClassification 0.4604 0.4155
FEVER-VN 0.8482 0.5830
FiQA2018-VN 0.3039 0.3151
GreenNodeTableMarkdownRetrieval 0.3502 0.4263
HotpotQA-VN 0.6454 0.6511
ImdbVNClassification 0.8654 0.7268
MSMARCO-VN 0.3524 0.3908
MTOPDomainVNClassification 0.8203 0.8475
MTOPIntentVNClassification 0.4354 0.5733
MassiveIntentVNClassification 0.6115 0.6573
MassiveScenarioVNClassification 0.6838 0.6832
NFCorpus-VN 0.3197 0.3150
NQ-VN 0.5780 0.5232
Quora-VN 0.4287 0.6649
RedditClustering-VN 0.4228 0.4208
RedditClusteringP2P-VN 0.5681 0.6063
SCIDOCS-VN 0.1523 0.1374
SICK-R-VN 0.7791 0.7822
STSBenchmark-VN 0.8198 0.8203
SciDocsRR-VN 0.8380 0.8199
SciFact-VN 0.6377 0.6850
SprintDuplicateQuestions-VN 0.9247 0.9358
StackExchangeClustering-VN 0.5891 0.5639
StackExchangeClusteringP2P-VN 0.4146 0.3203
StackOverflowDupQuestions-VN 0.5172 0.4888
TRECCOVID-VN 0.7742 0.5471
Touche2020-VN 0.2592 0.1601
ToxicConversationsVNClassification 0.5624 0.6426
TweetSentimentExtractionVNClassification 0.5735 0.5229
TwentyNewsgroupsClustering-VN 0.4556 0.3769
TwitterSemEval2015-VN 0.7332 0.7178
TwitterURLCorpus-VN 0.8698 0.8575
VieQuADRetrieval 0.5370 0.6112 0.5459
VieStudentFeedbackClassification 0.7123 0.7761 0.7728
Average 0.5375 0.5201 0.6593

Results for intfloat/multilingual-e5-base

task_name intfloat/multilingual-e5-base intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.5612 0.6071
AmazonPolarityVNClassification 0.7591 0.7642
AmazonReviewsVNClassification 0.4031 0.3968
ArguAna-VN 0.4549 0.4788
AskUbuntuDupQuestions-VN 0.5860 0.6132
BIOSSES-VN 0.8182 0.8169
Banking77VNClassification 0.7096 0.7374
CQADupstackAndroid-VN 0.4235 0.4228
CQADupstackGis-VN 0.2861 0.3128
CQADupstackMathematica-VN 0.2133 0.2406
CQADupstackPhysics-VN 0.3515 0.3653
CQADupstackProgrammers-VN 0.3191 0.3453
CQADupstackStats-VN 0.2581 0.2781
CQADupstackTex-VN 0.2078 0.2268
CQADupstackUnix-VN 0.3294 0.3362
CQADupstackWebmasters-VN 0.3104 0.3307
CQADupstackWordpress-VN 0.2387 0.2556
ClimateFEVER-VN 0.1262 0.1543
DBPedia-VN 0.3077 0.3158
EmotionVNClassification 0.4096 0.4155
FEVER-VN 0.4960 0.5830
FiQA2018-VN 0.2514 0.3151
GreenNodeTableMarkdownRetrieval 0.3889 0.4263
HotpotQA-VN 0.6079 0.6511
ImdbVNClassification 0.6851 0.7268
MSMARCO-VN 0.3619 0.3908
MTOPDomainVNClassification 0.8398 0.8475
MTOPIntentVNClassification 0.5201 0.5733
MassiveIntentVNClassification 0.6302 0.6573
MassiveScenarioVNClassification 0.6724 0.6832
NFCorpus-VN 0.2675 0.3150
NQ-VN 0.4510 0.5232
Quora-VN 0.6329 0.6649
RedditClustering-VN 0.4259 0.4208
RedditClusteringP2P-VN 0.5834 0.6063
SCIDOCS-VN 0.1290 0.1374
SICK-R-VN 0.7677 0.7822
STSBenchmark-VN 0.7977 0.8203
SciDocsRR-VN 0.8055 0.8199
SciFact-VN 0.6761 0.6850
SprintDuplicateQuestions-VN 0.9407 0.9431
StackExchangeClustering-VN 0.5715 0.5639
StackExchangeClusteringP2P-VN 0.3221 0.3203
StackOverflowDupQuestions-VN 0.4827 0.4888
TRECCOVID-VN 0.4486 0.5471
Touche2020-VN 0.1313 0.1601
ToxicConversationsVNClassification 0.6537 0.6426
TweetSentimentExtractionVNClassification 0.5269 0.5229
TwentyNewsgroupsClustering-VN 0.3820 0.3769
TwitterSemEval2015-VN 0.6876 0.7178
TwitterURLCorpus-VN 0.8561 0.8575
VieQuADRetrieval 0.5765 0.6112 0.5459
VieStudentFeedbackClassification 0.7486 0.7761 0.7728
Average 0.4980 0.5202 0.6593

Results for intfloat/multilingual-e5-large-instruct

task_name intfloat/multilingual-e5-large intfloat/multilingual-e5-large-instruct Max result
AmazonCounterfactualVNClassification 0.6071 0.5618
AmazonPolarityVNClassification 0.7642 0.9274
AmazonReviewsVNClassification 0.3968 0.4934
ArguAna-VN 0.4788 0.4815
AskUbuntuDupQuestions-VN 0.6132 0.6228
BIOSSES-VN 0.8169 0.8412
Banking77VNClassification 0.7374 0.7323
CQADupstackAndroid-VN 0.4228 0.4313
CQADupstackGis-VN 0.3128 0.3073
CQADupstackMathematica-VN 0.2406 0.2231
CQADupstackPhysics-VN 0.3653 0.3570
CQADupstackProgrammers-VN 0.3453 0.3674
CQADupstackStats-VN 0.2781 0.2619
CQADupstackTex-VN 0.2268 0.2339
CQADupstackUnix-VN 0.3362 0.3298
CQADupstackWebmasters-VN 0.3307 0.3385
CQADupstackWordpress-VN 0.2556 0.2530
ClimateFEVER-VN 0.1543 0.2501
DBPedia-VN 0.3158 0.3990
EmotionVNClassification 0.4155 0.5052
FEVER-VN 0.5830 0.8334
FiQA2018-VN 0.3151 0.3646
GreenNodeTableMarkdownRetrieval 0.4263 0.3987
HotpotQA-VN 0.6511 0.6399
ImdbVNClassification 0.7268 0.8831
MSMARCO-VN 0.3908 0.3786
MTOPDomainVNClassification 0.8475 0.8301
MTOPIntentVNClassification 0.5733 0.5039
MassiveIntentVNClassification 0.6573 0.6316
MassiveScenarioVNClassification 0.6832 0.6845
NFCorpus-VN 0.3150 0.3340
NQ-VN 0.5232 0.5686
Quora-VN 0.6649 0.5790
RedditClustering-VN 0.4208 0.4646
RedditClusteringP2P-VN 0.6063 0.6087
SCIDOCS-VN 0.1374 0.1681
SICK-R-VN 0.7822 0.8032
STSBenchmark-VN 0.8203 0.8428
SciDocsRR-VN 0.8199 0.8586
SciFact-VN 0.6850 0.6552
SprintDuplicateQuestions-VN 0.9358 0.9027
StackExchangeClustering-VN 0.5639 0.6252
StackExchangeClusteringP2P-VN 0.3203 0.4020
StackOverflowDupQuestions-VN 0.4888 0.5093
TRECCOVID-VN 0.5471 0.8056
Touche2020-VN 0.1601 0.2503
ToxicConversationsVNClassification 0.6426 0.6342
TweetSentimentExtractionVNClassification 0.5229 0.5861
TwentyNewsgroupsClustering-VN 0.3769 0.4730
TwitterSemEval2015-VN 0.7178 0.7610
TwitterURLCorpus-VN 0.8575 0.8703
VieQuADRetrieval 0.6112 0.5536 0.5459
VieStudentFeedbackClassification 0.7761 0.7912 0.7728
Average 0.5201 0.5493 0.6593

Results for intfloat/multilingual-e5-large

task_name intfloat/multilingual-e5-large Max result
AmazonCounterfactualVNClassification 0.6071
AmazonPolarityVNClassification 0.7642
AmazonReviewsVNClassification 0.3968
ArguAna-VN 0.4788
AskUbuntuDupQuestions-VN 0.6132
BIOSSES-VN 0.8169
Banking77VNClassification 0.7374
CQADupstackAndroid-VN 0.4228
CQADupstackGis-VN 0.3128
CQADupstackMathematica-VN 0.2406
CQADupstackPhysics-VN 0.3653
CQADupstackProgrammers-VN 0.3453
CQADupstackStats-VN 0.2781
CQADupstackTex-VN 0.2268
CQADupstackUnix-VN 0.3362
CQADupstackWebmasters-VN 0.3307
CQADupstackWordpress-VN 0.2556
ClimateFEVER-VN 0.1543
DBPedia-VN 0.3158
EmotionVNClassification 0.4155
FEVER-VN 0.5830
FiQA2018-VN 0.3151
GreenNodeTableMarkdownRetrieval 0.4263
HotpotQA-VN 0.6511
ImdbVNClassification 0.7268
MSMARCO-VN 0.3908
MTOPDomainVNClassification 0.8475
MTOPIntentVNClassification 0.5733
MassiveIntentVNClassification 0.6573
MassiveScenarioVNClassification 0.6832
NFCorpus-VN 0.3150
NQ-VN 0.5232
Quora-VN 0.6649
RedditClustering-VN 0.4208
RedditClusteringP2P-VN 0.6063
SCIDOCS-VN 0.1374
SICK-R-VN 0.7822
STSBenchmark-VN 0.8203
SciDocsRR-VN 0.8199
SciFact-VN 0.6850
SprintDuplicateQuestions-VN 0.9431
StackExchangeClustering-VN 0.5639
StackExchangeClusteringP2P-VN 0.3203
StackOverflowDupQuestions-VN 0.4888
TRECCOVID-VN 0.5471
Touche2020-VN 0.1601
ToxicConversationsVNClassification 0.6426
TweetSentimentExtractionVNClassification 0.5229
TwentyNewsgroupsClustering-VN 0.3769
TwitterSemEval2015-VN 0.7178
TwitterURLCorpus-VN 0.8575
VieQuADRetrieval 0.6112 0.5459
VieStudentFeedbackClassification 0.7761 0.7728
Average 0.5202 0.6593

Results for intfloat/multilingual-e5-small

task_name intfloat/multilingual-e5-large intfloat/multilingual-e5-small Max result
AmazonCounterfactualVNClassification 0.6071 0.5650
AmazonPolarityVNClassification 0.7642 0.7486
AmazonReviewsVNClassification 0.3968 0.3855
ArguAna-VN 0.4788 0.4297
AskUbuntuDupQuestions-VN 0.6132 0.5713
BIOSSES-VN 0.8169 0.7908
Banking77VNClassification 0.7374 0.6714
CQADupstackAndroid-VN 0.4228 0.4169
CQADupstackGis-VN 0.3128 0.2912
CQADupstackMathematica-VN 0.2406 0.1933
CQADupstackPhysics-VN 0.3653 0.3696
CQADupstackProgrammers-VN 0.3453 0.3142
CQADupstackStats-VN 0.2781 0.2651
CQADupstackTex-VN 0.2268 0.2208
CQADupstackUnix-VN 0.3362 0.3112
CQADupstackWebmasters-VN 0.3307 0.3058
CQADupstackWordpress-VN 0.2556 0.2339
ClimateFEVER-VN 0.1543 0.1513
DBPedia-VN 0.3158 0.2854
EmotionVNClassification 0.4155 0.3651
FEVER-VN 0.5830 0.5425
FiQA2018-VN 0.3151 0.2271
GreenNodeTableMarkdownRetrieval 0.4263 0.3920
HotpotQA-VN 0.6511 0.5499
ImdbVNClassification 0.7268 0.6677
MSMARCO-VN 0.3908 0.3311
MTOPDomainVNClassification 0.8475 0.7935
MTOPIntentVNClassification 0.5733 0.4550
MassiveIntentVNClassification 0.6573 0.6006
MassiveScenarioVNClassification 0.6832 0.6438
NFCorpus-VN 0.3150 0.2728
NQ-VN 0.5232 0.3854
Quora-VN 0.6649 0.6047
RedditClustering-VN 0.4208 0.3774
RedditClusteringP2P-VN 0.6063 0.5639
SCIDOCS-VN 0.1374 0.1171
SICK-R-VN 0.7822 0.7549
STSBenchmark-VN 0.8203 0.7809
SciDocsRR-VN 0.8199 0.7907
SciFact-VN 0.6850 0.6578
SprintDuplicateQuestions-VN 0.9431 0.9280
StackExchangeClustering-VN 0.5639 0.5488
StackExchangeClusteringP2P-VN 0.3203 0.3250
StackOverflowDupQuestions-VN 0.4888 0.4604
TRECCOVID-VN 0.5471 0.5357
Touche2020-VN 0.1601 0.1774
ToxicConversationsVNClassification 0.6426 0.6250
TweetSentimentExtractionVNClassification 0.5229 0.5271
TwentyNewsgroupsClustering-VN 0.3769 0.3431
TwitterSemEval2015-VN 0.7178 0.6747
TwitterURLCorpus-VN 0.8575 0.8466
VieQuADRetrieval 0.6112 0.5527 0.5459
VieStudentFeedbackClassification 0.7761 0.7378 0.7728
Average 0.5202 0.4845 0.6593

Results for sentence-transformers/LaBSE

task_name intfloat/multilingual-e5-large sentence-transformers/LaBSE Max result
AmazonCounterfactualVNClassification 0.6071 0.6382
AmazonPolarityVNClassification 0.7642 0.7039
AmazonReviewsVNClassification 0.3968 0.3637
ArguAna-VN 0.4788 0.3646
AskUbuntuDupQuestions-VN 0.6132 0.5549
BIOSSES-VN 0.8169 0.7677
Banking77VNClassification 0.7374 0.6711
CQADupstackAndroid-VN 0.4228 0.2847
CQADupstackGis-VN 0.3128 0.1724
CQADupstackMathematica-VN 0.2406 0.1285
CQADupstackPhysics-VN 0.3653 0.2119
CQADupstackProgrammers-VN 0.3453 0.1851
CQADupstackStats-VN 0.2781 0.1508
CQADupstackTex-VN 0.2268 0.1273
CQADupstackUnix-VN 0.3362 0.2250
CQADupstackWebmasters-VN 0.3307 0.2078
CQADupstackWordpress-VN 0.2556 0.1405
ClimateFEVER-VN 0.1543 0.0241
DBPedia-VN 0.3158 0.1592
EmotionVNClassification 0.4155 0.3479
FEVER-VN 0.5830 0.1258
FiQA2018-VN 0.3151 0.0738
GreenNodeTableMarkdownRetrieval 0.4263 0.2173
HotpotQA-VN 0.6511 0.1700
ImdbVNClassification 0.7268 0.6259
MSMARCO-VN 0.3908 0.1012
MTOPDomainVNClassification 0.8475 0.7972
MTOPIntentVNClassification 0.5733 0.5323
MassiveIntentVNClassification 0.6573 0.6059
MassiveScenarioVNClassification 0.6832 0.6410
NFCorpus-VN 0.3150 0.2050
NQ-VN 0.5232 0.1170
Quora-VN 0.6649 0.3817
RedditClustering-VN 0.4208 0.2829
RedditClusteringP2P-VN 0.6063 0.5020
SCIDOCS-VN 0.1374 0.0836
SICK-R-VN 0.7822 0.6877
STSBenchmark-VN 0.8203 0.7058
SciDocsRR-VN 0.8199 0.7529
SciFact-VN 0.6850 0.4149
SprintDuplicateQuestions-VN 0.9431 0.8038
StackExchangeClustering-VN 0.5639 0.3859
StackExchangeClusteringP2P-VN 0.3203 0.2787
StackOverflowDupQuestions-VN 0.4888 0.4399
TRECCOVID-VN 0.5471 0.1696
Touche2020-VN 0.1601 0.0393
ToxicConversationsVNClassification 0.6426 0.6193
TweetSentimentExtractionVNClassification 0.5229 0.4919
TwentyNewsgroupsClustering-VN 0.3769 0.2811
TwitterSemEval2015-VN 0.7178 0.6526
TwitterURLCorpus-VN 0.8575 0.8491
VieQuADRetrieval 0.6112 0.2824 0.5459
VieStudentFeedbackClassification 0.7761 0.6897 0.7728
Average 0.5202 0.3856 0.6593

Results for sentence-transformers/all-MiniLM-L12-v2

task_name intfloat/multilingual-e5-large sentence-transformers/all-MiniLM-L12-v2 Max result
AmazonCounterfactualVNClassification 0.6071 0.5736
AmazonPolarityVNClassification 0.7642 0.554
AmazonReviewsVNClassification 0.3968 0.2722
ArguAna-VN 0.4788 0.0989
AskUbuntuDupQuestions-VN 0.6132 0.5364
BIOSSES-VN 0.8169 0.6414
Banking77VNClassification 0.7374 0.508
CQADupstackAndroid-VN 0.4228 0.2084
CQADupstackGis-VN 0.3128 0.1381
CQADupstackMathematica-VN 0.2406 0.0972
CQADupstackPhysics-VN 0.3653 0.1454
CQADupstackProgrammers-VN 0.3453 0.1472
CQADupstackStats-VN 0.2781 0.148
CQADupstackTex-VN 0.2268 0.1053
CQADupstackUnix-VN 0.3362 0.1644
CQADupstackWebmasters-VN 0.3307 0.1651
CQADupstackWordpress-VN 0.2556 0.1323
ClimateFEVER-VN 0.1543 0.0163
DBPedia-VN 0.3158 0.1481
EmotionVNClassification 0.4155 0.1963
FEVER-VN 0.5830 0.294
FiQA2018-VN 0.3151 0.0431
GreenNodeTableMarkdownRetrieval 0.4263 0.0567
HotpotQA-VN 0.6511 0.1716
ImdbVNClassification 0.7268 0.5153
MSMARCO-VN 0.3908 0.0941
MTOPDomainVNClassification 0.8475 0.589
MTOPIntentVNClassification 0.5733 0.3043
MassiveIntentVNClassification 0.6573 0.4239
MassiveScenarioVNClassification 0.6832 0.4784
NFCorpus-VN 0.3150 0.1732
NQ-VN 0.5232 0.1079
Quora-VN 0.6649 0.2655
RedditClustering-VN 0.4208 0.1797
RedditClusteringP2P-VN 0.6063 0.3145
SCIDOCS-VN 0.1374 0.0539
SICK-R-VN 0.7822 0.6192
STSBenchmark-VN 0.8203 0.6094
SciDocsRR-VN 0.8199 0.6415
SciFact-VN 0.6850 0.2675
SprintDuplicateQuestions-VN 0.9431 0.7795
StackExchangeClustering-VN 0.5639 0.2191
StackExchangeClusteringP2P-VN 0.3203 0.2949
StackOverflowDupQuestions-VN 0.4888 0.392
TRECCOVID-VN 0.5471 0.1811
Touche2020-VN 0.1601 0.0268
ToxicConversationsVNClassification 0.6426 0.517
TweetSentimentExtractionVNClassification 0.5229 0.377
TwentyNewsgroupsClustering-VN 0.3769 0.2098
TwitterSemEval2015-VN 0.7178 0.502
TwitterURLCorpus-VN 0.8575 0.7745
VieQuADRetrieval 0.6112 0.1528 0.5459
VieStudentFeedbackClassification 0.7761 0.544 0.7728
Average 0.5202 0.3051 0.6593

Results for sentence-transformers/all-MiniLM-L6-v2

task_name intfloat/multilingual-e5-large sentence-transformers/all-MiniLM-L6-v2 Max result
AmazonCounterfactualVNClassification 0.6071 0.5612
AmazonPolarityVNClassification 0.7642 0.5619
AmazonReviewsVNClassification 0.3968 0.2699
ArguAna-VN 0.4788 0.0767
AskUbuntuDupQuestions-VN 0.6132 0.5352
BIOSSES-VN 0.8169 0.5609
Banking77VNClassification 0.7374 0.4894
CQADupstackAndroid-VN 0.4228 0.1725
CQADupstackGis-VN 0.3128 0.0919
CQADupstackMathematica-VN 0.2406 0.0662
CQADupstackPhysics-VN 0.3653 0.1019
CQADupstackProgrammers-VN 0.3453 0.0777
CQADupstackStats-VN 0.2781 0.0747
CQADupstackTex-VN 0.2268 0.0607
CQADupstackUnix-VN 0.3362 0.1178
CQADupstackWebmasters-VN 0.3307 0.0956
CQADupstackWordpress-VN 0.2556 0.0854
ClimateFEVER-VN 0.1543 0.004
DBPedia-VN 0.3158 0.109
EmotionVNClassification 0.4155 0.1934
FEVER-VN 0.5830 0.1235
FiQA2018-VN 0.3151 0.0144
GreenNodeTableMarkdownRetrieval 0.4263 0.0385
HotpotQA-VN 0.6511 0.1331
ImdbVNClassification 0.7268 0.5424
MSMARCO-VN 0.3908 0.0816
MTOPDomainVNClassification 0.8475 0.5641
MTOPIntentVNClassification 0.5733 0.2993
MassiveIntentVNClassification 0.6573 0.4147
MassiveScenarioVNClassification 0.6832 0.459
NFCorpus-VN 0.3150 0.1405
NQ-VN 0.5232 0.0744
Quora-VN 0.6649 0.2043
RedditClustering-VN 0.4208 0.1323
RedditClusteringP2P-VN 0.6063 0.2761
SCIDOCS-VN 0.1374 0.0361
SICK-R-VN 0.7822 0.6205
STSBenchmark-VN 0.8203 0.5662
SciDocsRR-VN 0.8199 0.6193
SciFact-VN 0.6850 0.2093
SprintDuplicateQuestions-VN 0.9431 0.648
StackExchangeClustering-VN 0.5639 0.1662
StackExchangeClusteringP2P-VN 0.3203 0.2489
StackOverflowDupQuestions-VN 0.4888 0.3795
TRECCOVID-VN 0.5471 0.1175
Touche2020-VN 0.1601 0.0266
ToxicConversationsVNClassification 0.6426 0.5023
TweetSentimentExtractionVNClassification 0.5229 0.3755
TwentyNewsgroupsClustering-VN 0.3769 0.1966
TwitterSemEval2015-VN 0.7178 0.5233
TwitterURLCorpus-VN 0.8575 0.7646
VieQuADRetrieval 0.6112 0.0923 0.5459
VieStudentFeedbackClassification 0.7761 0.5564 0.7728
Average 0.5202 0.2727 0.6593

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Aug 20, 2025

The current PR looks good on my end - thanks for the PR!

@KennethEnevoldsen KennethEnevoldsen merged commit c0fd327 into embeddings-benchmark:main Aug 20, 2025
3 checks passed