add geoembedding results #215

Hypothesis-Z · 2025-06-05T06:04:27Z

Checklist

My model has a model sheet, report or similar
My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR: "model: Add GeoGPT-Research-Project/GeoEmbedding" (mteb#2773)
The results submitted were obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

KennethEnevoldsen · 2025-06-09T10:24:52Z

Results for GeoGPT-Research-Project/GeoEmbedding

task_name	GeoGPT-Research-Project/GeoEmbedding	google/gemini-embedding-001	intfloat/multilingual-e5-large
AmazonCounterfactualClassification	0.97	0.88	0.7
ArXivHierarchicalClusteringP2P	0.65	0.65	0.56
ArXivHierarchicalClusteringS2S	0.64	0.64	0.54
ArguAna	0.78	0.86	0.54
AskUbuntuDupQuestions	0.65	0.64	0.59
BIOSSES	0.84	0.89	0.85
Banking77Classification	0.92	0.94	0.75
BiorxivClusteringP2P.v2	0.48	0.54	0.37
CQADupstackGamingRetrieval	0.65	0.71	0.59
CQADupstackUnixRetrieval	0.50	0.54	0.4
ClimateFEVERHardNegatives	0.43	0.31	0.26
FEVERHardNegatives	0.93	0.89	0.84
FiQA2018	0.53	0.62	0.44
HotpotQAHardNegatives	0.73	0.87	0.71
ImdbClassification	0.92	0.95	0.89
MTOPDomainClassification	0.98	0.98	0.9
MassiveIntentClassification	0.86	0.82	0.6
MassiveScenarioClassification	0.90	0.87	0.7
MedrxivClusteringP2P.v2	0.46	0.47	0.34
MedrxivClusteringS2S.v2	0.48	0.45	0.32
MindSmallReranking	0.32	0.33	0.3
SCIDOCS	0.22	0.25	0.17
SICK-R	0.80	0.83	0.8
STS12	0.68	0.82	0.8
STS13	0.82	0.90	0.82
STS14	0.78	0.85	0.78
STS15	0.87	0.90	0.89
STS17	0.90	0.89	0.82
STS22.v2	0.72	0.72	0.64
STSBenchmark	0.84	0.89	0.87
SprintDuplicateQuestions	0.94	0.97	0.93
StackExchangeClustering.v2	0.54	0.92	0.46
StackExchangeClusteringP2P.v2	0.40	0.51	0.39
SummEvalSummarization.v2	0.30	0.38	0.31
TRECCOVID	0.77	0.86	0.71
Touche2020Retrieval.v3	0.54	0.52	0.5
ToxicConversationsClassification	0.85	0.89	0.66
TweetSentimentExtractionClassification	0.77	0.70	0.63
TwentyNewsgroupsClustering.v2	0.88	0.57	0.39
TwitterSemEval2015	0.68	0.79	0.75
TwitterURLCorpus	0.86	0.87	0.86
Average	0.70	0.73	0.62

Noteworthy scores include TwentyNewsgroupsClustering.v2, TweetSentimentExtractionClassification, ClimateFEVERHardNegatives, AmazonCounterfactualClassification, FEVERHardNegatives

I double-checked these against the leaderboard, where the following look concerning:

TwentyNewsgroupsClustering.v2: highest is 0.68
AmazonCounterfactualClassification: highest is ~0.93

AmazonCounterfactualClassification is partly explained by training on the dataset. @Hypothesis-Z can you help me understand TwentyNewsgroupsClustering.v2?
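For reference, the leaderboard comparison above can be reproduced with a few lines of Python. This is only a minimal sketch: the scores are copied from the table and the leaderboard figures quoted in this comment, and the function name `exceeds_leaderboard` is a hypothetical helper, not part of mteb.

```python
# Scores for GeoGPT-Research-Project/GeoEmbedding, copied from the table above.
submitted = {
    "TwentyNewsgroupsClustering.v2": 0.88,
    "AmazonCounterfactualClassification": 0.97,
}

# Best leaderboard scores as quoted in this comment (the second is approximate).
leaderboard_best = {
    "TwentyNewsgroupsClustering.v2": 0.68,
    "AmazonCounterfactualClassification": 0.93,
}


def exceeds_leaderboard(scores, best):
    """Return {task: margin} for every task whose score beats the known best."""
    return {
        task: round(score - best[task], 2)
        for task, score in scores.items()
        if task in best and score > best[task]
    }


flagged = exceeds_leaderboard(submitted, leaderboard_best)
for task, margin in flagged.items():
    print(f"{task}: +{margin:.2f} over the current leaderboard best")
```

A large margin is not proof of a problem on its own; it is a prompt to check whether the task's training split was part of the model's training data.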

Hypothesis-Z · 2025-06-10T02:45:30Z

Hi @KennethEnevoldsen, thank you for double-checking in such fine detail.

I have checked the training datasets and the model metadata. The training data includes the following MTEB classification and clustering tasks:

Training Datasets:
- amazoncounterfactualclassification
- amazonpolarityclassification
- amazonreviewsclassification
- banking77classification
- emotionclassification
- massiveintentclassification
- massivescenarioclassification
- mtopdomainclassification
- mtopintentclassification
- toxicconversationsclassification
- tweetsentimentextractionclassification
- arxivclusteringp2p
- arxivclusterings2s
- biorxivclusteringp2p
- biorxivclusterings2s
- medrxivclusteringp2p
- medrxivclusterings2s
- twentynewsgroupsclustering

I will open a PR to revise the model metadata, since the task TwentyNewsgroupsClustering was used in training but was not listed there.

I've also checked the other tasks, and there are no further omissions.
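As a rough illustration of this check, the sketch below intersects the training list above with a few of the evaluated task names. It assumes that a versioned task such as `TwentyNewsgroupsClustering.v2` is built from the same dataset as its unversioned name, which is an assumption made here for illustration; `normalize` and `trained_on` are hypothetical helpers, not mteb functions.

```python
import re

# Training datasets as listed in this comment.
training_datasets = {
    "amazoncounterfactualclassification", "amazonpolarityclassification",
    "amazonreviewsclassification", "banking77classification",
    "emotionclassification", "massiveintentclassification",
    "massivescenarioclassification", "mtopdomainclassification",
    "mtopintentclassification", "toxicconversationsclassification",
    "tweetsentimentextractionclassification", "arxivclusteringp2p",
    "arxivclusterings2s", "biorxivclusteringp2p", "biorxivclusterings2s",
    "medrxivclusteringp2p", "medrxivclusterings2s",
    "twentynewsgroupsclustering",
}

# A small sample of the evaluated tasks from the results table.
evaluated_tasks = [
    "AmazonCounterfactualClassification",
    "Banking77Classification",
    "TwentyNewsgroupsClustering.v2",
    "MedrxivClusteringP2P.v2",
    "STS12",
    "ArguAna",
]


def normalize(task_name):
    """Lowercase the name and drop a trailing version suffix such as '.v2'."""
    return re.sub(r"\.v\d+$", "", task_name).lower()


def trained_on(tasks, training):
    """Return the evaluated tasks whose normalized name is in the training set."""
    return sorted(t for t in tasks if normalize(t) in training)


overlap = trained_on(evaluated_tasks, training_datasets)
print(overlap)
```

Name matching like this will miss tasks that were renamed between versions (for example the hierarchical ArXiv clustering tasks), so it complements a manual review rather than replacing it.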

Hypothesis-Z · 2025-06-10T03:04:30Z

New PR: embeddings-benchmark/mteb#2802

KennethEnevoldsen · 2025-06-10T20:30:29Z

Thanks for the update @Hypothesis-Z - will merge this in

zhangzeqing added 2 commits June 5, 2025 13:42

add geoembedding results

ceb2d45

rename geoembedding result file

0c7869d

Hypothesis-Z mentioned this pull request Jun 9, 2025

model: Add GeoGPT-Research-Project/GeoEmbedding embeddings-benchmark/mteb#2773

Merged

8 tasks

minor fix

5c55f9a

KennethEnevoldsen enabled auto-merge (squash) June 10, 2025 20:29

KennethEnevoldsen disabled auto-merge June 10, 2025 20:29

KennethEnevoldsen merged commit 95a8aeb into embeddings-benchmark:main Jun 10, 2025
2 checks passed
