fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression #2716


Merged

Conversation

AlexeyVatolin
Contributor

@AlexeyVatolin AlexeyVatolin commented May 22, 2025

Add RuSciBench datasets with scientific tasks in Russian and English, sourced from the Russian scientific electronic library elibrary.ru.

Here is our paper:
https://link.springer.com/article/10.1134/S1064562424602191

Checklist

  • I did not add a dataset, or if I did, I added the dataset checklist to the PR and completed it.

  • I did not add a model, or if I did, I added the model checklist to the PR and completed it.

  • I have run the following models on the task (adding the results to the PR). These can be run using the `mteb run -m {model_name} -t {task_name}` command.

    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).

  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)

| model_name | task_name | languages | main_score |
|------------|-----------|-----------|------------|
| multilingual-e5-small | RuSciBenchBitexMining | eng-Latn,rus-Cyrl | 0.978372 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchBitexMining | eng-Latn,rus-Cyrl | 0.945861 |
| multilingual-e5-small | RuSciBenchBitexMining | rus-Cyrl,eng-Latn | 0.974774 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchBitexMining | rus-Cyrl,eng-Latn | 0.929254 |
| multilingual-e5-small | RuSciBenchCiteRetrieval | eng-Latn | 0.25836 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCiteRetrieval | eng-Latn | 0.23692 |
| multilingual-e5-small | RuSciBenchCiteRetrieval | rus-Cyrl | 0.28923 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCiteRetrieval | rus-Cyrl | 0.18175 |
| multilingual-e5-small | RuSciBenchCociteRetrieval | eng-Latn | 0.21956 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCociteRetrieval | eng-Latn | 0.2035 |
| multilingual-e5-small | RuSciBenchCociteRetrieval | rus-Cyrl | 0.24766 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCociteRetrieval | rus-Cyrl | 0.15751 |
| multilingual-e5-small | RuSciBenchCoreRiscClassification | eng-Latn | 0.594057 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCoreRiscClassification | eng-Latn | 0.578581 |
| multilingual-e5-small | RuSciBenchCoreRiscClassification | rus-Cyrl | 0.594652 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchCoreRiscClassification | rus-Cyrl | 0.580301 |
| multilingual-e5-small | RuSciBenchPubTypeClassification | eng-Latn | 0.345671 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchPubTypeClassification | eng-Latn | 0.317749 |
| multilingual-e5-small | RuSciBenchPubTypeClassification | rus-Cyrl | 0.361472 |
| paraphrase-multilingual-MiniLM-L12-v2 | RuSciBenchPubTypeClassification | rus-Cyrl | 0.321645 |

@AlexeyVatolin AlexeyVatolin marked this pull request as draft May 22, 2025 22:41
@AlexeyVatolin AlexeyVatolin marked this pull request as ready for review May 22, 2025 22:51
Member

@Samoed Samoed left a comment


Congratulations on the publication of your paper! Can you also add your benchmark to benchmarks.py?

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Metadata generally looks good, though the descriptions could use a slight improvement.

@isaac-chung
Collaborator

@AlexeyVatolin would love to get this in, if you're still working on this!

@KennethEnevoldsen
Contributor

It seems like this has gotten stale - it's close enough that we could finish it. @Samoed I suppose we could solve the load_data issue simply using v2 and then we are basically there

@AlexeyVatolin
Contributor Author

AlexeyVatolin commented Jul 13, 2025

As @Samoed mentioned, I have added RuSciBench to the list of benchmarks. There is an issue with the GRNTI and OECD classification tasks: they were previously added as part of the RuMTEB benchmark, but only in Russian. To avoid name conflicts, I added "Orig" to the names (RuSciBenchGRNTIOrigClassification, RuSciBenchOECDOrigClassification). I have checked and found that the data is sampled slightly differently, which is why the metric values for the tasks do not match in Russian.

@isaac-chung
Collaborator

Thanks. I think in general this looks good. I'd like to get @KennethEnevoldsen's and @Samoed's opinions on the added regression abstask before moving forward.

I added "Orig" to the names (RuSciBenchGRNTIOrigClassification, RuSciBenchOECDOrigClassification)

Let's add superseded_by to the non-orig versions of the tasks as well? e.g.

  • add superseded_by="RuSciBenchOECDOrigClassification" to RuSciBenchOECDClassification

@AlexeyVatolin AlexeyVatolin requested a review from Samoed July 16, 2025 19:56
@AlexeyVatolin AlexeyVatolin requested a review from Samoed July 17, 2025 18:07
@Samoed Samoed requested a review from KennethEnevoldsen July 17, 2025 21:17
@Samoed
Member

Samoed commented Jul 17, 2025

Great work!

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Focused mostly on the regression tasks. Generally everything looks good, but I had a few minor changes to add.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


A few more changes, but otherwise I think we are good to merge

Comment on lines 17 to 19
class RegressorModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...
Contributor


Better to use the RegressorMixin, but that might be a bit harder for the user, so I would import it as:

from sklearn.base import RegressorMixin as SklearnRegressorModel

Contributor Author


The RegressorMixin class has only the score method, which is not used in my code. If we use it, the LinearRegressionEvaluator class will encounter the following error: Cannot access attribute "fit" for class "RegressorMixin".

Contributor


Ahh, that is annoying..., but I see the reason for using this approach then. Let's rename it to SklearnRegressorModel (just to clarify that it is a Sklearn-compatible model).
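Under the agreed rename, the protocol approach can be sketched as below. This is a minimal illustration, not mteb's actual code: the `MeanRegressor` toy model stands in for any sklearn-compatible regressor, and `runtime_checkable` is added here only so the structural check can be demonstrated at runtime.

```python
from typing import Protocol, runtime_checkable


# Structural type for sklearn-style regressors. A Protocol declares the
# fit/predict methods that static checkers need, which the bare sklearn
# RegressorMixin (which only provides .score) would not.
@runtime_checkable
class SklearnRegressorModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...


# Any object with matching methods satisfies the protocol structurally,
# e.g. this toy regressor that always predicts the training-target mean.
class MeanRegressor:
    def fit(self, X, y, sample_weight=None):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


model = MeanRegressor().fit([[0], [1]], [1.0, 3.0])
print(isinstance(model, SklearnRegressorModel))
print(model.predict([[2]]))
```

Because the check is structural, real sklearn estimators such as LinearRegression or Ridge also satisfy the protocol without inheriting from it.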

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Once the issue with the regressor typing is fixed then this is good to merge

@AlexeyVatolin
Contributor Author

@KennethEnevoldsen, could you please take a look at the pull request when you have a moment? Thank you!

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment


Looks great! Sorry for being slow to respond, I was at a conference (ACL) last week.

@KennethEnevoldsen KennethEnevoldsen changed the title Add RuSciBench fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression Aug 2, 2025
@KennethEnevoldsen KennethEnevoldsen merged commit 36df9ca into embeddings-benchmark:main Aug 2, 2025
9 checks passed