fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression #2716
Conversation
Congratulations on the publication of your paper! Can you also add your benchmark to benchmarks.py?
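For reference, registering a benchmark there might look roughly like the sketch below. This is a minimal sketch assuming mteb's Benchmark dataclass and get_tasks helper (the import path can differ by version); the task list and description are illustrative, not the PR's exact entry.

import mteb
from mteb.benchmarks import Benchmark

# Hypothetical benchmarks.py entry; task names follow the ones discussed
# in this PR, but the real entry may differ.
RUSCIBENCH = Benchmark(
    name="RuSciBench",
    tasks=mteb.get_tasks(
        tasks=[
            "RuSciBenchGRNTIOrigClassification",
            "RuSciBenchOECDOrigClassification",
        ]
    ),
    description="Scientific-text tasks in Russian and English from elibrary.ru.",
    reference="https://link.springer.com/article/10.1134/S1064562424602191",
)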
Metadata generally looks good, though maybe the descriptions could use a slight improvement.
@AlexeyVatolin would love to get this in, if you're still working on this!
It seems like this has gotten stale - it's close enough that we could finish it. @Samoed I suppose we could solve the load_data issue simply by using v2, and then we are basically there.
As @Samoed mentioned, I have added RuSciBench to the list of benchmarks. There is an issue with the GRNTI and OECD classification tasks: they were previously added as part of the RuMTEB benchmark, but only in Russian. To avoid name conflicts, I added "Orig" to the names (RuSciBenchGRNTIOrigClassification, RuSciBenchOECDOrigClassification). I checked and found that the data is sampled slightly differently, which is why the metric values for the Russian tasks do not match.
Thanks. I think in general this looks good. I'd like to get @KennethEnevoldsen's and @Samoed's opinion on the added regression abstask before moving forward.
Let's add
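For context on what such a regression abstask evaluates, here is a hedged sketch of a linear-regression evaluator over sentence embeddings. All names here are illustrative; the PR's actual LinearRegressionEvaluator may differ.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

def evaluate_text_regression(model, train_texts, train_labels, test_texts, test_labels):
    # 'model' is assumed to expose encode(list[str]) -> np.ndarray,
    # as mteb encoders do.
    x_train = model.encode(train_texts)
    x_test = model.encode(test_texts)
    # Fit a linear regressor on the training embeddings.
    regressor = LinearRegression()
    regressor.fit(x_train, np.asarray(train_labels))
    # Score predictions on the test embeddings with standard regression metrics.
    predictions = regressor.predict(x_test)
    return {
        "mae": mean_absolute_error(test_labels, predictions),
        "r2": r2_score(test_labels, predictions),
    }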
Great work!
Focused mostly on the regression tasks - generally everything looks good, but I had a few minor changes to add.
A few more changes, but otherwise I think we are good to merge.
from typing import Protocol

class RegressorModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...
Better to use the RegressorMixin, but that might be a bit harder for the user, so I would import it as:
from sklearn.base import RegressorMixin as SklearnRegressorModel
The RegressorMixin class has only the score method, which is not used in my code. If we use it, the LinearRegressionEvaluator class will encounter the following error: Cannot access attribute "fit" for class "RegressorMixin".
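To illustrate the point: with the Protocol, any object that has matching fit/predict methods type-checks, whether or not it inherits from anything in sklearn. A small sketch, reusing the RegressorModel protocol defined above:

from sklearn.linear_model import LinearRegression, Ridge

def run_regressor(regressor: RegressorModel) -> None:
    # Both estimators satisfy the protocol structurally via fit/predict,
    # without subclassing RegressorModel.
    regressor.fit([[0.0], [1.0]], [0.0, 1.0])
    print(regressor.predict([[2.0]]))

run_regressor(LinearRegression())
run_regressor(Ridge())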
Ahh, that is annoying, but I see the reason for using this approach then. Let's rename it to SklearnRegressorModel (just to clarify that it is a sklearn-compatible model).
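The resulting definition would then be just the rename agreed on here (a sketch; the protocol body is unchanged from above):

from typing import Protocol

class SklearnRegressorModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...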
Once the issue with the regressor typing is fixed, this is good to merge.
@KennethEnevoldsen, could you please take a look at the pull request when you have a moment? Thank you!
Looks great! Sorry for being slow to respond, I was at a conference (ACL) last week.
Add RuSciBench datasets with scientific tasks in Russian and English from the Russian scientific electronic library elibrary.ru.
Here is our paper:
https://link.springer.com/article/10.1134/S1064562424602191
Checklist
I did not add a dataset, or if I did, I added the dataset checklist to the PR and completed it.
I did not add a model, or if I did, I added the model checklist to the PR and completed it.
I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command (see the example invocation after this list):
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks).
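For example, an invocation for one of the tasks discussed above might be (task and model names taken from this thread; other flags left at their defaults):

mteb run -m intfloat/multilingual-e5-small -t RuSciBenchGRNTIOrigClassification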