Collaborator

@asparius asparius commented Apr 16, 2024

Checklist for adding MMTEB dataset

Reason for dataset addition:

It is a Turkish movie sentiment dataset, and it is the first classification dataset for Turkish. Publication of the dataset.

  • I have tested that the dataset runs with the mteb package.
  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
  • I have added points for my submission to the POINTS.md file.
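
The size-reduction item in the checklist can be illustrated with a small sketch. This is not the mteb package's own downsampling code; it is a standalone, standard-library example of one way to cut a labeled classification dataset down to about 2048 examples while keeping label proportions roughly intact. The function name `stratified_subsample` and the `text`/`label` field names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return at most n_samples examples with roughly the original label mix.

    Hypothetical helper for illustration; not part of mteb.
    """
    # Nothing to do if the dataset is already small enough.
    if len(examples) <= n_samples:
        return list(examples)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group examples by label (insertion order keeps this deterministic).
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    total = len(examples)
    sampled = []
    for group in by_label.values():
        # Proportional quota per label, with at least one example each.
        quota = max(1, round(n_samples * len(group) / total))
        sampled.extend(rng.sample(group, min(quota, len(group))))

    # Rounding can overshoot by a few examples; shuffle, then truncate.
    # It can also undershoot slightly, which is accepted here.
    rng.shuffle(sampled)
    return sampled[:n_samples]


if __name__ == "__main__":
    # Toy data: 7000 positive and 3000 negative reviews (made-up placeholders).
    data = [{"text": f"review {i}", "label": 1 if i < 7000 else 0} for i in range(10000)]
    subset = stratified_subsample(data)
    print(len(subset), sum(ex["label"] for ex in subset))
```

Sampling per label rather than uniformly keeps a minority class from shrinking disproportionately, which matters when checking that scores are neither trivial nor random.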

Contributor

@Sakshamrzt Sakshamrzt left a comment

Please also add points for the dataset: 2+4=6 points for yourself and 1 point for me for the review.

Thanks a lot for the changes!

Collaborator

@isaac-chung isaac-chung left a comment

A few more small things around metadata. I have found mteb/abstasks/TaskMetadata.py to be a helpful reference.

asparius and others added 8 commits April 17, 2024 14:57
…ion.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…ion.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…ion.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…ion.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…ion.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Collaborator

@isaac-chung isaac-chung left a comment

Thanks for iterating. We are very close. Sorry that I missed a few things in the first round. Please add a tur/__init__.py file as well.

Looks good otherwise!

Oh, and once you have merged updates from the main branch, please add 2+4=6 points for yourself, and 1 point each for myself and @Sakshamrzt for the review.

asparius and others added 3 commits April 17, 2024 15:37
@isaac-chung isaac-chung mentioned this pull request Apr 17, 2024
9 tasks
@asparius
Collaborator Author

asparius commented Apr 17, 2024

Updated everything according to the reviews, @isaac-chung.

Collaborator

@isaac-chung isaac-chung left a comment

So close!
Could you please also resolve the merge conflicts by syncing your fork and merging in the updates from main?

@asparius
Collaborator Author

> So close! Could you please also resolve the merge conflicts by syncing your fork and merging in the updates from main?

Should be ok now

Collaborator

@isaac-chung isaac-chung left a comment

@asparius Thanks for your contribution!

@isaac-chung isaac-chung merged commit b01382b into embeddings-benchmark:main Apr 17, 2024
@asparius asparius deleted the trmoviedemirtas branch April 17, 2024 15:03
@KennethEnevoldsen
Contributor

@isaac-chung, when merging datasets into main, please prefix it with "fix:" to ensure that the version is bumped. You can find more on this in the contributing guidelines.
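
As a rough illustration of the convention mentioned above, a pre-merge check could look like the sketch below. Only the "fix:" prefix comes from this thread. The helper name `has_release_prefix` and the example titles are hypothetical, and the project's actual release tooling may recognize other prefixes that are not covered here.

```python
def has_release_prefix(merge_title: str, prefix: str = "fix:") -> bool:
    """Return True if the merge title starts with the expected prefix.

    Hypothetical helper for illustration; the match is case-sensitive and
    ignores leading whitespace.
    """
    return merge_title.lstrip().startswith(prefix)


if __name__ == "__main__":
    # Made-up example titles, not taken from the repository history.
    for title in ["fix: add Turkish movie sentiment dataset",
                  "Add Turkish movie sentiment dataset"]:
        print(title, "->", has_release_prefix(title))
```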

@isaac-chung
Collaborator

Yep, will do. Apologies!

@KennethEnevoldsen
Contributor

No worries. With the number of PRs at the moment, it is unlikely to cause any issues.

4 participants