dataset: add BarExamQA dataset #2916
Merged
Hi,
I’m submitting this pull request to add BarExamQA to MTEB.
BarExamQA is a dataset created by RegLab to evaluate models on the retrieval of relevant legal provisions.
BarExamQA contains over 100 questions taken from the bar exams of states around the US. For each question, law students manually identified the most relevant legal provisions.
We would like to improve the coverage of legal-domain tasks on MTEB, and we believe this dataset will increase the diversity and difficulty of the benchmark.
This pull request is being submitted courtesy of Isaacus, a legal AI research company.
You may find the original dataset here:
https://huggingface.co/datasets/reglab/barexam_qa
Our version of the dataset, converted to the MTEB information retrieval dataset format, is available here:
https://huggingface.co/datasets/isaacus/mteb-barexam-qa
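For reviewers unfamiliar with the retrieval format, here is a minimal sketch of the three pieces an MTEB-style retrieval dataset conventionally carries: a corpus of passages, a set of queries, and relevance judgments linking them. The IDs, texts, and the `validate_retrieval_data` helper are hypothetical illustrations, not rows or code from this PR, and the exact column names in the converted dataset may differ.

```python
# Toy illustration only: the IDs and texts below are invented placeholders,
# not actual BarExamQA rows.
corpus = {
    "p1": {"title": "", "text": "Placeholder legal provision A."},
    "p2": {"title": "", "text": "Placeholder legal provision B."},
}
queries = {
    "q1": "Placeholder bar exam question?",
}
# Relevance judgments: query ID -> {passage ID: relevance score}.
qrels = {
    "q1": {"p1": 1},
}


def validate_retrieval_data(corpus, queries, qrels):
    """Return a list of problems: judged queries or passages that are missing."""
    problems = []
    for query_id, judged in qrels.items():
        if query_id not in queries:
            problems.append(f"unknown query: {query_id}")
        for passage_id in judged:
            if passage_id not in corpus:
                problems.append(f"unknown passage: {passage_id} (query {query_id})")
    return problems


if __name__ == "__main__":
    print(validate_retrieval_data(corpus, queries, qrels))
```

A check like this can catch relevance judgments that point at passages dropped during conversion.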
Checklist