
Add HumanEvalRetrieval task #3022


Merged

Conversation


@fzoll (Contributor) commented Aug 10, 2025

#3014

  • HumanEval dataset for code retrieval tasks

  • 158 queries, 158 documents

  • I have run the following models on the task (results added to the PR). These can be run using the `mteb run -m {model_name} -t {task_name}` command; see the example after this list.

    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models achieve close to perfect scores) nor random (both models score close to random).

  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
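For reference, the two runs above would look roughly like this (a sketch: `HumanEvalRetrieval` is assumed to be the registered task name, taken from the PR title):

```bash
mteb run -m sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 -t HumanEvalRetrieval
mteb run -m intfloat/multilingual-e5-small -t HumanEvalRetrieval
```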

@fzoll force-pushed the add-humaneval-retrieval branch from 0822ee1 to e6d4f65 on August 10, 2025 14:37
fzoll added 4 commits on August 10, 2025 23:01:
- Use TaskMetadata class instead of dict
- Remove descriptive_stats as requested in PR review
- Add date field and proper import structure
- Change path from zeroshot/humaneval-embedding-benchmark to embedding-benchmark/HumanEval
- Use actual description from HuggingFace dataset page
- Remove fabricated citation and reference
- Remove revision field that was incorrect
- Reference HuggingFace dataset page instead of arxiv
- Add revision hash: ed1f48a for reproducibility
- Add date field for metadata completeness
- Add bibtex_citation field (empty string)
- Required for TaskMetadata validation to pass
- Should resolve PR test failure
- Remove trust_remote_code parameter as requested
- Add revision parameter to load_dataset() calls for consistency
- Use metadata revision hash in dataset loading for reproducibility
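A minimal sketch of the revision-pinned loading these commits describe (assumptions: loading goes through the datasets library directly, and the corpus/queries/default config names are illustrative, not confirmed by the PR):

```python
from datasets import load_dataset

# Pin every load_dataset() call to the revision hash recorded in the task
# metadata, so results stay reproducible if the dataset changes upstream.
REVISION = "ed1f48a"  # short hash from the commit messages above

corpus = load_dataset("embedding-benchmark/HumanEval", "corpus", revision=REVISION)
queries = load_dataset("embedding-benchmark/HumanEval", "queries", revision=REVISION)
qrels = load_dataset("embedding-benchmark/HumanEval", "default", revision=REVISION)
```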
@fzoll force-pushed the add-humaneval-retrieval branch from 0f4650d to cb6e750 on August 11, 2025 11:14
Changed query_id/corpus_id to query-id/corpus-id to match actual dataset format.
@fzoll force-pushed the add-humaneval-retrieval branch from cb6e750 to 3bceb9e on August 11, 2025 11:19
@fzoll requested a review from Samoed on August 11, 2025 11:20
Use self.metadata.dataset instead of self.metadata_dict for v2.0 compatibility.
- Organize data by splits as expected by MTEB retrieval tasks
- Convert scores to integers for pytrec_eval compatibility
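A minimal sketch of the reshaping those two bullets describe (assumptions: a single test split, the query-id/corpus-id column names from the commit above, text/score as the remaining columns, and organize_by_split as a hypothetical helper name):

```python
def organize_by_split(corpus_rows, query_rows, qrel_rows):
    """Reshape flat dataset rows into the split-keyed dicts MTEB retrieval tasks expect."""
    corpus = {"test": {r["corpus-id"]: {"text": r["text"]} for r in corpus_rows}}
    queries = {"test": {r["query-id"]: r["text"] for r in query_rows}}
    relevant_docs = {"test": {}}
    for r in qrel_rows:
        # pytrec_eval rejects float relevance judgments, so cast to int.
        relevant_docs["test"].setdefault(r["query-id"], {})[r["corpus-id"]] = int(r["score"])
    return corpus, queries, relevant_docs
```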

@KennethEnevoldsen (Contributor) left a comment


Can I ask you to also compute the descriptive stats using:

```python
task = mteb.get_task(name)
task.calculate_metadata_metrics()  # creates the file in the correct place
```

I also suggested a few updates to the metadata. Generally, the user should be able to read the description and get a fair sense of what a sample looks like (what is the query, and what does the corpus contain).

@fzoll force-pushed the add-humaneval-retrieval branch from 7c6c66e to ac13d9f on August 16, 2025 21:52
- Add descriptive statistics using calculate_metadata_metrics()
- Enhance metadata description with dataset structure details
- Add complete BibTeX citation for original paper
- Update to full commit hash revision
- Add python-Code language tag for programming language
- Explain retrieval task formulation clearly
@fzoll force-pushed the add-humaneval-retrieval branch from ac13d9f to 124de9e on August 16, 2025 21:56
- Update citation to match bibtexparser formatting requirements
- Fields now in alphabetical order with lowercase names
- Proper trailing commas and indentation
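For illustration, an entry following those rules might look like this (HumanEval originates from Chen et al., 2021; the exact entry committed in the PR may differ):

```bibtex
@article{chen2021evaluating,
  author = {Chen, Mark and Tworek, Jerry and Jun, Heewoo and others},
  eprint = {2107.03374},
  journal = {arXiv preprint arXiv:2107.03374},
  title = {Evaluating Large Language Models Trained on Code},
  year = {2021},
}
```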

@KennethEnevoldsen (Contributor) left a comment


Looks good to me! Thanks for the PR

@KennethEnevoldsen merged commit d4e6223 into embeddings-benchmark:main on Aug 17, 2025
9 checks passed