EvalPlus v0.1.1

@ganler ganler released this 07 May 05:21

This release mainly sanitizes and standardizes the evalplus code. Most importantly, evalplus now strictly follows the dataset usage style of HumanEval. As a result, users can use evalplus in this way:

[Image: carbon (1), screenshot of example usage code]
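A minimal sketch of the HumanEval-style workflow, written with the standard library only so that it runs without evalplus installed. The dict keyed by task ID (for example `"HumanEval/0"`) and the `.jsonl` sample file come from the notes below, and the `task_id`/`completion` fields follow the HumanEval sample format. The names `load_problems`, `generate_one_completion`, `build_samples`, and the local `write_jsonl` are hypothetical stand-ins; in real use, `load_problems()` would be replaced by evalplus's `get_human_eval_plus()`.

```python
import json
from typing import Callable, Dict, Iterable, List


def load_problems() -> Dict[str, dict]:
    """Hypothetical stand-in for get_human_eval_plus(): a dict keyed by task ID."""
    return {
        "HumanEval/0": {"task_id": "HumanEval/0", "prompt": "def add(a, b):\n"},
        "HumanEval/1": {"task_id": "HumanEval/1", "prompt": "def neg(x):\n"},
    }


def generate_one_completion(prompt: str) -> str:
    """Placeholder generator; a real one would query a code model."""
    return "    pass\n"


def build_samples(
    problems: Dict[str, dict], generate: Callable[[str], str]
) -> List[dict]:
    """Create one HumanEval-style sample record per task."""
    return [
        {"task_id": task_id, "completion": generate(problem["prompt"])}
        for task_id, problem in problems.items()
    ]


def write_jsonl(path: str, records: Iterable[dict]) -> None:
    """Write one JSON object per line (the .jsonl layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Calling `write_jsonl("samples.jsonl", build_samples(load_problems(), generate_one_completion))` would then produce a sample file with one line per task.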

The main changes (tracked in #1) are:

  • Package build and PyPI setup
  • (HumanEval Compatibility) Support sample files as .jsonl
  • (HumanEval Compatibility) get_human_eval_plus() returns a dict instead of a list
  • (HumanEval Compatibility) Use HumanEval task ID splitter "/" over "_"
  • Refine the evaluation parallelism scheme to sample-level granularity (previously task-level)
  • Optimize IPC via shared memory
  • Remove ground-truth solutions to avoid data leakage
  • Use Docker as the sandboxing mechanism
  • Support CodeGen2 in generation
  • Split dependencies into multiple categories
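To illustrate the change in parallelism granularity, here is a minimal sketch that schedules one job per (task, sample) pair rather than one job per task. It is not evalplus's actual implementation: it uses a thread pool instead of sandboxed processes, and `check_sample` and `evaluate` are hypothetical names with a toy pass criterion.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def check_sample(task_id: str, sample_idx: int, completion: str) -> Tuple[str, int, bool]:
    """Hypothetical checker; a real harness would execute the code in a sandbox."""
    passed = "return" in completion  # toy criterion, for illustration only
    return task_id, sample_idx, passed


def evaluate(samples: Dict[str, List[str]], max_workers: int = 4) -> Dict[str, List[bool]]:
    """Check every sample as its own job, then regroup results by task."""
    results = {tid: [False] * len(comps) for tid, comps in samples.items()}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Sample-level granularity: one future per (task, sample) pair,
        # so a task with many samples does not occupy a single worker.
        futures = [
            pool.submit(check_sample, tid, idx, comp)
            for tid, comps in samples.items()
            for idx, comp in enumerate(comps)
        ]
        for future in as_completed(futures):
            tid, idx, passed = future.result()
            results[tid][idx] = passed
    return results
```

With task-level scheduling, a task with hundreds of samples would be handled by one worker while others sit idle; submitting each sample separately lets the pool balance the load.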

PyPI: https://pypi.org/project/evalplus/0.1.1/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.1/images/sha256-4993a0dc0ec13d6fe88eb39f94dd0a927e1f26864543c8c13e2e8c5d5c347af0