EvalPlus v0.1.1

This version mainly sanitizes and standardizes the code in evalplus. Most importantly, evalplus now strictly follows the dataset usage style of HumanEval, so it can be used in the same way as HumanEval.
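A minimal sketch of the HumanEval-style workflow, written with only the standard library so that it runs without evalplus installed. The `problems` dict below is a hypothetical stand-in for what `get_human_eval_plus()` returns (a dict keyed by task ID), and `generate_one_completion` and `write_jsonl` are illustrative helpers defined here, not confirmed evalplus APIs.

```python
import json
import os
import tempfile


def generate_one_completion(prompt):
    # Hypothetical placeholder: a real setup would call a code model here.
    return "    pass\n"


def write_jsonl(path, records):
    # Illustrative helper: write one JSON object per line (.jsonl).
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Stand-in for the dataset: a dict keyed by task ID, with "/" as the splitter.
problems = {
    "HumanEval/0": {"task_id": "HumanEval/0", "prompt": "def add(a, b):\n"},
    "HumanEval/1": {"task_id": "HumanEval/1", "prompt": "def sub(a, b):\n"},
}

num_samples_per_task = 2
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]

# Written to the temp directory here only to keep the sketch side-effect free.
samples_path = os.path.join(tempfile.gettempdir(), "samples.jsonl")
write_jsonl(samples_path, samples)
```

With evalplus installed, the `problems` dict would presumably come from `get_human_eval_plus()` instead of the hard-coded stand-in.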

The main changes (tracked in #1) are:

Package build and pypi setup
(HumanEval Compatibility) Support sample files as .jsonl
(HumanEval Compatibility) get_human_eval_plus() returns a dict instead of a list
(HumanEval Compatibility) Use the HumanEval task ID separator "/" instead of "_"
Optimize the evaluation parallelism scheme to sample-level granularity (previously task-level)
Optimize IPC via shared memory
Remove groundtruth solutions to avoid data leakage
Use Docker as the sandboxing mechanism
Support CodeGen2 in generation
Split dependencies into multiple categories
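
To illustrate what sample-level (rather than task-level) parallelism means, here is a generic sketch, not the evalplus implementation. Each sample is submitted as its own unit of work, so a task with many samples does not occupy a single worker. `check_sample` is a hypothetical stand-in for running a completion against its tests, and a thread pool is used purely to keep the example short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_sample(task_id, completion):
    # Hypothetical check: a real evaluator would execute the completion
    # against the task's tests inside a sandbox.
    return completion.strip() != ""


def evaluate(samples, max_workers=4):
    # Submit every sample individually (sample-level granularity),
    # then group the pass/fail results back by task ID.
    results = defaultdict(list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(check_sample, s["task_id"], s["completion"]): s
            for s in samples
        }
        for future in as_completed(futures):
            sample = futures[future]
            results[sample["task_id"]].append(future.result())
    return dict(results)


demo_samples = [
    {"task_id": "HumanEval/0", "completion": "    return a + b\n"},
    {"task_id": "HumanEval/0", "completion": ""},
    {"task_id": "HumanEval/1", "completion": "    return a - b\n"},
]
demo_results = evaluate(demo_samples)
```

Results may complete in any order, which is why they are regrouped by task ID after the fact.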

PyPI: https://pypi.org/project/evalplus/0.1.1/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.1/images/sha256-4993a0dc0ec13d6fe88eb39f94dd0a927e1f26864543c8c13e2e8c5d5c347af0
