EvalPlus v0.1.1
In this version, most of the effort went into sanitizing and standardizing the code in `evalplus`. Most importantly, `evalplus` now strictly follows the dataset usage style of HumanEval, so users can work with it the same way they work with HumanEval.
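A minimal sketch of the HumanEval-style workflow, under stated assumptions: the `problems` dict is a hypothetical stand-in for what `get_human_eval_plus()` returns (a dict keyed by task IDs such as `HumanEval/0`), and `generate_one_completion` and `write_jsonl_sketch` are placeholder names for a model call and a JSONL writer, not confirmed parts of the `evalplus` API.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dict returned by get_human_eval_plus():
# keys are HumanEval-style task IDs, values hold at least a "prompt".
problems = {
    "HumanEval/0": {"prompt": "def add(a, b):\n"},
    "HumanEval/1": {"prompt": "def sub(a, b):\n"},
}


def generate_one_completion(prompt: str) -> str:
    """Placeholder for a model call; always returns the same function body."""
    return "    pass\n"


def write_jsonl_sketch(path, records):
    """Write one JSON object per line (the .jsonl sample-file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


num_samples_per_task = 2
samples = [
    {"task_id": task_id, "completion": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
    for _ in range(num_samples_per_task)
]

# Written to the temp directory here so the sketch leaves no files behind.
samples_path = os.path.join(tempfile.gettempdir(), "samples.jsonl")
write_jsonl_sketch(samples_path, samples)
```

Each line of the resulting file is one sample, which is the shape the `.jsonl` support listed below refers to.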
For more details, the main changes are (tracked in #1):
- Package build and PyPI setup
- (HumanEval Compatibility) Support sample files as `.jsonl`
- (HumanEval Compatibility) `get_human_eval_plus()` returns a `dict` instead of a `list`
- (HumanEval Compatibility) Use the HumanEval task ID splitter `"/"` over `"_"`
- Optimize the evaluation parallelism scheme to the sample-level granularity (original: task level)
- Optimize IPC via shared memory
- Remove groundtruth solutions to avoid data leakage
- Use Docker as the sandboxing mechanism
- Support Codegen2 in generation
- Split dependencies into multiple categories
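
To illustrate the move from task-level to sample-level parallelism, here is a rough sketch rather than the actual `evalplus` scheduler: every sample becomes its own job, so one task with many samples no longer occupies a single worker. The `check_sample` function and the sample data are hypothetical, and a thread pool is used only to keep the example self-contained.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical samples: several completions may share one task ID.
samples = [
    {"task_id": "HumanEval/0", "completion": "ok"},
    {"task_id": "HumanEval/0", "completion": "bad"},
    {"task_id": "HumanEval/1", "completion": "ok"},
]


def check_sample(sample):
    """Placeholder check; a real evaluator would run tests in a sandbox."""
    return sample["task_id"], sample["completion"] == "ok"


results = defaultdict(list)
with ThreadPoolExecutor(max_workers=4) as pool:
    # One job per sample (sample-level), not one job per task (task-level).
    futures = [pool.submit(check_sample, sample) for sample in samples]
    for future in as_completed(futures):
        task_id, passed = future.result()
        results[task_id].append(passed)

# Task IDs follow the HumanEval "/" splitter, e.g. "HumanEval/0" -> 0.
task_numbers = sorted(int(task_id.split("/")[1]) for task_id in results)
```

Results are regrouped by task ID afterwards, so per-task pass rates can still be computed even though jobs complete in arbitrary order.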
PyPI: https://pypi.org/project/evalplus/0.1.1/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.1/images/sha256-4993a0dc0ec13d6fe88eb39f94dd0a927e1f26864543c8c13e2e8c5d5c347af0