EvalPlus v0.1.5
🚀 HumanEval+[mini] -- 47x smaller yet as effective as HumanEval+
- Add `--mini` to `evalplus.evaluate ...` to use a minimal, best-quality set of extra tests and accelerate evaluation. HumanEval+[mini] (avg 16.5 tests) is 47x smaller than HumanEval+ (avg 774.8 tests).
- This is achieved via test-suite reduction: we run a set-covering algorithm that preserves the same coverage (coverage analysis), mutant kills (mutation analysis), and sample kills (pass-fail status of each sample-test pair).
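
The idea behind test-suite reduction can be sketched with a greedy set-cover heuristic. This is only an illustration, not the EvalPlus implementation: the release notes do not say which set-covering algorithm is used, and the function name `greedy_reduce`, the test IDs, and the requirement tuples below are all hypothetical. Each test maps to the set of "requirements" it satisfies (branches covered, mutants killed, wrong samples killed), and the sketch keeps picking the test that satisfies the most still-unsatisfied requirements.

```python
from typing import Dict, Hashable, List, Set


def greedy_reduce(test_to_reqs: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedy set cover: pick tests until every requirement met by the
    full suite is also met by the selected subset.

    Hypothetical helper for illustration; not part of the evalplus API.
    """
    # Everything the full suite achieves must be preserved.
    uncovered: Set[Hashable] = set()
    for reqs in test_to_reqs.values():
        uncovered |= reqs

    selected: List[str] = []
    while uncovered:
        # Choose the test covering the most remaining requirements.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(test_to_reqs),
                   key=lambda t: len(test_to_reqs[t] & uncovered))
        selected.append(best)
        uncovered -= test_to_reqs[best]
    return selected


# Made-up data: requirements are tagged by the analysis that produced them.
tests = {
    "t1": {("branch", 1), ("branch", 2), ("mutant", "m1")},
    "t2": {("branch", 2), ("sample", "s1")},
    "t3": {("branch", 1)},
    "t4": {("mutant", "m1"), ("sample", "s1"), ("branch", 3)},
}

reduced = greedy_reduce(tests)
print(reduced)  # ['t1', 't4']
```

Here two of the four tests are enough to keep every covered branch, killed mutant, and killed sample. The greedy heuristic does not guarantee a minimum-size subset, but it always preserves everything the full suite achieves.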
PyPI: https://pypi.org/project/evalplus/0.1.5/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.5/images/sha256-01ef3275ab02776e94edd4a436a3cd33babfaaf7a81e7ae44f895c2794f4c104