EvalPlus v0.1.5

@ganler released this 02 Jun 08:19
· 322 commits to master since this release

🚀 HumanEval+[mini] -- 47x smaller than HumanEval+ while equally effective

  • Add --mini to evalplus.evaluate ... and you can use a minimal, best-quality set of extra tests to accelerate evaluation!
  • HumanEval+[mini] (avg 16.5 tests per task) is 47x smaller than HumanEval+ (avg 774.8 tests per task).
  • This is achieved via test-suite reduction: we run a set-covering algorithm that keeps only the tests needed to preserve the same code coverage (coverage analysis), the same killed mutants (mutation analysis), and the same killed samples (pass-fail status of each sample-test pair).
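
To illustrate the idea of test-suite reduction by set covering, here is a minimal sketch of a greedy set-cover heuristic. It is not taken from the EvalPlus code base and may differ from the algorithm EvalPlus actually runs; the function name `greedy_reduce`, the test names, and the requirement labels (`line:…`, `mutant:…`, `sample:…`) are all hypothetical. Each test maps to the set of "requirements" it satisfies (lines covered, mutants killed, wrong samples killed), and tests are picked until every requirement met by the full suite is also met by the reduced one.

```python
from typing import Dict, Hashable, List, Set


def greedy_reduce(kills: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedily pick tests until every requirement of the full suite is covered.

    `kills` maps a (hypothetical) test name to the requirements it satisfies.
    Greedy set cover is an approximation: the result is small, not provably minimal.
    """
    # Everything the full suite achieves must still be achieved after reduction.
    uncovered: Set[Hashable] = set()
    for reqs in kills.values():
        uncovered |= reqs

    selected: List[str] = []
    while uncovered:
        # Pick the test covering the most still-uncovered requirements.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(kills), key=lambda t: len(kills[t] & uncovered))
        selected.append(best)
        uncovered -= kills[best]
    return selected


# Toy data (made up for illustration): coverage, mutant kills, sample kills.
tests = {
    "t1": {"line:1", "line:2", "mutant:A"},
    "t2": {"line:2", "sample:s1"},
    "t3": {"line:1", "line:2", "mutant:A", "sample:s1"},
    "t4": {"mutant:B"},
}

reduced = greedy_reduce(tests)
print(reduced)  # ['t3', 't4']
```

In this toy example, `t3` subsumes `t1` and `t2`, so two tests preserve everything the original four achieved.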

PyPI: https://pypi.org/project/evalplus/0.1.5/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.5/images/sha256-01ef3275ab02776e94edd4a436a3cd33babfaaf7a81e7ae44f895c2794f4c104