EvalPlus v0.1.6

@ganler ganler released this 26 Jun 06:57
  • Supporting configurable timeouts $T=\max(T_{base}, T_{gt}\times k)$, where:
    • $T_{base}$ is the minimum timeout (configurable via --min-time-limit; defaults to 0.2s);
    • $T_{gt}$ is the runtime of the ground-truth solution (obtained via profiling);
    • $k$ is a configurable factor (set via --gt-time-limit-factor; defaults to 4);
  • Using more conservative timeout defaults to reduce spurious timeouts on slow test-beds ($T_{base}: 0.05s \to 0.2s$ and $k: 2\to 4$).
  • HumanEval+ dataset bug fixes:
    • Medium contract fixes: P129 (#4), P148 (self-identified)
    • Minor contract fixes: P75 (#4), P53 (#8), P0 (self-identified), P3 (self-identified), P9 (self-identified)
    • Minor GT fixes: P140 (#3)
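
The timeout rule $T=\max(T_{base}, T_{gt}\times k)$ can be illustrated with a minimal Python sketch. The function name `compute_timeout` and its parameter names are hypothetical, chosen here to mirror the command-line flags; this is not EvalPlus's actual implementation, only a worked instance of the formula with the old and new defaults.

```python
def compute_timeout(gt_runtime: float,
                    min_time_limit: float = 0.2,
                    gt_time_limit_factor: float = 4.0) -> float:
    """Return T = max(T_base, T_gt * k) in seconds.

    Hypothetical helper for illustration only:
      gt_runtime           -> T_gt, profiled ground-truth runtime
      min_time_limit       -> T_base (mirrors --min-time-limit)
      gt_time_limit_factor -> k (mirrors --gt-time-limit-factor)
    """
    return max(min_time_limit, gt_runtime * gt_time_limit_factor)


if __name__ == "__main__":
    # Fast ground truth (10 ms): the floor T_base applies.
    print(compute_timeout(0.01))
    # Slow ground truth (100 ms): the scaled runtime T_gt * k applies.
    print(compute_timeout(0.1))
    # Same fast test under the previous defaults (T_base = 0.05s, k = 2).
    print(compute_timeout(0.01, min_time_limit=0.05, gt_time_limit_factor=2.0))
```

With the new defaults, a test whose ground truth runs in 10 ms is allowed 0.2s rather than 0.05s, which is the extra headroom intended for slower machines.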

PyPI: https://pypi.org/project/evalplus/0.1.6/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.6/images/sha256-5913b95172962ad61e01a5d5cf63b60e1140dd547f5acc40370af892275e777c