EvalPlus v0.1.6
- Supporting configurable timeouts $T=\max(T_{base}, T_{gt}\times k)$, where:
  - $T_{base}$ is the minimal timeout (configurable via `--min-time-limit`; defaults to 0.2s);
  - $T_{gt}$ is the runtime of the ground-truth solution (obtained via profiling);
  - $k$ is a configurable factor, `--gt-time-limit-factor` (defaults to 4);
- Using a more conservative timeout setting to accommodate test-beds with weak performance ($T_{base}: 0.05s \to 0.2s$ and $k: 2\to 4$).
- HumanEval+ dataset bug fixes.
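
The timeout rule $T=\max(T_{base}, T_{gt}\times k)$ from the first item can be sketched in a few lines of Python. This is only an illustration of the formula, not EvalPlus's actual implementation: the function name `time_limit` and its parameter names are hypothetical, chosen to mirror the `--min-time-limit` and `--gt-time-limit-factor` options.

```python
def time_limit(
    gt_runtime: float,
    min_time_limit: float = 0.2,        # T_base, v0.1.6 default
    gt_time_limit_factor: float = 4.0,  # k, v0.1.6 default
) -> float:
    """Return T = max(T_base, T_gt * k), all values in seconds.

    Hypothetical helper that mirrors the formula in these notes;
    it is not part of the EvalPlus API.
    """
    return max(min_time_limit, gt_runtime * gt_time_limit_factor)


if __name__ == "__main__":
    # Compare the previous settings (T_base=0.05, k=2) with the new defaults
    # for a fast and a slower ground-truth solution.
    for gt in (0.01, 0.1):
        old = time_limit(gt, min_time_limit=0.05, gt_time_limit_factor=2.0)
        new = time_limit(gt)
        print(f"T_gt={gt}s  old limit={old}s  new limit={new}s")
```

With the new defaults, a ground-truth runtime of 0.01s is clamped to the 0.2s floor, while a ground-truth runtime of 0.1s yields a 0.4s limit. Under the previous settings the same cases would have received 0.05s and 0.2s.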
PyPI: https://pypi.org/project/evalplus/0.1.6/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.6/images/sha256-5913b95172962ad61e01a5d5cf63b60e1140dd547f5acc40370af892275e777c