Pinned
-
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
-
open-compass/MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
-
open-compass/GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
-
open-compass/CompassVerifier
[EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward