Benchmark | Real User Query | Self-awareness Evaluation | Proverb Reasoning | Generative Task & LLM-as-Judge | Hungarian Language | Comprehensive Hungarian-specific |
---|---|---|---|---|---|---|
WildBench | ✔ | ✘ | ✘ | ✔ | ✘ | ✘ |
SimpleQA, ChineseSimpleQA | ✘ | ✔ | ✘ | ✔ | ✘ | ✘ |
MAPS | ✘ | ✘ | ✔ | ✘ | ✘ | ✘ |
MARC, MMMLU, etc. | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ |
BenchMAX | ✘ | ✘ | ✘ | ✔ | ✔ | ✘ |
MILQA | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ |
HuLU | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ |
OpenHuEval (ours) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Below are the steps for quickly downloading OpenHuEval and running the evaluation with OpenCompass.
First, follow the installation steps for OpenCompass. Note that the evaluation of OpenHuEval requires this forked repo of OpenCompass.
```bash
# Download the OpenHuEval data and link it into OpenCompass's data directory.
git clone https://github.com/opendatalab/OpenHuEval.git ${path_to_OpenHuEval_repo}
cd ${path_to_opencompass}
mkdir -p data
ln -snf ${path_to_OpenHuEval_repo} ./data/OpenHuEval
```
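Before running an evaluation, it is worth confirming that the symlink resolves. A minimal sanity check, run from `${path_to_opencompass}` (a hypothetical helper, not part of the repo):

```python
from pathlib import Path

# Hypothetical sanity check (not part of the repo): confirm that the data
# symlink created above resolves to the OpenHuEval checkout.
data_link = Path("data/OpenHuEval")
assert data_link.exists(), "data/OpenHuEval missing: re-run the ln -snf step"
print("OpenHuEval data resolves to:", data_link.resolve())
```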
```bash
# Use the HuSimpleQA task as an example.
cd ${path_to_opencompass}
# Modify the config file `examples/eval_OpenHuEval_HuSimpleQA.py`:
# uncomment or add the models you want to evaluate.
python run.py examples/eval_OpenHuEval_HuSimpleQA.py -r --dump-eval-details
```
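The model list in the config follows OpenCompass's usual pattern of importing predefined model configs under `read_base()`. A minimal sketch of what the edited section might look like; the exact import path is an assumption, so copy the real entries from the commented-out model list already present in the forked repo's config:

```python
# Hypothetical excerpt of examples/eval_OpenHuEval_HuSimpleQA.py.
from mmengine.config import read_base

with read_base():
    # Each imported module provides a `models` list preconfigured for a backend
    # such as lmdeploy; the module path below is an assumption.
    from opencompass.configs.models.qwen2_5.lmdeploy_qwen2_5_72b_instruct import (
        models as qwen2_5_72b_instruct_models,
    )

models = [*qwen2_5_72b_instruct_models]
```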
The inference and evaluation results will be written to `${path_to_opencompass}/outputs`, like this:
```
outputs
└── eval_OpenHuEval_HuSimpleQA
    └── 20250312_150000
        ├── predictions # model predictions
        │   ├── llama-3_1-8b-instruct-lmdeploy
        │   ├── ...
        │   └── qwen2.5-72b-instruct-lmdeploy
        ├── results # judge evaluations
        │   ├── llama-3_1-8b-instruct-lmdeploy_judged-by--GPT-4o
        │   ├── ...
        │   └── qwen2.5-72b-instruct-lmdeploy_judged-by--GPT-4o
        └── summary # evaluation summary
            ├── judged-by--GPT-4o-capability_en.csv
            └── judged-by--GPT-4o-capability_hu.csv
```
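The summary files are plain CSV, so they can be inspected with any spreadsheet or dataframe tool. A minimal sketch with pandas; the timestamp is just the placeholder from the example tree above, and the column layout depends on OpenCompass's summarizer:

```python
import pandas as pd

# Hypothetical path: the timestamped run directory depends on your run.
summary_csv = (
    "outputs/eval_OpenHuEval_HuSimpleQA/20250312_150000/"
    "summary/judged-by--GPT-4o-capability_en.csv"
)
df = pd.read_csv(summary_csv)
print(df.head())
```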
```bash
cd ${path_to_OpenHuEval_repo}

# Generate the statistics behind Figures 5 and 6 in https://arxiv.org/abs/2503.21500.
# Configure the related parameters in tools/HuSimpleQA/lrm_reasoning_process_analysis.py
# according to tools/HuSimpleQA/README.md before running.
python tools/HuSimpleQA/lrm_reasoning_process_analysis.py

# Generate the statistics behind Figure 9 in https://arxiv.org/abs/2503.21500.
# Configure the related parameters in tools/HuMatchingFIB/main_process.py
# according to tools/HuMatchingFIB/README.md before running.
python tools/HuMatchingFIB/main_process.py
```
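These analysis scripts consume the evaluation outputs produced above. If you want to inspect those files yourself first, here is a minimal sketch; the run timestamp, model name, and JSON layout are assumptions based on OpenCompass's usual dump format, not guaranteed by the repo:

```python
import json
from pathlib import Path

# Hypothetical inspection of a prediction dump under outputs/.
pred_dir = Path(
    "outputs/eval_OpenHuEval_HuSimpleQA/20250312_150000/"
    "predictions/qwen2.5-72b-instruct-lmdeploy"
)
for pred_file in sorted(pred_dir.glob("*.json")):
    with pred_file.open(encoding="utf-8") as f:
        preds = json.load(f)
    print(f"{pred_file.name}: {len(preds)} records")
```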
If you find OpenHuEval useful in your research, please cite:

```bibtex
@inproceedings{yang-etal-2025-openhueval,
  title     = "{O}pen{H}u{E}val: Evaluating Large Language Model on {H}ungarian Specifics",
  author    = "Yang, Haote and Wei, Xingjian and Wu, Jiang and Ligeti-Nagy, No{\'e}mi and Sun, Jiaxing and Wang, Yinfan and Yang, Gy{\H{o}}z{\H{o}} Zijian and Gao, Junyuan and Wang, Jingchao and Jiang, Bowen and Wang, Shasha and Yu, Nanjun and Zhang, Zihao and Hong, Shixin and Liu, Hongwei and Li, Wei and Zhang, Songyang and Lin, Dahua and Wu, Lijun and Pr{\'o}sz{\'e}ky, G{\'a}bor and He, Conghui",
  editor    = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
  month     = jul,
  year      = "2025",
  address   = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2025.findings-acl.390/",
  doi       = "10.18653/v1/2025.findings-acl.390",
  pages     = "7464--7520",
  ISBN      = "979-8-89176-256-5",
}
```
This project is released under the Apache 2.0 license.