
# OpenHuEval: Evaluating Large Language Model on Hungarian Specifics


**Comparison of related benchmarks.** OpenHuEval is compared against WildBench; SimpleQA and ChineseSimpleQA; MAPS; MARC, MMMLU et al.; BenchMAX; MILQA; and HuLU along six dimensions: real user queries, self-awareness evaluation, proverb reasoning, generative tasks with LLM-as-judge, Hungarian language, and comprehensive coverage of Hungarian specifics.

## 🛠️ Inference and Evaluation on OpenCompass

Below are the steps for quickly downloading OpenHuEval and using OpenCompass for evaluation.

### 1. OpenCompass Environment Setup

Refer to the installation steps for OpenCompass. Please use this forked repo of OpenCompass for the evaluation of OpenHuEval.
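A minimal setup sketch, following the upstream OpenCompass install flow; `<forked-opencompass-url>` is a placeholder for the fork linked above, not a real URL:

```bash
# Sketch of the usual OpenCompass setup (per the upstream docs).
# <forked-opencompass-url> is a placeholder for the forked repo linked above.
conda create -n opencompass python=3.10 -y
conda activate opencompass
git clone <forked-opencompass-url> ${path_to_opencompass}
cd ${path_to_opencompass}
pip install -e .
```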

### 2. Download OpenHuEval

```bash
git clone https://github.com/opendatalab/OpenHuEval.git ${path_to_OpenHuEval_repo}

cd ${path_to_opencompass}
mkdir data
ln -snf ${path_to_OpenHuEval_repo} ./data/OpenHuEval
```
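If the link was created correctly, the benchmark files should now be visible under OpenCompass's data directory:

```bash
# optional sanity check: the symlink should list the OpenHuEval repo contents
ls -l ./data/OpenHuEval
```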

### 3. Run Inference and Evaluation

```bash
# use the HuSimpleQA task as an example
cd ${path_to_opencompass}

# modify the config file `examples/eval_OpenHuEval_HuSimpleQA.py`: uncomment or add the models you want to evaluate
python run.py examples/eval_OpenHuEval_HuSimpleQA.py -r --dump-eval-details
```
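For reference, OpenCompass example configs select models via `read_base()` imports; a minimal sketch of what the edit might look like, where the import paths are assumptions based on upstream OpenCompass rather than the fork's exact contents:

```python
# Illustrative sketch of the model section of examples/eval_OpenHuEval_HuSimpleQA.py.
# The real file in the forked repo also defines the dataset and the judge;
# the import paths below are assumptions based on upstream OpenCompass.
from mmengine.config import read_base

with read_base():
    # uncomment or add the models you want to evaluate
    from opencompass.configs.models.hf_llama.lmdeploy_llama3_1_8b_instruct import \
        models as llama3_1_8b_instruct
    # from opencompass.configs.models.qwen2_5.lmdeploy_qwen2_5_72b_instruct import \
    #     models as qwen2_5_72b_instruct

models = [
    *llama3_1_8b_instruct,
    # *qwen2_5_72b_instruct,
]
```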

The inference and evaluation results will be written to `${path_to_opencompass}/outputs`, like this:

```text
outputs
└── eval_OpenHuEval_HuSimpleQA
    └── 20250312_150000
        ├── predictions # prediction
        │   ├── llama-3_1-8b-instruct-lmdeploy
        │   ├── ...
        │   └── qwen2.5-72b-instruct-lmdeploy
        ├── results # evaluation
        │   ├── llama-3_1-8b-instruct-lmdeploy_judged-by--GPT-4o
        │   ├── ...
        │   └── qwen2.5-72b-instruct-lmdeploy_judged-by--GPT-4o
        └── summary # evaluation summary
            ├── judged-by--GPT-4o-capability_en.csv
            └── judged-by--GPT-4o-capability_hu.csv
```
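To inspect a summary programmatically, something like the following works; the path reuses the example run above, and the CSV's exact column layout depends on the OpenCompass summarizer, so treat both as assumptions:

```python
# Minimal sketch: load one of the summary CSVs produced above.
# Path and column layout are assumptions; adjust to your actual run.
import pandas as pd

summary_path = (
    "outputs/eval_OpenHuEval_HuSimpleQA/20250312_150000/summary/"
    "judged-by--GPT-4o-capability_en.csv"
)
df = pd.read_csv(summary_path)
print(df.head())
```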

### 4. Generate Analysis Results

```bash
cd ${path_to_OpenHuEval_repo}

# generate the statistical results for Figure 5 and Figure 6 in https://arxiv.org/abs/2503.21500
# configure the related parameters in tools/HuSimpleQA/lrm_reasoning_process_analysis.py according to tools/HuSimpleQA/README.md before running
python tools/HuSimpleQA/lrm_reasoning_process_analysis.py

# generate the statistical results for Figure 9 in https://arxiv.org/abs/2503.21500
# configure the related parameters in tools/HuMatchingFIB/main_process.py according to tools/HuMatchingFIB/README.md before running
python tools/HuMatchingFIB/main_process.py
```

## 🖊️ Citation

```bibtex
@inproceedings{yang-etal-2025-openhueval,
    title = "{O}pen{H}u{E}val: Evaluating Large Language Model on {H}ungarian Specifics",
    author = "Yang, Haote  and Wei, Xingjian  and Wu, Jiang  and Ligeti-Nagy, No{\'e}mi  and Sun, Jiaxing  and Wang, Yinfan  and Yang, Gy{\H{o}}z{\H{o}} Zijian  and Gao, Junyuan  and Wang, Jingchao  and Jiang, Bowen  and Wang, Shasha  and Yu, Nanjun  and Zhang, Zihao  and Hong, Shixin  and Liu, Hongwei  and Li, Wei  and Zhang, Songyang  and Lin, Dahua  and Wu, Lijun  and Pr{\'o}sz{\'e}ky, G{\'a}bor  and He, Conghui",
    editor = "Che, Wanxiang  and Nabende, Joyce  and Shutova, Ekaterina  and Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.390/",
    doi = "10.18653/v1/2025.findings-acl.390",
    pages = "7464--7520",
    ISBN = "979-8-89176-256-5",
}
```

## 💳 License

This project is released under the Apache 2.0 license.
