Optimizing code performance is paramount in software engineering, yet it remains a largely unexplored frontier for Large Language Models (LLMs). While models excel at fixing bugs, their ability to make code faster at repository scale is not well understood.
To address this, we introduce SWE-Perf, the first benchmark designed to evaluate LLMs on performance optimization tasks within genuine, complex repository contexts. Unlike benchmarks that focus on isolated code snippets, SWE-Perf challenges models to understand and modify entire codebases. The benchmark comprises 140 instances, each derived from a real performance-improving pull request in a popular GitHub repository. For each instance, the model is provided with the full source code, the target functions, and the human expert's patch as a reference. The core task is to generate a code patch that reduces test execution time without introducing bugs.
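As a toy illustration (not drawn from the benchmark), the kind of patch SWE-Perf rewards is a behavior-preserving edit that cuts runtime, such as replacing repeated linear scans with a set lookup:

```python
# Toy example of a performance-improving, behavior-preserving change.
# Before: each membership test scans the whole list, O(len(items)) per query.
def contains_slow(items, queries):
    return [q in items for q in queries]

# After: build a set once, then each lookup is O(1) on average; results are identical.
def contains_fast(items, queries):
    lookup = set(items)
    return [q in lookup for q in queries]

items = list(range(10_000))
queries = list(range(0, 20_000, 7))
assert contains_slow(items, queries) == contains_fast(items, queries)
```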
We recommend using Conda to manage dependencies.
To recreate our environment:
```bash
conda env create -f environment.yml
```
Evaluation applies model-generated patch predictions to the SWE-Perf instances and measures the resulting performance. The pipeline has two stages:
- Run Evaluation – Executes each prediction, measures runtime improvements, and stores logs.
- Check Evaluation – Aggregates logs into a CSV report containing performance metrics.
See the evaluation README for detailed usage.
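For intuition, the check stage boils down to turning per-instance runtimes into relative gains. The sketch below assumes hypothetical log fields and a simple relative-improvement formula; it is not the benchmark's official metric definition or report schema:

```python
# Illustrative aggregation: per-instance before/after runtimes -> CSV of relative gains.
import csv

logs = [
    {"instance_id": "example__repo-1234", "runtime_before": 12.4, "runtime_after": 9.8},
    {"instance_id": "example__repo-5678", "runtime_before": 3.1, "runtime_after": 3.0},
]

with open("report.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["instance_id", "runtime_before", "runtime_after", "gain"]
    )
    writer.writeheader()
    for row in logs:
        # Relative runtime reduction: 0.0 means no change, 0.5 means twice as fast.
        gain = (row["runtime_before"] - row["runtime_after"]) / row["runtime_before"]
        writer.writerow({**row, "gain": round(gain, 4)})
```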
Generation uses LLMs to produce code patches that improve performance on SWE-Perf tasks.
The Oracle generation pipeline includes the following steps:
- Generate Prompts – Collect relevant source files and assemble them into prompts.
- Run Inference – Feed prompts to an LLM to produce patches.
- Convert Outputs – Transform model outputs into unified diff format.
See the generation README for details.
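To illustrate the last step: if a model returns a rewritten file rather than a diff, a unified diff can be derived with Python's standard library. This is a minimal sketch with made-up file contents, not the repository's actual conversion script:

```python
# Derive a unified diff from "before" and "after" file contents.
import difflib

# Hypothetical inputs: the file as it exists in the repo, and the model's rewrite.
original_text = "def f(xs):\n    return sorted(xs)[0]\n"
rewritten_text = "def f(xs):\n    return min(xs)\n"

patch = "".join(
    difflib.unified_diff(
        original_text.splitlines(keepends=True),
        rewritten_text.splitlines(keepends=True),
        fromfile="a/module.py",
        tofile="b/module.py",
    )
)
print(patch)
```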
The SWE-Perf dataset is built through a five-phase pipeline to identify and verify real-world performance-optimizing pull requests:
- Collect PRs – Crawl PRs from 12 popular GitHub repositories.
- Measure Performance – Run all unit tests before/after each PR to obtain runtime data.
- Identify Performance PRs – Filter PRs with significant improvements and relevant test coverage.
- Verify Stability – Confirm that performance gains are consistent and reproducible.
- Extract Optimization Targets – Define function-level targets for Oracle and Realistic evaluation settings.
Full details are in the data collection README.
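As a rough sketch of the stability idea, a performance gain should survive repeated measurement rather than rest on a single lucky run. The repetition count, use of medians, and threshold below are illustrative choices, not the paper's actual protocol:

```python
# Illustrative stability check for a candidate performance improvement.
import statistics
import timeit

def is_stable_improvement(test_before, test_after, repeats=10, min_gain=0.05):
    # Time each version several times; each argument is a zero-argument callable.
    before = [timeit.timeit(test_before, number=1) for _ in range(repeats)]
    after = [timeit.timeit(test_after, number=1) for _ in range(repeats)]
    gain = (statistics.median(before) - statistics.median(after)) / statistics.median(before)
    # Require the runtime distributions to be separated, not just the medians.
    return gain >= min_gain and max(after) < min(before)

# Example usage with two stand-in workloads timed in place of real unit tests.
print(is_stable_improvement(lambda: sorted(range(50_000))[0], lambda: min(range(50_000))))
```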
If you find our work useful, please consider citing our paper:
```bibtex
@article{he2025sweperf,
  title={SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?},
  author={He, Xinyi and Liu, Qian and Du, Mingzhe and Yan, Lin and Fan, Zhijie and Huang, Yiming and Yuan, Zejian and Ma, Zejun},
  journal={arXiv preprint arXiv:2507.12415},
  year={2025}
}
```
Please reach out to qian.liu@tiktok.com for questions or feedback on SWE-Perf. We welcome collaborations and suggestions for improving the benchmark. This work was conducted during Xinyi's and Yiming's internships at TikTok.