CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Authors: Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig.


[Figure: CRUST-bench workflow]

Overview

C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust as well as test cases that can be used to validate the correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The Rust interfaces provide explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that generating safe, idiomatic Rust remains a challenging problem for a range of methods and techniques. We also provide insights into the errors LLMs usually make when transpiling code from C to safe Rust. The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting. Progress on CRUST-Bench would lead to transpilation systems that can reason about complex scenarios and help migrate legacy codebases from C into languages like Rust that ensure memory safety.

Paper

Our paper "CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation" is available at: https://arxiv.org/abs/2504.15254

Dataset Description

Data Collection

We collected data from 100 GitHub repositories spanning various domains, including:

  • System utilities
  • Algorithms
  • Programming Language Infrastructure
  • Networking
  • Cryptography and Security
  • Data structures
  • and more

Data Format

The dataset consists of two folders:

  1. CBench
  2. RBench

CBench/
├── Project_1/
│   ├── file1.c
│   ├── file2.c
│   └── ...
├── Project_2/
│   ├── file1.c
│   ├── file2.c
│   └── ...
├── Project_3/
│   ├── file1.c
│   ├── file2.c
│   └── ...
└── ...

RBench/
├── Project_1/
│   ├── interfaces/
│   │   ├── file1.rs
│   │   ├── file2.rs
│   │   └── ...
│   └── bin/
│       ├── test1.rs
│       ├── test2.rs
│       └── ...
├── Project_2/
│   ├── interfaces/
│   │   ├── file1.rs
│   │   ├── file2.rs
│   │   └── ...
│   └── bin/
│       ├── test1.rs
│       ├── test2.rs
│       └── ...
└── ...
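
For concreteness, here is a sketch of what an interface file and its accompanying test driver look like. The project, type, and function names below are hypothetical (not taken from the dataset), and the two pieces are shown together even though a real benchmark keeps them in separate files under interfaces/ and bin/:

// interfaces/queue.rs (hypothetical): the safe Rust API that the
// transpiled code must implement. Bodies are placeholders that the
// transpiler is expected to fill in.
pub struct Queue {
    items: Vec<i32>,
}

impl Queue {
    pub fn new() -> Queue {
        unimplemented!()
    }

    pub fn enqueue(&mut self, value: i32) {
        unimplemented!()
    }

    pub fn dequeue(&mut self) -> Option<i32> {
        unimplemented!()
    }
}

// bin/test1.rs (hypothetical): a test driver that exercises the interface
// and checks functional correctness of the transpiled implementation.
fn main() {
    let mut q = Queue::new();
    q.enqueue(1);
    q.enqueue(2);
    assert_eq!(q.dequeue(), Some(1));
    assert_eq!(q.dequeue(), Some(2));
    assert_eq!(q.dequeue(), None);
}

A transpilation is considered successful only if the generated Rust conforms to the provided interfaces and all of the project's bin tests pass.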


Usage

Requirements

Install the Python dependencies required to use the dataset:

pip install -r requirements.txt

Environment Setup

Rust is required to build the projects. You can install Rust using rustup (https://rustup.rs).

Loading the Dataset

The dataset is provided in the datasets folder as a zip file. Extracting it yields two folders:

  1. CBench: the projects scraped from GitHub.
  2. RBench: the manually annotated interfaces and corresponding tests.

NOTE: Make sure that both the CBench and RBench folders are placed in the datasets/ directory.

To perform a sanity type check, we provide the src/check_benchmarks/check_build.py script, which produces a compilable version of each Rust project with unimplemented!() function bodies so that it can be type checked. To run the script, you need to have Rust installed on your system. You can run it as follows:

python src/check_benchmarks/check_build.py
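
The scaffold simply keeps each interface signature and substitutes a placeholder body, so cargo check can type check the project without any real implementation. A minimal illustrative stub (hypothetical function, not the script's literal output) looks like this:

// Hypothetical scaffolded stub: the signature is preserved while the body
// is replaced with unimplemented!(), which is enough for type checking.
pub fn crc32(data: &[u8]) -> u32 {
    unimplemented!()
}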

The scripts in src/dataset_stats aid in plotting metrics over the C repositories and the annotated Rust interfaces and tests.

To get the statistics of the dataset, you can run the following command:

python src/dataset_stats/get_c_stats.py datasets/CBench
python src/dataset_stats/get_interfaces_stats.py datasets/RBench

Recreating experiments from CRUST-bench

Please set the relevant OpenAI, Anthropic, and Google Cloud API keys using the following environment variables.

export OPENAI_API_KEY=<OpenAI_API_KEY>
export ANTHROPIC_API_KEY=<ANTHROPIC_API_KEY>
export GOOGLE_CLOUD_PROJECT=<GOOGLE_CLOUD_PROJECT>
export GOOGLE_CLOUD_REGION=<GOOGLE_CLOUD_REGION>

We provide bash scripts in the scripts folder that make it easy to test your models with our pipeline.

The entry point to our code is src/run.py, which takes the following parameters:

Argument              Type  Default   Description
--benchmark_dir       str             Path to the C project (CBench) directory.
--rust_dir            str   None      Path to the Rust project (RBench) directory.
--output_dir          str             Path to the output directory.
--prompt              str             Prompt to use for the model during transpilation.
--mode                str   "normal"  Transpilation mode. Options: normal, multi_gen.
--endpoint            str             Endpoint to use for the model. See endpoints/call_endpoint.py for details.
--prompt_format       str             Format of the prompt. Options: markdown, bullet_point.
--prompt_strategy     str   "all"     Strategy for composing the prompt. Options: all (all files are appended).
--repairer_prompt     str             Prompt used for the repairer model.
--repairer_format     str             Format of the repairer prompt. Options: markdown, bullet_point.
--repairer_strategy   str             Strategy for composing the repairer prompt. Options: all (all files are appended).
--iterations          str             Number of iterations to run the repairer.
--include_headers     bool  True      Whether to include header files in the prompt.
--single_benchmark    str   None      Run a single benchmark only (provide its name).
--config              str   None      Path to the endpoint configuration file.
--n                   int   1         Number of generations to request from the model during transpilation.

For instance, to run with OpenAI's o1 model, you can run the following command:

python run.py \
    --benchmark_dir "../datasets/CBench" \
    --output_dir "outputs/o1" \
    --prompt ./prompts/transpilation_prompts/bullet_point/bullet_point_interface.prompt \
    --prompt_format bullet_point_with_system_instructions \
    --prompt_strategy all \
    --repairer_prompt ./prompts/repair_prompts/bullet_point/bullet_point.prompt \
    --repairer_format bullet_point_with_system_instructions \
    --repairer_strategy all \
    --iterations 3 \
    --config ./endpoints/configs/o1.json \
    --endpoint "o1" \
    --rust_dir "../datasets/RBench"

For additional example invocations, please refer to the scripts folder.

In order to perform the test-based repair:

First, run the default transpile-then-repair pipeline using the command above to obtain the initial Rust projects. Its output is stored in the outputs/o1 directory.

Next, run the test-based repair with the following command:

python repair_tests.py \
    --input_path outputs/o1 \
    --output_path output/o1_test_guided_repair \
    --endpoint o1 \
    --iterations 3

This will select the projects that compile but fail the tests and run the repairer on them. The final repaired code is saved in the output/o1_test_guided_repair directory.

Adding your own models

To add your own model, you must:

  1. Define the model in the endpoints folder. Ensure that your model implements a get_results function that takes a message and returns the model's response as JSON containing a response key.
  2. Define the model configuration in the configs folder. The configuration file should contain the model name, the endpoint to call, and any other relevant parameters.
  3. Update the call_endpoint.py file to include the new model in the endpoints dictionary, which maps the model name to the corresponding endpoint function.

Citation

If you use this dataset in your research, please cite our paper:

@misc{
  khatry2025crustbenchcomprehensivebenchmarkctosaferust,
  title={CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation}, 
  author={Anirudh Khatry and Robert Zhang and Jia Pan and Ziteng Wang and Qiaochu Chen and Greg Durrett and Isil Dillig},
  year={2025},
  eprint={2504.15254},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2504.15254}, 
}

License

This dataset is released under the GNU GPL license. See LICENSE for details.

Contact

For questions, issues, or further information, please contact:

Acknowledgments

This research was conducted within a group supported by the National Science Foundation under awards CCF-1762299, CCF-1918889, CNS-1908304, CCF-1901376, CNS-2120696, CCF-2210831, and CCF-2319471, and by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112590133. We also thank the All Hands AI team for a discussion on the OpenHands CodeAct agentic framework applied to the C-to-Rust transpilation task.
