This repository contains the code for the paper *Improving In-Context Learning with Reasoning Distillation*. In this work, we improve the inductive reasoning ability of language models via teacher-to-student reasoning distillation. The proposed method, ReDis, achieves better accuracy and a better cost-to-performance ratio than the teacher model, GPT-4o.
Model checkpoints are now available on Hugging Face 🤗
All training datasets are now available on Hugging Face 🤗
Clone the repository:

```sh
git clone https://github.com/NafisSadeq/reasoning-distillation.git
cd reasoning-distillation/
```
Create a conda environment and install the dependencies from the provided requirements.txt:

```sh
conda create -n redis python=3.9
conda activate redis
pip install -r requirements.txt
```
You can directly use the ReDis adapters from Hugging Face to perform inference:
```sh
python proposed.py --task acre --llm_name meta-llama/Meta-Llama-3-8B-Instruct --adapter_path nsadeq/ReDis-Llama --hypo_size 10 --rg_temp 0.9 --rf_temp 0.7
python proposed.py --task acre --llm_name Qwen/Qwen2.5-7B-Instruct --adapter_path nsadeq/ReDis-Qwen --hypo_size 10 --rg_temp 0.9 --rf_temp 0.7
python proposed.py --task acre --llm_name mistralai/Mistral-7B-Instruct-v0.3 --adapter_path nsadeq/ReDis-Mistral --hypo_size 10 --rg_temp 0.9 --rf_temp 0.7
```
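Under the hood, `--adapter_path` points to a LoRA adapter that is attached to the base model. For illustration, here is a minimal sketch of the equivalent loading step, assuming the checkpoints are standard PEFT LoRA adapters (proposed.py already handles this for you):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the base model, then attach the ReDis LoRA adapter from the Hub.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, "nsadeq/ReDis-Llama")
tokenizer = AutoTokenizer.from_pretrained(base_id)
```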
Perform data augmentation using a teacher model such as GPT-4o:
```sh
python generate_data_ir.py --llm_name gpt-4o --hypo_size 50 --task list_func
python generate_data_ir.py --llm_name gpt-4o --hypo_size 50 --task 1d_arc
python generate_data_ir.py --llm_name gpt-4o --hypo_size 50 --task acre
python generate_data_ir.py --llm_name gpt-4o --hypo_size 50 --task scan
```
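Note that these commands query GPT-4o through the OpenAI API. Assuming generate_data_ir.py uses the standard openai client, your API key should be available in the environment (check the script for the exact configuration it expects):

```sh
export OPENAI_API_KEY=<your_key>
```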
Perform data filtering and merge the task-specific data segments:

```sh
python construct_pair.py
```
This step generates three datasets: `generate_rule_sft.json`, `apply_rule_sft.json`, and `generate_rule_dpo.json`. We already provide all three within the `./data/merged` folder.
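As a quick sanity check on the generated files, you can count the examples in each dataset. A minimal sketch, assuming each file is a JSON array of examples:

```python
import json

# Print the number of examples in each merged dataset.
for name in ("generate_rule_sft.json", "apply_rule_sft.json", "generate_rule_dpo.json"):
    with open(f"./data/merged/{name}") as f:
        print(name, len(json.load(f)), "examples")
```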
We use LLaMA-Factory for model training. You can follow the steps below:
- Install LLaMA-Factory following the instructions at https://github.com/hiyouga/LLaMA-Factory.
- Copy the three dataset files into LLaMA-Factory's dataset folder.
- Register the datasets in LLaMA-Factory's dataset_info.json as required, following its documentation on dataset preparation.
- Create model training configuration files within the training config directory. Example configuration files are provided in the config folder of this repository; see the sketch after this list.
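For reference, here is a minimal sketch of what an SFT LoRA configuration might look like. The keys follow LLaMA-Factory's example configs, but the dataset names, output path, and hyperparameters below are placeholders, and exact keys may vary across LLaMA-Factory versions; treat the files in this repository's config folder as the authoritative reference.

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset (names must match your dataset_info.json entries)
dataset: generate_rule_sft,apply_rule_sft
template: llama3
cutoff_len: 2048

### output
output_dir: saves/redis-llama/lora/sft

### train (placeholder hyperparameters)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```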
Then you can perform supervised fine-tuning as follows:

```sh
llamafactory-cli train examples/train_lora/redis-llama-sft.yaml
```
After that, you can perform preference alignment as follows:

```sh
llamafactory-cli train examples/train_lora/redis-llama-orpo.yaml
```
For inference, load the base models with the corresponding LoRA adapters created during model tuning:

```sh
python proposed.py --task list_func --llm_name meta-llama/Meta-Llama-3-8B-Instruct --adapter_path <adapter_path> --hypo_size 10 --rg_temp 0.9 --rf_temp 0.7
```
We also provide code for the corresponding baselines: direct few-shot prompting and hypothesis search. For direct few-shot prompting, run:

```sh
python baseline_io.py --task list_func --llm_name meta-llama/Meta-Llama-3-8B-Instruct
```
For hypothesis search with the base LLaMA model, run:

```sh
python baseline_ir.py --task list_func --llm_name meta-llama/Meta-Llama-3-8B-Instruct --hypo_size 10
```
If you use the code, datasets, or model checkpoints, please cite the following work:
```bibtex
@misc{sadeq2025improvingincontextlearningreasoning,
      title={Improving In-Context Learning with Reasoning Distillation},
      author={Nafis Sadeq and Xin Xu and Zhouhang Xie and Julian McAuley and Byungkyu Kang and Prarit Lamba and Xiang Gao},
      year={2025},
      eprint={2504.10647},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.10647},
}
```