smart-lty/LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

This is the official implementation of the paper "LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation".

Getting Started

Run the following commands to set up the environment.

git clone https://github.com/smart-lty/LogitSpec.git
conda create -n logitspec python=3.9
conda activate logitspec
cd LogitSpec
pip install -r requirements.txt

Model Weights

LogitSpec is a retrieval-based speculative decoding method, so it does not need an additional draft model. Currently, our code only supports the Llama 2 model family, including Llama 2, Vicuna, CodeLlama and so on. (All of these model weights are available on Hugging Face.)
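For readers new to the idea, the following is a minimal sketch of generic retrieval-based drafting, not the LogitSpec implementation. It assumes draft tokens are retrieved by matching the most recent n-gram against earlier context and that the target model verifies them greedily. The names `retrieve_draft` and `accept_prefix` are hypothetical, and the sketch does not cover the next next token speculation described in the paper.

```python
from typing import List, Sequence


def retrieve_draft(context: Sequence[int], ngram: int = 2, max_draft: int = 4) -> List[int]:
    """Propose draft tokens by finding the last `ngram` tokens earlier in `context`.

    Hypothetical helper: returns up to `max_draft` tokens that followed the most
    recent earlier occurrence of the trailing n-gram, or [] if there is none.
    """
    if len(context) <= ngram:
        return []
    key = tuple(context[-ngram:])
    # Scan backwards so the most recent earlier match wins.
    for start in range(len(context) - ngram - 1, -1, -1):
        if tuple(context[start:start + ngram]) == key:
            return list(context[start + ngram:start + ngram + max_draft])
    return []


def accept_prefix(draft: Sequence[int], target_tokens: Sequence[int]) -> List[int]:
    """Keep the longest prefix of `draft` that agrees with the target model's tokens.

    `target_tokens` stands in for the greedy tokens the target model would
    produce at each drafted position (an assumption of this sketch).
    """
    accepted = []
    for d, t in zip(draft, target_tokens):
        if d != t:
            break
        accepted.append(d)
    return accepted


if __name__ == "__main__":
    context = [5, 6, 7, 8, 9, 5, 6]
    draft = retrieve_draft(context)            # tokens that followed the earlier (5, 6)
    print(draft)
    print(accept_prefix(draft, [7, 8, 1, 2]))  # only the agreeing prefix is kept
```

Because the draft comes from a lookup over tokens already in the context, no separate draft model has to be loaded or run; the target model only verifies the proposed tokens.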

Reproduction

To reproduce the results reported in our paper, run:

sh eval.sh

Acknowledgements

Our code is built on the official Spec-Bench repo. Thanks for their excellent codebase!

