DREAM Logo

DREAM

Setup & Installation

cd DREAM
pip install -r requirements.txt

# Download the demo weights based on llava-v1.6-vicuna-7b
git clone https://huggingface.co/Alexhu1999/DREAM-llava-v1.6-vicuna-7b

Inference

The inference code we provide automatically allocates model weights across multiple GPUs, allowing you to run models whose memory footprint exceeds that of a single GPU.
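The idea behind this automatic allocation can be illustrated with a simple greedy placement: walk the layers in order and assign each one to the first GPU with enough free memory. This is a hypothetical sketch of the concept, not DREAM's actual implementation (the function name and memory model are assumptions).

```python
def allocate_layers(layer_sizes, gpu_capacities):
    """Greedily assign each layer (in order) to the first GPU with room left.

    layer_sizes and gpu_capacities are in the same unit (e.g. GB).
    Illustrative only; real allocators also account for activations and KV cache.
    """
    placement = {}          # layer index -> GPU index
    free = list(gpu_capacities)
    gpu = 0
    for i, size in enumerate(layer_sizes):
        # Advance to the next GPU once the current one cannot hold this layer.
        while gpu < len(free) and free[gpu] < size:
            gpu += 1
        if gpu == len(free):
            raise MemoryError("model does not fit on the available GPUs")
        free[gpu] -= size
        placement[i] = gpu
    return placement

# Example: four 3 GB layers across two 8 GB GPUs.
print(allocate_layers([3, 3, 3, 3], [8, 8]))  # → {0: 0, 1: 0, 2: 1, 3: 1}
```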

With UI

We provide a suggested web interface, which you can launch with the following command. Once the model has fully loaded, a URL is printed in the terminal; open it in your browser to access the interface.

python -m dream.application.webui --ea-model-path /home/apc/models/DREAM-Vicuna-7B-v1.3 --base-model-path /home/apc/models/vicuna-7b-v1.3 --model-type vicuna --total-token 8

total-token is the number of draft tokens. For smaller models and more powerful GPUs, this value can be set higher; tuning it to your specific device and model yields better results. If set to -1, DREAM configures this parameter automatically.
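The "-1 means auto" behavior could look something like the sketch below, where the budget scales with the GPU-memory-to-model-size ratio. The heuristic, thresholds, and function name here are all hypothetical; DREAM's real auto-configuration rule may differ.

```python
def resolve_total_token(total_token, gpu_memory_gb, model_size_b):
    """Pick a draft-token budget; -1 triggers a simple heuristic.

    Hypothetical rule of thumb: smaller models and larger GPUs afford a
    bigger draft budget. Explicit values are passed through unchanged.
    """
    if total_token != -1:
        return total_token
    headroom = gpu_memory_gb / model_size_b  # rough memory-to-model ratio
    if headroom >= 6:
        return 64
    if headroom >= 3:
        return 32
    return 16

print(resolve_total_token(-1, 80, 7))  # 80 GB GPU, 7B model → 64
print(resolve_total_token(8, 24, 7))   # explicit value is kept → 8
```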

Train

Generate Train Data

You can run the following command to generate the training data.

python -m dream.ge_data.allocation --outdir [path of data]
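A data-generation job like this typically shards its sample range across workers (e.g. one per GPU). The helper below is a hypothetical illustration of that splitting, not code from `dream.ge_data.allocation`.

```python
def shard_range(start, end, num_workers):
    """Split [start, end) into near-equal contiguous shards, one per worker.

    Earlier workers absorb the remainder, so shard sizes differ by at most 1.
    """
    total = end - start
    base, extra = divmod(total, num_workers)
    shards, cursor = [], start
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        shards.append((cursor, cursor + size))
        cursor += size
    return shards

# Example: 8000 samples split across 3 workers.
print(shard_range(0, 8000, 3))  # → [(0, 2667), (2667, 5334), (5334, 8000)]
```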

Train the Auto-regression Head

cd dream/model
deepspeed main_deepspeed.py --deepspeed_config /home/apc/DREAM/dream/train/ds_config.json --tmpdir /home/apc/Bingle/data/llava_vicuna_mmt_0/12_data/sharegpt_0_7999_mufp16 --cpdir /home/apc/DREAM/dream/train/vicuna-7b-ckpt --configpath /home/apc/DREAM/dream/train/vicuna_7B_config.json
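For reference, a DeepSpeed config passed via --deepspeed_config generally looks like the fragment below. These values are illustrative placeholders only; consult the ds_config.json shipped in dream/train for the actual training settings.

```json
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": 3e-5 }
  }
}
```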

Evaluation

You can test the speed of DREAM on MT-bench using the following command.

python -m dream.evaluation.eval_llava \
    --ea-model-path [path of DREAM weight] \
    --base-model-path [path of the original model]

The above command generates a .jsonl file that records the generation results and wall time.
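Once you have the .jsonl output, you can aggregate the timing fields to compare runs. A minimal sketch, assuming each record carries a `wall_time` field (the field name is an assumption; check the actual output):

```python
import json
from statistics import mean

def summarize(jsonl_lines, key="wall_time"):
    """Average a timing field across .jsonl records."""
    records = [json.loads(line) for line in jsonl_lines]
    return mean(r[key] for r in records)

# Toy records standing in for two runs' output files.
baseline = ['{"wall_time": 4.0}', '{"wall_time": 6.0}']
dream    = ['{"wall_time": 2.0}', '{"wall_time": 3.0}']
speedup = summarize(baseline) / summarize(dream)
print(f"speedup: {speedup:.2f}x")  # → speedup: 2.00x
```

In practice you would read the lines with `open(path)` for each of the two result files and divide the baseline's mean wall time by DREAM's.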

Acknowledgements

This project has been influenced by many excellent projects in the LLM community, such as Medusa, EAGLE, and FastChat. We are releasing the LLaVA version first; other versions will be merged in soon.

📄 Citation

If you find our work useful, please consider citing:

@misc{hu2025dreamdraftingrefinedtarget,
  title={DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding}, 
  author={Yunhai Hu and Tianhua Xia and Zining Liu and Rahul Raman and Xingyu Liu and Bo Bao and Eric Sather and Vithursan Thangarasa and Sai Qian Zhang},
  year={2025},
  eprint={2505.19201},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19201}, 
}
