By Gabe Guo and Stefano Ermon.
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in parallel, the less their predicted distributions adhere to the originally learned data distribution, as they rely on a conditional independence assumption that only works with infinitesimally small timesteps. We find that a different class of models, Any-Subset Autoregressive Models (AS-ARMs), holds the solution. As implied by the name, AS-ARMs can generate tokens in any order, and in parallel. Moreover, AS-ARMs support parallelized joint probability density estimation, allowing them to correct their own parallel-generated token distributions, via our Any-Subset Speculative Decoding (ASSD) algorithm. ASSD provably enables generation of tokens from the correct joint distribution, with the number of neural network calls upper bounded by the number of tokens predicted. We empirically verify that ASSD speeds up language generation, without sacrificing quality. Furthermore, we provide a mathematically justified scheme for training AS-ARMs for generation, and show that AS-ARMs achieve state-of-the-art performance among sub-200M parameter models on infilling benchmark tasks, and nearly match the performance of models 50X larger on code generation. Our theoretical and empirical results indicate that the once-forgotten AS-ARMs are a promising direction of language modeling.
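For intuition, here is a minimal sketch of the generic speculative-decoding accept/reject rule that ASSD builds on. It is not the exact ASSD algorithm from the paper (in particular, the residual resampling step after a rejection is omitted), and all function and variable names are illustrative.

```python
# Minimal sketch of the standard speculative-decoding accept/reject rule.
# draft_tokens[i] was proposed with probability q_probs[i] under the
# (conditionally independent) parallel draft; p_probs[i] is its probability
# under the correct joint factorization. Accepted tokens follow the target
# distribution; the first rejected position would be resampled from the
# residual distribution max(0, p - q), omitted here for brevity.
import random

def accept_drafted_tokens(draft_tokens, q_probs, p_probs, rng=random.Random(0)):
    accepted = []
    for tok, q, p in zip(draft_tokens, q_probs, p_probs):
        # Accept with probability min(1, p/q); stop at the first rejection.
        if rng.random() < min(1.0, p / max(q, 1e-12)):
            accepted.append(tok)
        else:
            break
    return accepted

print(accept_drafted_tokens(["the", "cat", "sat"], [0.5, 0.4, 0.2], [0.6, 0.1, 0.2]))
```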
Python 3.10.16
pip install -r requirements.txt
You also need to install the evaluation suite from https://github.com/openai/human-eval-infilling (used for the code infilling metrics).
You need at least one GPU with 16GB, such as an A4000.
The commands also generally work with sbatch if you're using Slurm.
You'll need to change the directories, batch sizes, etc., to match your setup.
An A4000 (16GB) should support a batch size of 6, an A5000 (24GB) a batch size of 8, and an A6000 (48GB) a batch size of 16. Gradient accumulation and distributed training are your friends.
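For example, if your GPU can't fit the suggested batch size, a generic PyTorch gradient-accumulation loop looks roughly like this (illustrative toy model and data, not the repo's training script, which may expose accumulation through its own config):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data so the loop runs end-to-end; swap in the real model,
# dataloader, and loss from the training script.
model = nn.Linear(16, 2)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
                    batch_size=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = 4 * accum_steps = 16
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average out
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```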
In _run_training.sh, set the dataset to "bigcode/starcoderdata" (for code) or "openwebtext" (for natural language).
bash _run_training.sh
Training takes a few days, even with a few A100s. We're nice, so you can freely download our pretrained models here: https://huggingface.co/therealgabeguo/ASARM. Our code also loads these models from Hugging Face automatically by default.
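If you want to pull the checkpoints down manually instead, a minimal sketch with the `huggingface_hub` library is below; the repo's scripts handle this automatically, so this is only for offline or manual use.

```python
from huggingface_hub import snapshot_download

# Download every file in the model repo to the local Hugging Face cache
# and return the local path.
local_dir = snapshot_download(repo_id="therealgabeguo/ASARM")
print("Checkpoints downloaded to:", local_dir)
```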
This will take samples from WikiText, blank out 95% of the tokens in each one, and complete them. It compares Any-Subset Speculative Decoding (ASSD) to the sequential sampling algorithm. Given a sufficiently large sample size, the distributions (as measured by perplexity and entropy) should be the same, while ASSD should provide roughly a 10% decrease in the number of function evaluations (NFEs) and wall-clock time. All text outputs and metrics are saved.
bash _run_speculative_decoding.sh
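For intuition, "blanking out 95%" of a sample amounts to something like the sketch below. The real script handles tokenization and the actual mask token itself, so this is purely illustrative.

```python
import random

def blank_out(tokens, mask_token="[MASK]", frac=0.95, seed=0):
    """Replace a random `frac` of the positions with a mask token (illustrative)."""
    rng = random.Random(seed)
    n_mask = int(len(tokens) * frac)
    masked = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]

print(blank_out("the quick brown fox jumps over the lazy dog".split()))
```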
For the human language infills, first download ROCStories and put it in the eval_datasets folder. Then run:
bash _run_nlp_infill_eval.sh
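As a quick sanity check that the dataset landed in the right place, something like the following works. The filename is an assumption based on the public ROCStories CSV release, so adjust it to whatever you downloaded.

```python
import os
import pandas as pd

# The filename below is an assumption; adjust it to match the file you downloaded.
csv_path = os.path.join("eval_datasets", "ROCStories_winter2017.csv")
df = pd.read_csv(csv_path)
print(df.shape)
print(df.head())
```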
For the code infills, run:
bash _run_code_infill_eval.sh
To compute metrics on the code infills, you will need to use the evaluation suite at https://github.com/openai/human-eval-infilling
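Before handing the outputs to that evaluator, you can sanity-check the generated samples file with something like this. The filename and field layout are assumptions about this repo's output format; adjust the path to whatever _run_code_infill_eval.sh actually writes.

```python
import json

# Assumed output filename; point this at the file the eval script produced.
with open("samples.jsonl") as f:
    samples = [json.loads(line) for line in f]

print(f"{len(samples)} generated samples")
print(json.dumps(samples[0], indent=2))
```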
If you find the code and/or ideas in this repository helpful, please cite:
@article{guo2025_asarm,
  title={Reviving Any-Subset Autoregressive Models via Principled Parallel Sampling and Speculative Decoding},
  author={Guo, Gabe and Ermon, Stefano},
  journal={arXiv preprint},
  year={2025}
}