Speculative Ensemble

Speculative Ensemble is a novel framework that accelerates the ensemble of multiple LLMs without sacrificing performance. It achieves a performance boost of 1.11x to 2.23x over standard ensemble methods in two- or three-model configurations.

Setup

# Create and activate the environment
conda create -n specens python=3.11 -y
conda activate specens

# Install vllm
cd vllm
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://files.pythonhosted.org/packages/c8/f4/e108a902ccad131d8978a9376343a6e95d78d0e12f152a796794647073ec/vllm-0.6.5-cp38-abi3-manylinux1_x86_64.whl
pip install --editable .

# Install the remaining dependencies
cd ..
pip install -r requirements.txt

How to Run

Create a .env file to specify the root path of your models.
```
MODEL_PATH=xxx
```

Run an example:

CUDA_VISIBLE_DEVICES=0 python ./main_dataset.py \
    dataset.name=humaneval \
    dataset.size=tiny \
    method=sd \
    method.model="llama-2-7b" \
    method.draft_model="llama-2-7b-68m" \
    method.gamma=5 \
    method.generate.temperature=0 \

Code Reading Guides

Chef is the internal name for the speculative ensemble implementation. The code is located in vllm/vllm/chef, while the baseline ensemble implementation can be found at vllm/vllm/ensemble_decode.

We have implemented multiple model inference methods. The configuration files are located in configs/method, and the desired method can be specified via method={method_name} (see Step 2 of How to run). Annotations are as follows:

Method	Description	Args Note
`large_model`	Inference using a single model in an autoregressive manner	-
`cd`	Contrastive decoding with two models	Requires (`method.amateur_model`) and (`method.alpha`)
`we`	Weighted ensemble with two models	Requires (`method.extra_model`) and (`method.lambda`)
`*_sd`	Accelerates the ensemble directly using speculative decoding	Inherits from specific ensemble methods, with `method.gamma` as an additional hyperparameter
`*_chef`	Accelerates the ensemble using speculative ensemble	Inherits from specific ensemble methods. For ensembles with more than two models, `method.gamma` should be a list of integers

Customization

Customizing an Ensemble Method

Create a YAML file, such as configs/method/your_ens_method.yaml, referencing configs/method/we.yaml.
Customize the method.extra_model parameter (as a string or a list of strings) and any additional parameters if needed.
Modify the llm.ensemble_fn and llm.ensemble_target to define the ensemble function.
Finally, use method=your_ens_method to run your custom ensemble method.

Customizing a Dataset

Create a directory src/mydatasets/your_dataset/ and a file src/mydatasets/your_dataset/mydataset.py that follows the template in src/mydatasets/template.py.
Use dataset=your_dataset to run your custom dataset.

Citation

@article{fu2025speculative,
  title={Speculative Ensemble: Fast Large Language Model Ensemble via Speculation},
  author={Fu, Jiale and Jiang, Yuchu and Chen, Junkai and Fan, Jiaming and Geng, Xin and Yang, Xu},
  journal={arXiv preprint arXiv:2502.01662},
  year={2025}
}

Acknowledgements

We built our implementation upon the VLLM project and would like to thank the authors for their outstanding contributions.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
configs		configs
src		src
vllm		vllm
.gitignore		.gitignore
main_case.py		main_case.py
main_dataset.py		main_dataset.py
pipeline.py		pipeline.py
readme.md		readme.md
requirements.txt		requirements.txt
run_case.sh		run_case.sh
run_dataset.sh		run_dataset.sh
run_pipeline.sh		run_pipeline.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speculative Ensemble

Setup

How to Run

Code Reading Guides

Customization

Customizing an Ensemble Method

Customizing a Dataset

Citation

Acknowledgements

About

Releases

Packages

Languages

Kamichanw/Speculative-Ensemble

Folders and files

Latest commit

History

Repository files navigation

Speculative Ensemble

Setup

How to Run

Code Reading Guides

Customization

Customizing an Ensemble Method

Customizing a Dataset

Citation

Acknowledgements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages