---
license: cc-by-nc-4.0
---
# ArcticSpeculator
Build the fastest open-source vLLM-based speculative decoding system for your own model using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
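
As background, the draft-then-verify loop at the core of speculative decoding can be sketched in a few lines. This is a minimal illustration of generic greedy speculative decoding, not ArcticSpeculator's implementation. `draft` and `target` are hypothetical toy callables that return the greedy next token for a prefix, and `speculative_step` is an illustrative name.

```python
from typing import Callable, List

# Toy stand-in for a model: given a token prefix, return the greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Run one greedy draft-then-verify step and return the tokens accepted in it."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real system scores all
    #    positions in one batched forward pass; here it is called per position for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k draft tokens matched, so the target's next prediction is a free bonus token.
    accepted.append(target(ctx))
    return accepted
```

Every accepted token equals what the target model would have picked greedily, so the output matches plain greedy decoding; the speedup comes from accepting several draft tokens per target pass.
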
We compare the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama-3.1-70B-Instruct on 8xH100 GPUs:

| Method | ShareGPT (tokens/s) | HumanEval (tokens/s) |
|--------------------------------------|----------------|--------------|
| vLLM V1 Baseline | 84.1 | 84.1 |
| vLLM V1 Eagle | 102.2 | 112.0 |
| vLLM V1 Eagle3 | 77.7 | 85.3 |
| vLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
| ArcticSpeculator | **172.4** | **203.7** |

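
To read the benchmark as relative speedups, divide each throughput by the baseline on the same dataset. A minimal sketch; the numbers are copied from the table above, and `THROUGHPUT` and `speedup` are illustrative names.

```python
# Throughput in tokens/s for Llama-3.1-70B-Instruct on 8xH100, copied from the table above.
THROUGHPUT = {
    "vLLM V1 Baseline": {"ShareGPT": 84.1, "HumanEval": 84.1},
    "vLLM V1 Eagle": {"ShareGPT": 102.2, "HumanEval": 112.0},
    "vLLM V1 Eagle3": {"ShareGPT": 77.7, "HumanEval": 85.3},
    "vLLM V0 MLP-Speculator (IBM)": {"ShareGPT": 77.9, "HumanEval": 66.7},
    "ArcticSpeculator": {"ShareGPT": 172.4, "HumanEval": 203.7},
}

BASELINE = "vLLM V1 Baseline"


def speedup(method: str, dataset: str, baseline: str = BASELINE) -> float:
    """Throughput of `method` divided by the baseline's throughput on the same dataset."""
    return THROUGHPUT[method][dataset] / THROUGHPUT[baseline][dataset]


if __name__ == "__main__":
    # Print one line per method with its speedup on each dataset.
    for method, scores in THROUGHPUT.items():
        cells = ", ".join(f"{ds} {speedup(method, ds):.2f}x" for ds in scores)
        print(f"{method}: {cells}")
```

On these numbers, ArcticSpeculator delivers about 2.05x the baseline throughput on ShareGPT and about 2.42x on HumanEval.
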
For more details about ArcticSpeculator and how to use it:
* ❄️ [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)]()
* 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)

We also release the following ArcticSpeculator checkpoints, trained with [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) for use with [ArcticInference](https://github.com/snowflakedb/ArcticInference):

| Base model | ArcticSpeculator checkpoint |
|---- | ---- |
| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct) |
| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct) |
| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | [Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct) |
| [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct) |
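
When scripting deployments, the base-model-to-speculator pairing in this table can be kept as a simple lookup. A minimal sketch: `SPECULATORS`, `speculator_for`, and `hub_url` are hypothetical helpers, not part of ArcticTraining or ArcticInference. See the getting started guide for how a checkpoint is actually served.

```python
# Released checkpoints from the table above: base model repo ID -> speculator repo ID.
SPECULATORS = {
    "meta-llama/Llama-3.1-70B-Instruct": "Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct",
    "meta-llama/Llama-3.3-70B-Instruct": "Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct",
    "Qwen/Qwen2.5-32B-Instruct": "Snowflake/Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct",
    "meta-llama/Llama-3.1-8B-Instruct": "Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct",
}


def speculator_for(base_model: str) -> str:
    """Return the Hugging Face repo ID of the speculator released for `base_model`."""
    try:
        return SPECULATORS[base_model]
    except KeyError:
        # Only the four base models listed above have released checkpoints.
        raise ValueError(f"no released ArcticSpeculator checkpoint for {base_model!r}") from None


def hub_url(repo_id: str) -> str:
    """Build the Hugging Face model page URL for a repo ID."""
    return f"https://huggingface.co/{repo_id}"
```

For example, `hub_url(speculator_for("Qwen/Qwen2.5-32B-Instruct"))` yields the Qwen speculator's model page linked in the table.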