---
license: cc-by-nc-4.0
---

# ArcticSpeculator

Build the fastest open-source vLLM-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
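For context, speculative decoding speeds up generation by letting a small draft model propose several tokens that the large target model then verifies in a single pass, keeping the longest agreeing prefix. The toy sketch below illustrates that accept/reject loop only; the `draft_next` and `target_next` stand-ins are hypothetical and this is not the ArcticSpeculator implementation.

```python
# Toy sketch of the draft-then-verify loop behind speculative decoding
# (illustrative only). Tokens are small ints; the "models" are stand-in
# functions mapping a context to next tokens.

def draft_next(context, k):
    """Hypothetical cheap draft model: propose k candidate tokens."""
    return [(context[-1] + i + 1) % 100 for i in range(k)]

def target_next(context):
    """Hypothetical expensive target model: the token it would emit."""
    return (context[-1] + 1) % 100

def speculative_step(context, k=4):
    """Propose k draft tokens, accept the longest prefix the target agrees
    with; the first mismatch is replaced by the target's own token."""
    drafts = draft_next(context, k)
    accepted = []
    for tok in drafts:
        expected = target_next(context + accepted)
        if tok == expected:
            accepted.append(tok)       # draft agreed with target: keep it
        else:
            accepted.append(expected)  # mismatch: take target token, stop
            break
    else:
        # All k drafts accepted; the verification pass yields one bonus token.
        accepted.append(target_next(context + accepted))
    return accepted

print(speculative_step([0], k=4))  # → [1, 2, 3, 4, 5]
```

Because the toy draft model here always agrees with the target, each step emits k + 1 tokens for one target-model pass; real speedups depend on how often the draft's proposals are accepted.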

The table below compares the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama-3.1-70B-Instruct on 8×H100:

| Method                        | ShareGPT  | HumanEval |
|-------------------------------|-----------|-----------|
| vLLM V1 baseline              | 84.1      | 84.1      |
| vLLM V1 Eagle                 | 102.2     | 112.0     |
| vLLM V1 Eagle3                | 77.7      | 85.3      |
| vLLM V0 MLP-Speculator (IBM)  | 77.9      | 66.7      |
| ArcticSpeculator              | **172.4** | **203.7** |

For more details about ArcticSpeculator and how to use it:

* ❄️ [Using Arctic-Inference and Arctic-Training to improve real-world speculative decoding performance (blog)]()
* 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)

We also release ArcticSpeculator checkpoints trained with [ArcticTraining](https://github.com/snowflakedb/ArcticTraining), ready to run with [ArcticInference](https://github.com/snowflakedb/ArcticInference):

| Base model | ArcticSpeculator checkpoint |
|------------|-----------------------------|
| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct) |
| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct) |
| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | [Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct) |
| [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct) |