|
--- |
|
library_name: transformers |
|
license: mit |
|
--- |
|
|
|
# sglang-EAGLE3-Llama-4-Maverick-17B-128E-Instruct-v1 |
|
|
|
## Model Introduction |
|
The Eagle3 draft model was trained using the [SpecForge](https://github.com/sgl-project/SpecForge) framework for the Llama4 Maverick 17B-128E Instruct model, leveraging a combination of UltraChat and ShareGPT datasets. Under a 3-1-4 speculative decoding configuration—3 speculative steps, top-1 token selection, and 4 draft tokens—it achieves an acceptance length of 2.45. |
|
|
|
## Usage |
|
You can use this Eagle3 draft model in [SGLang](https://github.com/sgl-project/sglang) with the following command: |
|
|
|
```bash |
|
python3 -m sglang.launch_server \ |
|
--model meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \ |
|
--speculative-algorithm EAGLE3 \ |
|
--speculative-draft-model-path lmsys/sglang-EAGLE3-Llama-4-Maverick-17B-128E-Instruct-v1 \ |
|
--speculative-num-steps 3 \ |
|
--speculative-eagle-topk 1 \ |
|
--speculative-num-draft-tokens 4 \ |
|
--mem-fraction-static 0.75 \ |
|
--cuda-graph-max-bs 2 \ |
|
--tp 8 \ |
|
--context-length 8192 \ |
|
--trust-remote-code \ |
|
--host 0.0.0.0 \ |
|
--port 30000 \ |
|
--dtype bfloat16 |
|
``` |