---
library_name: transformers
license: mit
---

# sglang-EAGLE3-Llama-4-Maverick-17B-128E-Instruct-v1

## Model Introduction
This EAGLE3 draft model was trained with the [SpecForge](https://github.com/sgl-project/SpecForge) framework for the Llama 4 Maverick 17B-128E Instruct model, using a combination of the UltraChat and ShareGPT datasets. Under a 3-1-4 speculative decoding configuration (3 speculative steps, top-1 token selection, and 4 draft tokens), it achieves an average acceptance length of 2.45.
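
A rough worked example of what this acceptance length implies. It assumes, as is usual for EAGLE-style decoding, that acceptance length $\tau$ counts the average number of tokens committed per target-model verification pass, including the one token the target model always contributes. With $k = 3$ speculative steps and top-1 selection, each pass commits between 1 and $k + 1$ tokens:

$$
1 \le \tau \le k + 1 = 4, \qquad \tau = 2.45
$$

$$
\frac{\text{target forward passes with EAGLE3}}{\text{target forward passes without}} \approx \frac{1}{\tau} = \frac{1}{2.45} \approx 0.41
$$

Under that assumption, generating a fixed number of tokens needs roughly 41% of the target-model forward passes that plain autoregressive decoding would. This is an idealized figure: drafting and verifying several tokens per step adds overhead, so the wall-clock speedup is expected to stay below 2.45x.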

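## Usage
You can serve Llama 4 Maverick with this EAGLE3 draft model in [SGLang](https://github.com/sgl-project/sglang) using the following command, which runs the FP8 target checkpoint with tensor parallelism across 8 GPUs (`--tp 8`):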

```bash
python3 -m sglang.launch_server \
  --model meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path lmsys/sglang-EAGLE3-Llama-4-Maverick-17B-128E-Instruct-v1 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.75 \
  --cuda-graph-max-bs 2 \
  --tp 8 \
  --context-length 8192 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype bfloat16
```
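
The `3-1-4` shorthand maps one-to-one onto the three `--speculative-*` flags in the command above. The sketch below makes that mapping explicit with a hypothetical `spec_flags` shell function, which is not part of SGLang. It also warns when top-1 selection is combined with a draft-token count other than steps + 1; that check is an assumption based on top-1 drafting producing a single chain of tokens, so confirm it against the SGLang server arguments for your version.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of SGLang: expand a "steps-topk-draft" spec
# such as "3-1-4" into the speculative decoding flags of sglang.launch_server.
spec_flags() {
  local spec="$1" steps topk draft
  # Split the spec on '-' into its three numeric fields.
  IFS='-' read -r steps topk draft <<< "$spec"
  if [[ -z "$steps" || -z "$topk" || -z "$draft" ]]; then
    echo "usage: spec_flags STEPS-TOPK-DRAFT_TOKENS (e.g. 3-1-4)" >&2
    return 1
  fi
  # Assumption: with top-1 selection the draft is a single chain, so the
  # verified tokens are the root token plus one per step (steps + 1).
  if (( topk == 1 && draft != steps + 1 )); then
    echo "warning: topk=1 usually implies num-draft-tokens = $((steps + 1))" >&2
  fi
  printf '%s\n' "--speculative-num-steps $steps --speculative-eagle-topk $topk --speculative-num-draft-tokens $draft"
}

# Prints the three speculative decoding flags used in the launch command.
spec_flags 3-1-4
```

The output can be spliced into a launch command, for example `python3 -m sglang.launch_server ... $(spec_flags 3-1-4)`, leaving `$(...)` unquoted so the shell splits it into separate arguments.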