GQA configuration

#1
by gogo8232 - opened

In the Llama 70B EAGLE, you downsized the max position embeddings, which makes some sense given your training code. In this particular case, however, you disabled GQA. Is there any reason for that? I also checked the Llama 70B case, and there you did not change num_key_value_heads in the config.

Lastly, a minor point: max_window_layers increased from 70 to 80. Is there any reason for that?
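
For reference, here is the config diff I am looking at. A minimal sketch of how one could reproduce it from the two config.json files (the paths are placeholders for wherever the configs are downloaded, not the actual repo layout):

```python
# Minimal sketch: diff two config.json files key by key.
# Paths are placeholders; point them at the downloaded configs.
import json

with open("Qwen2.5-70B-Instruct/config.json") as f:
    base = json.load(f)
with open("EAGLE-Qwen2.5-70B-Instruct/config.json") as f:
    draft = json.load(f)

# Print every key whose value differs (or is missing on one side).
for key in sorted(set(base) | set(draft)):
    if base.get(key) != draft.get(key):
        print(f"{key}: {base.get(key, 'Not specified')} -> {draft.get(key, 'Not specified')}")
```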

| Parameter | Qwen2.5-70B-Instruct | EAGLE-Qwen2.5-70B-Instruct |
|---|---|---|
| architectures | Qwen2ForCausalLM | Qwen2ForCausalLM |
| attention_dropout | 0.0 | 0.0 |
| bos_token_id | 151643 | 151643 |
| eos_token_id | 151645 | 151645 |
| hidden_act | silu | silu |
| hidden_size | 8192 | 8192 |
| initializer_range | 0.02 | 0.02 |
| intermediate_size | 29568 | 29568 |
| max_position_embeddings | 32768 | 32768 |
| max_window_layers | 70 | 80 |
| model_type | qwen2 | qwen2 |
| num_attention_heads | 64 | 64 |
| num_hidden_layers | 80 | 1 |
| num_key_value_heads | 8 | 64 |
| rms_norm_eps | 1e-06 | 1e-06 |
| rope_theta | 1000000.0 | 1000000.0 |
| sliding_window | 131072 | 131072 |
| tie_word_embeddings | false | false |
| torch_dtype | bfloat16 | bfloat16 |
| transformers_version | 4.43.1 | 4.40.1 |
| use_cache | true | true |
| use_sliding_window | false | false |
| vocab_size | 152064 | 152064 |
| qkv_bias | Not specified | true |
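
To make the GQA point concrete, here is a minimal sketch (using transformers' Qwen2Config, not your training code) of what the num_key_value_heads change implies:

```python
# Minimal sketch of how num_key_value_heads controls GQA in a Qwen2 config.
from transformers import Qwen2Config

# Base model: 64 query heads sharing 8 KV heads -> grouped-query attention.
base = Qwen2Config(hidden_size=8192, num_attention_heads=64, num_key_value_heads=8)

# EAGLE draft config above: num_key_value_heads == num_attention_heads,
# so every query head has its own KV head -> plain multi-head attention.
draft = Qwen2Config(hidden_size=8192, num_attention_heads=64, num_key_value_heads=64)

for name, cfg in [("base", base), ("draft", draft)]:
    groups = cfg.num_attention_heads // cfg.num_key_value_heads
    print(f"{name}: {cfg.num_key_value_heads} KV heads, "
          f"{groups} query heads per KV head "
          f"({'GQA' if groups > 1 else 'MHA'})")
```

In other words, with num_key_value_heads equal to num_attention_heads there is no key/value head sharing at all, which is standard multi-head attention rather than GQA.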