OLMoE-1B-7B-Eagle3 Draft Model

This repository provides the EAGLE draft model weights, related code, and training data for the allenai/OLMoE-1B-7B-0125-Instruct base model.


📦 Included Files

  • pytorch_model.bin: Trained EAGLE Draft model weights
  • config.json: Model configuration file (OLMoE architecture)
  • tokenizer_config.json: Tokenizer configuration file
  • modeling_olmoe_kv.py: OLMoE-specific model code (required for EAGLE inference)
  • eagle_data.json: Training dataset (ShareGPT questions + OLMoE-generated answers)
  • .gitattributes: Git LFS settings, etc.
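
All of these files can be fetched in one step with the huggingface_hub client. A minimal sketch (assuming the repository id shown on this page):

from huggingface_hub import snapshot_download

# Downloads every file in this repository (weights, config, tokenizer config,
# modeling code, and eagle_data.json) into the local HuggingFace cache
local_dir = snapshot_download(repo_id="wantsleep/OLMoE_1B_7B_Eagle3")
print(local_dir)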

🦅 What is the EAGLE Draft Model?

EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs)
by training a draft decoder layer separately.

  • Fully compatible with OLMoE-1B-7B-0125-Instruct architecture
  • The EAGLE Draft layer is structurally similar to the main model’s decoder
  • During inference, the draft layer generates multiple tokens in advance, which are then verified/accepted by the main model
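
The draft/verify loop can be pictured with a small conceptual sketch. This is not the actual EAGLE implementation (EAGLE verifies a whole tree of drafted tokens in a single base-model forward pass); draft_step and base_step are hypothetical callables that return next-token distributions:

import torch

def speculative_decode_step(base_step, draft_step, tokens, num_draft=4):
    # 1) The lightweight draft layer proposes num_draft tokens ahead (greedy here)
    proposals = []
    ctx = tokens
    for _ in range(num_draft):
        next_tok = draft_step(ctx).argmax(dim=-1, keepdim=True)
        proposals.append(next_tok)
        ctx = torch.cat([ctx, next_tok], dim=-1)

    # 2) The base model re-checks the proposals and keeps the longest prefix it
    #    agrees with; the first mismatch is replaced by the base model's own token
    ctx = tokens
    for tok in proposals:
        base_tok = base_step(ctx).argmax(dim=-1, keepdim=True)
        ctx = torch.cat([ctx, base_tok], dim=-1)
        if not torch.equal(base_tok, tok):
            break
    return ctx  # one or more new tokens per call instead of exactly one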

πŸ“ Training Data Description

  • eagle_data.json
    • Only questions (prompts) are extracted from the ShareGPT dataset
    • For each question, the allenai/OLMoE-1B-7B-0125-Instruct model generates its own answer
    • Thus, the model’s self-generated answers are used as ground truth to train the draft layer
    • This keeps the draft layer’s output distribution close to the main model’s decoder,
      raising the acceptance rate of drafted tokens and improving EAGLE inference speed
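
A rough sketch of that data-construction recipe is below. The generation settings and the JSON layout are assumptions (they may differ from those used to build eagle_data.json), and sharegpt_prompts stands in for the questions extracted from ShareGPT:

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMoE-1B-7B-0125-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

sharegpt_prompts = ["Why do we study math?"]  # placeholder for the extracted ShareGPT questions

records = []
for question in sharegpt_prompts:
    chat = [{"role": "user", "content": question}]
    input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt").to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    answer = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
    # The base model's own answer becomes the training target for the draft layer
    records.append({"question": question, "answer": answer})

with open("eagle_data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)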

πŸ› οΈ Usage

1. Using Model Weights/Config Files

  • pytorch_model.bin, config.json, and tokenizer_config.json
    can be used directly with HuggingFace Transformers or EAGLE code.
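
A quick way to check that these files load with standard tooling (a sketch; it only inspects the config and the raw draft checkpoint):

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig

repo_id = "wantsleep/OLMoE_1B_7B_Eagle3"

# config.json describes the OLMoE draft architecture
config = AutoConfig.from_pretrained(repo_id)

# pytorch_model.bin holds the trained EAGLE draft weights
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
state_dict = torch.load(weights_path, map_location="cpu")
print(config.model_type, f"- {len(state_dict)} tensors in the draft checkpoint")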

2. Integrating with EAGLE Inference Code

  • Copy modeling_olmoe_kv.py
    into the official EAGLE repo at EAGLE/eagle/model/.
  • In your EAGLE inference script, import it as:
    from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM

3. Example Code

from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model together with the EAGLE draft weights from this repository
model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    total_token=-1
).to(DEVICE)
model.eval()

your_message = "Why do we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# EaModel exposes the base model's tokenizer as model.tokenizer
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

# The draft layer speculates tokens; the base model verifies/accepts them
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
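
To get a rough feel for the speedup, the same prompt can be timed with and without the draft layer. This is an illustration rather than a benchmark, and it assumes EaModel exposes the underlying HuggingFace model as model.base_model:

import time

def timed(fn):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.time() - start

# EAGLE speculative decoding (draft layer + verification by the base model)
_, eagle_s = timed(lambda: model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8))

# Plain autoregressive decoding with the base model alone, for comparison
_, base_s = timed(lambda: model.base_model.generate(input_ids, do_sample=True, temperature=0.5, max_new_tokens=512))

print(f"EAGLE: {eagle_s:.1f}s vs. base model: {base_s:.1f}s")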

⚠️ Notes

  • eagle_data.json contains only OLMoE-generated answers for public ShareGPT questions.
  • The EAGLE Draft layer should be designed as close as possible to the main model’s decoder
    for optimal inference efficiency.
  • modeling_olmoe_kv.py must be included in your EAGLE inference code for correct operation.

For questions or feedback, please open an issue!
