# OLMoE-1B-7B-Eagle3 Draft Model
This repository provides EAGLE-3 draft model weights, related code, and training data for the allenai/OLMoE-1B-7B-0125-Instruct base model.
## 📦 Included Files
- `pytorch_model.bin`: Trained EAGLE draft model weights
- `config.json`: Model configuration file (OLMoE architecture)
- `tokenizer_config.json`: Tokenizer configuration file
- `modeling_olmoe_kv.py`: OLMoE-specific model code (required for EAGLE inference)
- `eagle_data.json`: Training dataset (ShareGPT questions + OLMoE-generated answers)
- `.gitattributes`: Git LFS settings, etc.
## 🦅 What is the EAGLE Draft Model?
EAGLE is a framework that dramatically accelerates inference for large language models (LLMs) by separately training a lightweight draft decoder layer.
- Fully compatible with the OLMoE-1B-7B-0125-Instruct architecture
- The EAGLE draft layer is structurally similar to the main model's decoder
- During inference, the draft layer proposes multiple tokens in advance, which the main model then verifies and accepts (see the sketch below)
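For intuition, the loop below is a minimal, schematic sketch of greedy draft-then-verify speculative decoding. The function `speculative_step`, the batch-size-1 assumption, and the HF-style `.logits` interface are illustrative assumptions, not the actual EAGLE API (EAGLE additionally reuses the main model's hidden features and drafts a token tree rather than a single chain):

```python
import torch

def speculative_step(draft_model, main_model, input_ids, k=4):
    """One illustrative greedy draft-then-verify step (batch size 1).

    A sketch of the general speculative-decoding idea, not the real
    EAGLE API. Both models are assumed to return HF-style `.logits`.
    """
    # 1. Draft: the cheap draft model proposes k tokens autoregressively.
    draft_ids = input_ids
    for _ in range(k):
        logits = draft_model(draft_ids).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)
    proposed = draft_ids[:, input_ids.shape[1]:]  # shape (1, k)

    # 2. Verify: one forward pass of the main model scores all k
    #    proposed positions at once.
    logits = main_model(draft_ids).logits
    verify = logits[:, input_ids.shape[1] - 1:-1, :].argmax(dim=-1)  # (1, k)

    # 3. Accept the longest prefix on which draft and main model agree;
    #    at the first disagreement, substitute the main model's token.
    matches = (verify == proposed)[0].long()
    n_accept = int(matches.cumprod(dim=0).sum())
    new_ids = torch.cat([input_ids, proposed[:, :n_accept]], dim=-1)
    if n_accept < k:
        new_ids = torch.cat([new_ids, verify[:, n_accept:n_accept + 1]], dim=-1)
    return new_ids
```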
## 📊 Training Data Description
- `eagle_data.json`
  - Only the questions (prompts) are extracted from the ShareGPT dataset.
  - For each question, the allenai/OLMoE-1B-7B-0125-Instruct model generates its own answer.
  - The model's self-generated answers are then used as ground truth to train the draft layer.
  - This keeps the distribution learned by the draft layer very close to that of the main model's decoder, maximizing EAGLE inference performance (a sketch of this pipeline follows below).
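A minimal sketch of how such self-distillation data could be generated; the script, prompt list, and output fields here are assumptions for illustration, not the exact pipeline used to build `eagle_data.json`:

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMoE-1B-7B-0125-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# In practice these would be the questions extracted from ShareGPT.
sharegpt_prompts = ["Why do we study math?"]

records = []
for prompt in sharegpt_prompts:
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Greedy generation so the answers reflect the model's own distribution.
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    answer = tokenizer.decode(
        output[0][input_ids.shape[1]:], skip_special_tokens=True
    )
    records.append({"prompt": prompt, "answer": answer})

with open("eagle_data.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```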
## 🛠️ Usage
### 1. Using Model Weights/Config Files
`pytorch_model.bin`, `config.json`, and `tokenizer_config.json` can be used directly with Hugging Face Transformers or the EAGLE code, as sketched below.
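For example, the configuration and tokenizer load with the standard auto classes (a minimal sketch, assuming Transformers recognizes the repo's OLMoE-style config; the draft weights themselves are consumed by the EAGLE loader in step 3):

```python
from transformers import AutoConfig, AutoTokenizer

# Load the draft model's config and tokenizer straight from this repo.
config = AutoConfig.from_pretrained("wantsleep/OLMoE_1B_7B_Eagle3")
tokenizer = AutoTokenizer.from_pretrained("wantsleep/OLMoE_1B_7B_Eagle3")
print(config)  # OLMoE-style architecture settings
```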
### 2. Integrating with EAGLE Inference Code
- Copy `modeling_olmoe_kv.py` into the official EAGLE repo at `EAGLE/eagle/model/`.
- In your EAGLE inference script, import it as:

```python
from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
```
### 3. Example Code

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    total_token=-1,
).to(DEVICE)
model.eval()

your_message = "Why do we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# EaModel carries its own tokenizer, so no separate AutoTokenizer is needed.
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
```
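Note that `total_token` controls how many draft tokens EAGLE proposes per step; per the upstream EAGLE documentation, setting it to `-1` lets the framework choose a value automatically.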
## ⚠️ Notes
- `eagle_data.json` contains only OLMoE-generated answers to public ShareGPT questions.
- The EAGLE draft layer should be designed as close as possible to the main model's decoder for optimal inference efficiency.
- `modeling_olmoe_kv.py` must be included in your EAGLE inference code for correct operation.
## 📚 References
For questions or feedback, please open an issue!