aquif-3-moe
Mixture of Experts models in the aquif-3 series.
A high-performance mixture-of-experts language model optimized for efficiency across coding, science, and general-purpose use. With 17B total parameters and only 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains while keeping inference cost low.
- Architecture: Mixture of Experts (MoE)
- Total Parameters: 17 billion
- Active Parameters: 2.8 billion
- License: Apache 2.0
- Library: transformers
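
As a rough sanity check on the parameter counts, the total can be recomputed after loading the checkpoint. This is a minimal sketch using the repository name from the usage example further down; note that it loads the full 17B-parameter model into memory.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aquif/aquif-3-moe-17b")

# Count every parameter in the checkpoint (total, not active-per-token)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e9:.1f}B")  # expected to be roughly 17B
```
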
| Metric | aquif-3-moe (17B total, 2.8B active) | Phi-4 (14B) | Qwen3 (14B) | Gemma 3 (27B) | GPT-4.1 nano (proprietary) | Mistral Small 3.2 (24B) |
|---|---|---|---|---|---|---|
MMLU (General Knowledge) | 83.2 | 84.8 | 82.0 | 78.6 | 80.1 | 80.5 |
LiveCodeBench (Coding) | 28.6 | 25.2 | 29.0 | 26.9 | 32.6 | 27.5 |
MATH-500 (Math) | 91.4 | 80.8 | 89.8 | 88.3 | 84.8 | 88.3 |
GPQA Diamond (Science) | 56.7 | 56.1 | 54.8 | 42.8 | 50.3 | 50.5 |
Average | 65.0 | 61.7 | 63.9 | 59.2 | 62.0 | 61.7 |
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
model_name = "aquif/aquif-3-moe-17b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
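
Since an 8-bit option is listed, the weights can also be quantized on the fly at load time to reduce GPU memory use. The following is a sketch, assuming a CUDA GPU and the bitsandbytes package are installed; the repository name is the same one used above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "aquif/aquif-3-moe-17b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantize weights to 8-bit at load time to cut GPU memory use
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",  # place layers on available device(s) automatically
)

inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
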
The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
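
To make the routing idea concrete, below is a minimal, illustrative top-k mixture-of-experts layer in PyTorch: a router scores the experts for each token, only the top-k experts run, and their outputs are combined with the normalized router weights. The hidden size, expert count, and k here are placeholder values for the sketch and do not reflect aquif-3-moe's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k MoE feed-forward layer (placeholder sizes, not aquif-3-moe's)."""

    def __init__(self, hidden_size=512, ffn_size=1024, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, hidden_size)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Only the selected experts' feed-forward weights participate in each token's computation, which is how a model can hold 17B parameters while activating just 2.8B per input.
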
- Base model: inclusionAI/Ling-lite-base
- Quantization: 8-bit available