aquif-3-moe
Mixture of Experts models in the aquif-3 series.
A high-performance mixture-of-experts language model optimized for efficiency across coding, science, and general-purpose use. With 17B total parameters and only 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains while keeping inference cost low.
- Architecture: Mixture of Experts (MoE)
- Total Parameters: 17 billion
- Active Parameters: 2.8 billion
- License: Apache 2.0
- Library: transformers
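
As a rough sanity check on the parameter counts, the total can be recomputed after loading the checkpoint. This is a minimal sketch using the repository name from the usage example further down; note that it loads the full 17B-parameter model into memory.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aquif/aquif-3-moe-17b")

# Count every parameter in the checkpoint (total, not active-per-token)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e9:.1f}B")  # expected to be roughly 17B
```
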
| Metric | aquif-3-moe (17B total, 2.8B active) | Phi-4 (14B) | Qwen3 (14B) | Gemma 3 (27B) | GPT-4.1 nano (proprietary) | Mistral Small 3.2 (24B) |
|---|---|---|---|---|---|---|
MMLU (General Knowledge) | 83.2 | 84.8 | 82.0 | 78.6 | 80.1 | 80.5 |
LiveCodeBench (Coding) | 28.6 | 25.2 | 29.0 | 26.9 | 32.6 | 27.5 |
MATH-500 (Math) | 91.4 | 80.8 | 89.8 | 88.3 | 84.8 | 88.3 |
GPQA Diamond (Science) | 56.7 | 56.1 | 54.8 | 42.8 | 50.3 | 50.5 |
Average | 65.0 | 61.7 | 63.9 | 59.2 | 62.0 | 61.7 |
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
model_name = "aquif/aquif-3-moe-17b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
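
Since an 8-bit option is listed, the weights can also be quantized on the fly at load time to reduce GPU memory use. The following is a sketch, assuming a CUDA GPU and the bitsandbytes package are installed; the repository name is the same one used above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "aquif/aquif-3-moe-17b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantize weights to 8-bit at load time to cut GPU memory use
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",  # place layers on available device(s) automatically
)

inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
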
The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
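
To make the routing idea concrete, below is a minimal, illustrative top-k mixture-of-experts layer in PyTorch: a router scores the experts for each token, only the top-k experts run, and their outputs are combined with the normalized router weights. The hidden size, expert count, and k here are placeholder values for the sketch and do not reflect aquif-3-moe's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k MoE feed-forward layer (placeholder sizes, not aquif-3-moe's)."""

    def __init__(self, hidden_size=512, ffn_size=1024, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, hidden_size)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Only the selected experts' feed-forward weights participate in each token's computation, which is how a model can hold 17B parameters while activating just 2.8B per input.
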
- Base model: inclusionAI/Ling-lite-base
- Quantization: 8-bit available