aquif-moe-800m
aquif-moe-800m is our first Mixture of Experts (MoE) model, with only 800 million active parameters. Despite its compact size, it delivers exceptional performance-per-VRAM efficiency compared to larger models.
Model Overview
- Name: aquif-moe-800m
- Parameters: 800 million active parameters (3.3 billion total)
- Context Window: 128,000 tokens
- Architecture: Mixture of Experts (MoE)
- Type: General-purpose LLM
- Hosted on: Ollama
Key Features
- Extremely efficient VRAM utilization (57.8 average benchmark points per GB)
- Expansive 128K token context window for handling long documents
- Competitive performance against models with more parameters
- Optimized for local inference on consumer hardware
- Ideal for resource-constrained environments
- Supports high-throughput concurrent sessions
Performance Benchmarks
aquif-moe-800m delivers strong results across multiple benchmarks, particularly when its parameter count is taken into account:
| Benchmark | aquif-moe (0.8B) | Llama 3.2 (1B) | Gemma 3 (4B) |
|---|---|---|---|
| MMLU | 52.2 | 49.3 | 59.6 |
| HumanEval | 37.5 | 22.6 | 36.0 |
| GSM8K | 49.0 | 44.4 | 38.4 |
| Average | 46.2 | 38.8 | 44.7 |
VRAM Efficiency
One of aquif-moe-800m's standout features is its exceptional VRAM efficiency:
| Model | Average Performance | VRAM (GB) | Performance per GB of VRAM |
|---|---|---|---|
| aquif-moe | 46.2 | 0.8 | 57.8 |
| Llama 3.2 | 38.8 | 1.2 | 32.3 |
| Gemma 3 | 44.7 | 4.3 | 10.4 |
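The last column is simply the average benchmark score divided by the VRAM footprint. A minimal Python check of that arithmetic, using the figures from the tables above:

```python
# Reproduce the "Performance per GB of VRAM" column:
# average benchmark score / VRAM footprint in GB.
# Figures are the approximate internal-evaluation numbers from the tables above.
models = {
    "aquif-moe": {"average": 46.2, "vram_gb": 0.8},
    "Llama 3.2": {"average": 38.8, "vram_gb": 1.2},
    "Gemma 3":   {"average": 44.7, "vram_gb": 4.3},
}

for name, stats in models.items():
    ratio = stats["average"] / stats["vram_gb"]
    print(f"{name}: {ratio:.2f} points per GB")
# Prints roughly 57.75, 32.33 and 10.40 points/GB, matching the rounded table values.
```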
Use Cases
- Edge computing and resource-constrained environments
- Mobile and embedded applications
- Local development environments
- Quick prototyping and testing
- Personal assistants on consumer hardware
- Enterprise deployment with multiple concurrent sessions
- Long document analysis and summarization
- High-throughput production environments
Limitations
- No thinking (extended reasoning) mode
- May hallucinate in some domains
- May struggle with more complex reasoning tasks
- Not optimized for specialized domains
Getting Started
To run via Ollama:
```bash
ollama run aquiffoo/aquif-moe-800m
```
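Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API (by default at http://localhost:11434). A minimal sketch using the requests library; the prompt and the num_ctx value below are illustrative choices, not recommended settings:

```python
import requests

# Ollama exposes a local REST API once the server (or desktop app) is running.
# /api/generate performs a single-turn completion; stream=False returns one JSON object.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "aquiffoo/aquif-moe-800m",
        "prompt": "Summarize the key trade-offs of Mixture of Experts models.",
        "stream": False,
        # Optional: raise the context length toward the model's 128K window.
        # Larger values increase memory usage; adjust to your hardware.
        "options": {"num_ctx": 32768},
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```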
Technical Details
aquif-moe-800m uses a Mixture of Experts architecture to achieve high parameter efficiency. Although the model contains 3.3 billion parameters in total, only 800 million are active for any given token during inference, allowing for significantly reduced VRAM requirements while maintaining competitive performance.
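As an illustration of the general idea (not aquif-moe-800m's actual router or dimensions), a top-k gating layer sends each token to a small subset of expert feed-forward networks, so only a fraction of the total weights participate in each forward pass. A minimal NumPy sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- purely illustrative, not the model's real configuration.
d_model, num_experts, top_k = 64, 8, 2

# One tiny feed-forward "expert" per slot (weights only, no biases, for brevity).
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.02  # gating projection


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs by gate weight."""
    logits = x @ router                              # (tokens, num_experts)
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        top = np.argsort(logits[i])[-top_k:]         # indices of the k highest-scoring experts
        gates = np.exp(logits[i][top])
        gates /= gates.sum()                         # softmax over the selected experts only
        for gate, e in zip(gates, top):
            out[i] += gate * (token @ experts[e])    # only k of num_experts run per token
    return out


tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): same output shape, but only 2 of 8 experts used per token
```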
Enterprise Deployment
The model's exceptional VRAM efficiency makes it particularly valuable for enterprise deployments:
- Concurrent Sessions: Run multiple model instances on a single GPU
- High Throughput: Serve more users with the same hardware footprint
- Cost Efficiency: Lower infrastructure costs for production deployments
- Scalability: Easier horizontal scaling across available resources
The 128K context window enables comprehensive document analysis while maintaining the model's efficient resource utilization, making it suitable for enterprises dealing with large documents or extended conversations.
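As a rough sketch of the concurrent-session pattern, assuming a local Ollama server and the hypothetical prompts below, several independent requests can be issued in parallel against the same endpoint:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "aquiffoo/aquif-moe-800m"

# Hypothetical prompts standing in for independent user sessions.
prompts = [
    "Summarize the attached contract in three bullet points.",
    "Extract all dates mentioned in this report.",
    "Draft a short status update for the engineering team.",
    "Translate this paragraph into plain English.",
]


def generate(prompt: str) -> str:
    """Send one non-streaming generation request to the local Ollama server."""
    r = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["response"]


# A small thread pool is enough here: the Python side is I/O-bound while
# the Ollama server schedules the actual inference on the GPU.
with ThreadPoolExecutor(max_workers=4) as pool:
    for prompt, answer in zip(prompts, pool.map(generate, prompts)):
        print(f"> {prompt}\n{answer[:200]}\n")
```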
*Note: All performance metrics are approximate estimates based on internal evaluations.*
Base Model
- ibm-granite/granite-3.1-3b-a800m-base