---
library_name: transformers
language:
- ar
- de
- en
- es
- fr
- hi
- id
- it
- pt
- th
- tl
- vi
base_model:
- meta-llama/Llama-4-Scout-17B-16E-Instruct
tags:
- pytorch
- llama
- llama-4
- mixture of experts
---

Llama 4 Scout, modified to route each token to the top-k=6 experts with dynamic expert fusion. Requires healing via SFT/RLHF to restore performance. Achieves a 64.28% MMLU score. Barely fits on an RTX 4090 when quantized to 4-bit (see the code sketches after the table below).

## Model Information

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17 billion active-parameter model with 16 experts, and Llama 4 Maverick, a 17 billion active-parameter model with 128 experts.

**Model developer**: Meta

**Model Architecture:** The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality.
| Model Name | Training Data | Params | Input modalities | Output modalities | Context length | Token count | Knowledge cutoff |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Llama 4 Scout (17Bx16E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our Privacy Center. | 17B (Activated) 109B (Total) | Multilingual text and image | Multilingual text and code | 10M | ~40T | August 2024 |
| Llama 4 Maverick (17Bx128E) | | 17B (Activated) 400B (Total) | Multilingual text and image | Multilingual text and code | 1M | ~22T | August 2024 |
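## Top-k=6 routing with dynamic expert fusion (illustrative)

The toy sketch below illustrates one plausible reading of the "top-k=6 experts with dynamic expert fusion" modification: for each token, the router selects the six best-scoring experts and their FFN weights are merged on the fly, weighted by the normalized router scores, so the token passes through a single fused expert. All names and the fusion scheme here are assumptions for illustration only; this is not the checkpoint's actual implementation, and the real Llama 4 FFN uses a gated (SwiGLU-style) block rather than the two-matrix toy shown here.

```python
# Illustrative toy sketch only: per-token top-k=6 routing with weight-space
# expert fusion. Not the checkpoint's actual code; shapes and the fusion rule
# are assumptions made for clarity.
import torch
import torch.nn.functional as F

def fused_moe_forward(x, router_w, expert_w1, expert_w2, top_k=6):
    """x: (hidden,); router_w: (n_experts, hidden);
    expert_w1: (n_experts, ffn, hidden); expert_w2: (n_experts, hidden, ffn)."""
    scores = router_w @ x                          # (n_experts,) router logits
    topk_scores, topk_idx = scores.topk(top_k)     # pick the 6 best-scoring experts
    weights = F.softmax(topk_scores, dim=-1)       # normalize over the selected experts

    # Dynamically fuse the selected experts into one FFN (weighted average of weights).
    w1 = (weights[:, None, None] * expert_w1[topk_idx]).sum(dim=0)  # (ffn, hidden)
    w2 = (weights[:, None, None] * expert_w2[topk_idx]).sum(dim=0)  # (hidden, ffn)

    return w2 @ F.silu(w1 @ x)                     # fused expert applied to the token

# Tiny smoke test with random weights.
hidden, ffn, n_experts = 8, 16, 16
x = torch.randn(hidden)
out = fused_moe_forward(
    x,
    torch.randn(n_experts, hidden),
    torch.randn(n_experts, ffn, hidden),
    torch.randn(n_experts, hidden, ffn),
)
print(out.shape)  # torch.Size([8])
```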
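## Loading with six active experts per token (sketch)

A minimal loading sketch, assuming the transformers Llama 4 integration: the `Llama4ForConditionalGeneration` class, the `text_config.num_experts_per_tok` key, and the repo id below are assumptions to verify against this repository's `config.json`. The dynamic expert fusion itself lives in the checkpoint's weights, not in this snippet.

```python
# Minimal sketch (not this repo's verified loading path): open the checkpoint
# with top-k=6 expert routing. Class and config key names follow the transformers
# Llama 4 integration and should be checked against this repo's config.json.
import torch
from transformers import AutoConfig, AutoTokenizer, Llama4ForConditionalGeneration

repo_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # base model; substitute this repo's id

config = AutoConfig.from_pretrained(repo_id)
# Llama 4 keeps its MoE routing settings on the text sub-config.
config.text_config.num_experts_per_tok = 6  # assumed key: number of routed experts per token

model = Llama4ForConditionalGeneration.from_pretrained(
    repo_id,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("Briefly explain mixture-of-experts routing.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```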
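## 4-bit loading for a single GPU (sketch)

For single-GPU use such as the RTX 4090 mentioned above, the sketch below shows the standard transformers + bitsandbytes 4-bit (NF4) loading path. The repo id is a placeholder for this checkpoint, and actual memory use on a 24 GB card depends on the quantization settings and the number of experts retained after fusion.

```python
# Minimal sketch, assuming the standard transformers + bitsandbytes 4-bit path
# (NF4 with double quantization). The repo id is a placeholder for this checkpoint.
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, Llama4ForConditionalGeneration

repo_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # placeholder; point at this repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Llama4ForConditionalGeneration.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```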