The Llama 4 folder is still under development.
## Available features
- Llama 4 model definition (text-only), including the MoE architecture with token-choice routing using efficient bfloat16 Grouped MM kernels
- FSDP, Tensor Parallel (TP), Pipeline Parallel (PP), and Context Parallel (CP) support
- DCP (Distributed Checkpoint) conversion scripts
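To make the token-choice routing feature concrete, here is a minimal NumPy sketch of the idea (an illustration only, not this folder's implementation, which uses bfloat16 Grouped MM kernels): each token selects its top-k experts, and tokens are then grouped per expert so every expert runs one batched matmul.

```python
import numpy as np

def token_choice_route(x, gate_w, expert_ws, top_k=1):
    """Illustrative token-choice MoE routing (hypothetical helper).

    x:         (num_tokens, dim) token activations
    gate_w:    (dim, num_experts) router weights
    expert_ws: (num_experts, dim, dim) one weight matrix per expert
    """
    logits = x @ gate_w                                 # (tokens, experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # experts chosen per token
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over the selected experts' logits to get combine weights.
    probs = np.exp(topk_logits - topk_logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    # Group tokens by expert so each expert does one batched matmul;
    # a Grouped MM kernel fuses this loop into a single GPU call.
    for e in range(gate_w.shape[1]):
        tok, slot = np.nonzero(topk_idx == e)
        if tok.size:
            out[tok] += probs[tok, slot, None] * (x[tok] @ expert_ws[e])
    return out
```

With a single expert this reduces to a plain linear layer; with more experts each token's output is the score-weighted sum of its chosen experts' outputs.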
## Download the Llama 4 tokenizer
```bash
# Llama 4 tokenizer.model
python scripts/download_tokenizer.py --repo_id meta-llama/Llama-4-Scout-17B-16E --tokenizer_path "" --hf_token=...
```
## To be added
- Modeling
  - iRoPE implementation
  - load-balance loss for token-choice MoE
  - alternative expert-choice MoE
  - multimodal support
- Parallelism
  - Context Parallel support for FlexAttention, iRoPE, and multimodal inputs
  - Expert Parallel support
- torch.compile
  - for MoE layers
- Quantization
  - efficient float8 GroupedGEMM kernels (from torchao)
- Testing
  - performance and loss-convergence tests
  - CI integration
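For the planned load-balance loss, one widely used formulation (in the Switch Transformer style; the exact form this folder will adopt is not yet decided) penalizes the product of each expert's dispatched-token fraction and its mean router probability, which is minimized when load is uniform:

```python
import numpy as np

def load_balance_loss(router_probs, expert_idx, num_experts):
    """Sketch of a common auxiliary load-balancing loss (hypothetical helper,
    not this folder's implementation).

    router_probs: (tokens, experts) softmax router probabilities
    expert_idx:   (tokens,) top-1 expert chosen for each token
    """
    tokens = router_probs.shape[0]
    # f_e: fraction of tokens dispatched to each expert
    frac = np.bincount(expert_idx, minlength=num_experts) / tokens
    # P_e: mean router probability mass assigned to each expert
    mean_prob = router_probs.mean(axis=0)
    # Scaled so the loss equals 1.0 under perfectly uniform routing.
    return num_experts * np.sum(frac * mean_prob)
```

Adding a small multiple of this term to the training loss discourages the router from collapsing onto a few experts.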