**The Llama 4 folder is still under development.** #### Available features - Llama 4 model definition (text-only), including the MoE architecture with token-choice routing using efficient bfloat16 Grouped MM kernels - FSDP, TP, PP, CP support - DCP checkpoint conversion scripts #### Download Llama 4 tokenizer ```bash # Llama 4 tokenizer.model python scripts/download_tokenizer.py --repo_id meta-llama/Llama-4-Scout-17B-16E --tokenizer_path "" --hf_token=... ``` #### To be added - Modeling - iRoPE implementation - load balance loss for token-choice MoE - alternative expert-choice MoE - multimodal support - Parallelism - Context Parallel support for FlexAttention, iRoPE, and multimodal inputs - Expert Parallel support - torch.compile - for MoE layers - Quantization - efficient float8 GroupedGEMM kernels (from torchao) - Testing - perfomance and loss converging tests - CI integration