
The Llama 4 folder is still under development.

Available features

  • Llama 4 model definition (text-only), including the MoE architecture with token-choice routing using efficient bfloat16 Grouped MM kernels (see the illustrative sketch after this list)
  • FSDP, TP, PP, CP support
  • DCP checkpoint conversion scripts
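
As a rough illustration of the token-choice routing named above, here is a minimal sketch in which a plain per-expert loop stands in for the efficient bfloat16 Grouped MM kernels; all class and parameter names are hypothetical and are not the folder's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenChoiceMoE(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Per-expert feed-forward weights: dim -> hidden_dim -> dim.
        self.w1 = nn.Parameter(torch.randn(num_experts, dim, hidden_dim) * 0.02)
        self.w2 = nn.Parameter(torch.randn(num_experts, hidden_dim, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). "Token choice": each token picks its top-k experts.
        scores = self.router(x)                                   # (num_tokens, num_experts)
        weights, expert_idx = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # A grouped MM kernel would process all experts in one batched call;
        # here each expert's tokens are gathered and processed in a loop instead.
        for e in range(self.router.out_features):
            token_ids, k_slot = (expert_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            h = F.silu(x[token_ids] @ self.w1[e]) @ self.w2[e]    # (n_e, dim)
            out.index_add_(0, token_ids, h * weights[token_ids, k_slot, None])
        return out

# Example: moe = TokenChoiceMoE(512, 2048, num_experts=16); y = moe(torch.randn(8, 512))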

Download Llama 4 tokenizer

# Llama 4 tokenizer.model
python scripts/download_tokenizer.py --repo_id meta-llama/Llama-4-Scout-17B-16E --tokenizer_path "" --hf_token=...
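
As a quick sanity check (an illustrative snippet, not part of this folder, assuming the transformers package is installed), the same tokenizer can also be loaded directly from the Hub with the same access token:

from transformers import AutoTokenizer

# "hf_..." is a placeholder for the same token passed via --hf_token above.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-4-Scout-17B-16E", token="hf_...")
print(tok.encode("Hello, Llama 4!"))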

To be added

  • Modeling
    • iRoPE implementation
    • load-balancing loss for token-choice MoE (see the sketch after this list)
    • alternative expert-choice MoE
    • multimodal support
  • Parallelism
    • Context Parallel support for FlexAttention, iRoPE, and multimodal inputs
    • Expert Parallel support
  • torch.compile
    • for MoE layers
  • Quantization
    • efficient float8 GroupedGEMM kernels (from torchao)
  • Testing
    • performance and loss-convergence tests
    • CI integration
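
For the planned load-balancing loss for token-choice MoE (see the Modeling list above), a minimal sketch of the common auxiliary-loss formulation (fraction of tokens routed to each expert times the mean router probability for that expert) could look like the following; the exact formulation this folder will adopt is not specified here.

import torch
import torch.nn.functional as F

def load_balance_loss(router_logits: torch.Tensor, expert_idx: torch.Tensor,
                      num_experts: int) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts); expert_idx: (num_tokens, top_k)
    # Mean router probability assigned to each expert.
    mean_prob = router_logits.softmax(dim=-1).mean(dim=0)          # (num_experts,)
    # Fraction of routing assignments received by each expert.
    one_hot = F.one_hot(expert_idx, num_experts).float()           # (num_tokens, top_k, num_experts)
    frac_tokens = one_hot.sum(dim=1).mean(dim=0)                   # (num_experts,)
    # Minimized when assignments and probabilities are both uniform across experts.
    return num_experts * torch.sum(frac_tokens * mean_prob)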