The Llama 4 folder is still under development.
## Available features
- Llama 4 model definition (text-only), including the MoE architecture with token-choice routing using efficient bfloat16 Grouped MM kernels
- FSDP, Tensor Parallel (TP), Pipeline Parallel (PP), and Context Parallel (CP) support
- DCP (Distributed Checkpoint) conversion scripts
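To make the token-choice routing feature concrete, here is a minimal NumPy sketch of the idea (an illustration only, not this folder's implementation, which uses bfloat16 Grouped MM kernels): each token selects its top-k experts, and tokens are then grouped per expert so every expert runs one batched matmul.

```python
import numpy as np

def token_choice_route(x, gate_w, expert_ws, top_k=1):
    """Illustrative token-choice MoE routing (hypothetical helper).

    x:         (num_tokens, dim) token activations
    gate_w:    (dim, num_experts) router weights
    expert_ws: (num_experts, dim, dim) one weight matrix per expert
    """
    logits = x @ gate_w                                 # (tokens, experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # experts chosen per token
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over the selected experts' logits to get combine weights.
    probs = np.exp(topk_logits - topk_logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    # Group tokens by expert so each expert does one batched matmul;
    # a Grouped MM kernel fuses this loop into a single GPU call.
    for e in range(gate_w.shape[1]):
        tok, slot = np.nonzero(topk_idx == e)
        if tok.size:
            out[tok] += probs[tok, slot, None] * (x[tok] @ expert_ws[e])
    return out
```

With a single expert this reduces to a plain linear layer; with more experts each token's output is the score-weighted sum of its chosen experts' outputs.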
## Download the Llama 4 tokenizer
```bash
# Llama 4 tokenizer.model
python scripts/download_tokenizer.py --repo_id meta-llama/Llama-4-Scout-17B-16E --tokenizer_path "" --hf_token=...
```
## To be added
- Modeling
  - iRoPE implementation
  - load-balance loss for token-choice MoE
  - alternative expert-choice MoE
  - multimodal support
- Parallelism
  - Context Parallel support for FlexAttention, iRoPE, and multimodal inputs
  - Expert Parallel support
- torch.compile
  - for MoE layers
- Quantization
  - efficient float8 GroupedGEMM kernels (from torchao)
- Testing
  - performance and loss-convergence tests
  - CI integration
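For the planned load-balance loss, one widely used formulation (in the Switch Transformer style; the exact form this folder will adopt is not yet decided) penalizes the product of each expert's dispatched-token fraction and its mean router probability, which is minimized when load is uniform:

```python
import numpy as np

def load_balance_loss(router_probs, expert_idx, num_experts):
    """Sketch of a common auxiliary load-balancing loss (hypothetical helper,
    not this folder's implementation).

    router_probs: (tokens, experts) softmax router probabilities
    expert_idx:   (tokens,) top-1 expert chosen for each token
    """
    tokens = router_probs.shape[0]
    # f_e: fraction of tokens dispatched to each expert
    frac = np.bincount(expert_idx, minlength=num_experts) / tokens
    # P_e: mean router probability mass assigned to each expert
    mean_prob = router_probs.mean(axis=0)
    # Scaled so the loss equals 1.0 under perfectly uniform routing.
    return num_experts * np.sum(frac * mean_prob)
```

Adding a small multiple of this term to the training loss discourages the router from collapsing onto a few experts.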