SYS - a jzwong Collection

updated 10 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 66

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 159

MoBA: Mixture of Block Attention for Long-Context LLMs

Paper • 2502.13189 • Published Feb 18 • 17

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published Apr 16 • 73

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published Apr 25 • 42

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published 13 days ago • 61