jzwong's Collections
MLLM
LLM
LLM-RL
Agent-RL
Novel
SYS
Survey

SYS

updated 10 days ago

  • Tensor Product Attention Is All You Need

    Paper • 2501.06425 • Published Jan 11 • 89

  • Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

    Paper • 2501.11873 • Published Jan 21 • 66

  • Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

    Paper • 2502.11089 • Published Feb 16 • 159

  • MoBA: Mixture of Block Attention for Long-Context LLMs

    Paper • 2502.13189 • Published Feb 18 • 17

  • BitNet b1.58 2B4T Technical Report

    Paper • 2504.12285 • Published Apr 16 • 73

  • BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

    Paper • 2504.18415 • Published Apr 25 • 42

  • Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

    Paper • 2505.09343 • Published 13 days ago • 61