Julius-L's Collections

inference acceleration

updated Jun 3

  • SageAttention2++: A More Efficient Implementation of SageAttention2

    Paper • 2505.21136 • Published May 27 • 45

  • SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

    Paper • 2505.11594 • Published May 16 • 72