Collections
Discover the best community collections!
Collections including paper arxiv:2402.17764
- stabilityai/stable-diffusion-3-medium (Text-to-Image • Updated • 89.8k • 4.28k)
- Llama 2: Open Foundation and Fine-Tuned Chat Models (Paper • 2307.09288 • Published • 239)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (Paper • 2404.14219 • Published • 250)
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory (Paper • 2312.11514 • Published • 256)

- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Paper • 2306.00978 • Published • 8)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Paper • 2210.17323 • Published • 7)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 590)

- You Only Cache Once: Decoder-Decoder Architectures for Language Models (Paper • 2405.05254 • Published • 8)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 590)
- BitsFusion: 1.99 bits Weight Quantization of Diffusion Model (Paper • 2406.04333 • Published • 36)

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 590)
- Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding (Paper • 2404.16710 • Published • 57)
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory (Paper • 2405.08707 • Published • 27)
- Token-Scaled Logit Distillation for Ternary Weight Generative Language Models (Paper • 2308.06744 • Published • 1)