Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 18 days ago • 104
DarwinLM: Evolutionary Structured Pruning of Large Language Models Paper • 2502.07780 • Published Feb 11 • 18
HALO: Hadamard-Assisted Lossless Optimization for Efficient Low-Precision LLM Training and Fine-Tuning Paper • 2501.02625 • Published Jan 5 • 16
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 44
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023 • 14
Towards End-to-end 4-Bit Inference on Generative Large Language Models Paper • 2310.09259 • Published Oct 13, 2023 • 1
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Paper • 2306.03078 • Published Jun 5, 2023 • 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation Paper • 2401.04679 • Published Jan 9, 2024 • 2
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11, 2024 • 13
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization Paper • 2308.02060 • Published Aug 3, 2023 • 1
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization Paper • 2404.03605 • Published Apr 4, 2024 • 1
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning Paper • 2208.11580 • Published Aug 24, 2022