MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Paper • 2405.15593 • Published May 24, 2024
Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning Paper • 2407.10994 • Published Jun 24, 2024
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search Paper • 2410.14649 • Published Oct 18, 2024
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization Paper • 2308.02060 • Published Aug 3, 2023
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models Paper • 2203.07259 • Published Mar 14, 2022
ZipLM: Hardware-Aware Structured Pruning of Language Models Paper • 2302.04089 • Published Feb 7, 2023
SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks Paper • 2302.04852 • Published Feb 9, 2023
GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods Paper • 2210.06384 • Published Oct 12, 2022
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information Paper • 2107.03356 • Published Jul 7, 2021
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023