Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models (arXiv:2309.15531, published Sep 27, 2023; see the first sketch below)
Peri-LN: Revisiting Layer Normalization in the Transformer Architecture (arXiv:2502.02732, published Feb 4, 2025)
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models (arXiv:2210.03858, published Oct 8, 2022)
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models (arXiv:2206.09557, published Jun 20, 2022; see the second sketch below)
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization (arXiv:2306.00317, published Jun 1, 2023; see the third sketch below)
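The first paper's title concerns how quantization groups are laid out relative to outlier weights. As a rough illustration of the general direction it points at, here is a minimal PyTorch sketch of group-wise min-max quantization with groups formed along the input-channel dimension, so an outlier only distorts the scale of its own group; the function name, defaults, and exact scheme are assumptions, not the paper's method.

```python
import torch

def per_ic_group_quantize(w, n_bits=4, group_size=128):
    """Toy group-wise min-max quantization, grouping along input channels.

    w: (out_channels, in_channels) weight matrix; in_channels must be
    divisible by group_size. All names here are illustrative.
    """
    out_ch, in_ch = w.shape
    g = w.reshape(out_ch, in_ch // group_size, group_size)
    lo = g.min(dim=-1, keepdim=True).values
    hi = g.max(dim=-1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / (2 ** n_bits - 1)
    q = torch.round((g - lo) / scale)   # integer codes in [0, 2^b - 1]
    deq = q * scale + lo                # dequantized weights
    return deq.reshape(out_ch, in_ch)
```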
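The LUT-GEMM title names its mechanism: matrix multiplication over quantized weights via lookup tables. A toy sketch of the lookup-table idea follows, for a matrix-vector product with weights in {-1, +1}: all 2^mu partial sums of each activation group are precomputed once and shared by every row. This illustrates the principle only, not the paper's GPU kernel or its binary-coding scale factors; names and the group width are assumptions.

```python
import itertools
import torch

def lut_gemv_binary(x, b, mu=8):
    """Toy LUT-based matrix-vector product for weights in {-1.0, +1.0}.

    x:  (K,) activation vector, K divisible by mu
    b:  (N, K) weight matrix with entries in {-1.0, +1.0}
    mu: group width; one table of 2**mu partial sums per group.
    """
    # Every possible signed combination of mu activations, as a (2^mu, mu) table.
    signs = torch.tensor(list(itertools.product((-1.0, 1.0), repeat=mu)))
    place = 2 ** torch.arange(mu - 1, -1, -1)   # bit place values
    y = torch.zeros(b.shape[0])
    for g in range(0, x.numel(), mu):
        lut = signs @ x[g:g + mu]               # all 2^mu partial sums for this group
        bits = (b[:, g:g + mu] > 0).long()      # each row's sign pattern as bits
        idx = (bits * place).sum(dim=1)         # bit pattern -> LUT index
        y += lut[idx]
    return y
```

As a sanity check, `lut_gemv_binary(x, b)` should match `b @ x`; the payoff of the table is that the per-group arithmetic is done once instead of once per output row.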
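FlexRound's title likewise describes its mechanism at a high level: weights are divided element-wise by learnable factors before rounding, so each element can be nudged across a rounding boundary. A minimal sketch under that reading, using a straight-through estimator to keep the divisors trainable; the parameterization and names are assumptions, and the paper's actual formulation and training recipe are not reproduced here.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round to nearest in the forward pass; pass gradients straight
    through in the backward pass so the divisors remain learnable."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad):
        return grad

def flexible_round(w, step, log_s):
    """Rounding preceded by a learnable element-wise division.

    w:     floating-point weight tensor
    step:  shared quantization step size (scalar tensor)
    log_s: learnable tensor with the same shape as w; s = exp(log_s)
           keeps each divisor positive, and log_s = 0 recovers plain
           rounding-to-nearest.
    """
    s = torch.exp(log_s)
    return step * RoundSTE.apply(w / (step * s))
```

In a post-training setting one would presumably optimize `log_s` (and possibly `step`) to minimize reconstruction error of layer outputs on a small calibration set.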