Optimizing Large Language Models through Quantization: A Comparative Analysis of PTQ and QAT Techniques Paper • 2411.06084 • Published Nov 9, 2024
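For context on the contrast in the title: post-training quantization (PTQ) converts a finished model's weights to low precision in one shot, while quantization-aware training (QAT) simulates quantization inside the training graph so the weights adapt to the rounding error. Below is a minimal PyTorch sketch of both, not taken from the paper; per-tensor symmetric int8 and a straight-through estimator are standard illustrative choices.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """PTQ: map already-trained FP weights to int8 with a per-tensor scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

class FakeQuant(torch.autograd.Function):
    """QAT: quantize-dequantize in the forward pass, pass gradients
    through unchanged (straight-through estimator)."""
    @staticmethod
    def forward(ctx, w):
        scale = w.abs().max() / 127.0
        return torch.clamp(torch.round(w / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # STE: identity gradient through the rounding step

w = torch.randn(4, 4, requires_grad=True)
q, s = quantize_int8(w.detach())  # PTQ: one-shot, after training
w_q = FakeQuant.apply(w)          # QAT: lives inside the training graph
w_q.sum().backward()              # gradients still reach the FP weights
```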
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models Paper • 2504.09223 • Published Apr 12, 2025
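The title points at combining a weight decomposition with low-rank adapters inside QAT. As a rough illustration only (the class name, rank, 4-bit range, and the per-row magnitude/direction split are my assumptions, not the paper's recipe), here is a sketch where a frozen base weight gets a trainable low-rank update plus a learned magnitude, and the combined weight is fake-quantized with a straight-through estimator:

```python
import torch
import torch.nn as nn

class LowRankQATLinear(nn.Module):
    """Hypothetical sketch: freeze the pretrained weight, train only a
    low-rank update (A, B) and a per-row magnitude m, and fake-quantize
    the combined weight so training sees the quantization error."""
    def __init__(self, weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_f, in_f = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen base
        self.A = nn.Parameter(torch.zeros(rank, in_f))
        self.B = nn.Parameter(torch.randn(out_f, rank) * 0.01)
        self.m = nn.Parameter(weight.norm(dim=1, keepdim=True))  # magnitude

    def forward(self, x):
        w = self.weight + self.B @ self.A              # low-rank adapted weight
        w = self.m * w / w.norm(dim=1, keepdim=True)   # re-apply learned magnitude
        scale = w.abs().max() / 7.0                    # 4-bit symmetric range
        w_q = torch.clamp(torch.round(w / scale), -7, 7) * scale
        w_q = w + (w_q - w).detach()                   # straight-through estimator
        return x @ w_q.t()

layer = LowRankQATLinear(torch.randn(16, 32), rank=4)
out = layer(torch.randn(2, 32))
out.sum().backward()  # only A, B, and m receive gradients
```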
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published Apr 15, 2025
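The "dynamic-length float" idea rests on BF16 weights being losslessly compressible: the 8 exponent bits carry far less than 8 bits of entropy, so they can be entropy-coded while sign and mantissa stay verbatim. A back-of-the-envelope sketch, assuming a Huffman code over the exponent field (the layout constants are standard BF16; everything else is illustrative, not the paper's GPU decoding kernel):

```python
import heapq
from collections import Counter
import torch

def huffman_lengths(freqs: Counter) -> dict:
    """Return the Huffman code length of each symbol in `freqs`."""
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    uid = len(heap)  # unique tiebreaker so tuples never compare lists
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # symbols in a merged node sit one level deeper
        heapq.heappush(heap, (f1 + f2, uid, s1 + s2))
        uid += 1
    return lengths

# BF16 layout: 1 sign bit | 8 exponent bits | 7 mantissa bits.
w = torch.randn(100_000, dtype=torch.bfloat16)
bits = w.view(torch.uint16).int()
exponents = ((bits >> 7) & 0xFF).tolist()  # the low-entropy field

lengths = huffman_lengths(Counter(exponents))
coded_exp_bits = sum(lengths[e] for e in exponents)
total_bits = coded_exp_bits + 8 * w.numel()  # sign + mantissa kept verbatim
# Ignores the small code-table overhead.
print(f"compressed size: {total_bits / (16 * w.numel()):.1%} of BF16")
```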
Agent models: Internalizing Chain-of-Action Generation into Reasoning models Paper • 2503.06580 • Published Mar 9, 2025