Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis — arXiv:2505.13227, published May 19, 2025
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models — arXiv:2505.10554, published May 15, 2025
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models — arXiv:2309.14717, published Sep 26, 2023
Reward-Guided Speculative Decoding for Efficient LLM Reasoning — arXiv:2501.19324, published Jan 31, 2025
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs — arXiv:2410.04698, published Oct 7, 2024
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search — arXiv:1907.05737, published Jul 12, 2019
Trained Rank Pruning for Efficient Deep Neural Networks — arXiv:1812.02402, published Dec 6, 2018
TRP: Trained Rank Pruning for Efficient Deep Neural Networks — arXiv:2004.14566, published Apr 30, 2020
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models — arXiv:2402.14800, published Feb 22, 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models — arXiv:2405.16057, published May 25, 2024
One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments — arXiv:2405.20202, published May 30, 2024