UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published 26 days ago
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models Paper • 2511.18890 • Published Nov 24
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published about 1 month ago
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published 28 days ago
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning Paper • 2512.02551 • Published 27 days ago
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 28 days ago
LightRAG: Simple and Fast Retrieval-Augmented Generation Paper • 2410.05779 • Published Oct 8, 2024
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26