Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models • Paper • 2506.19697 • Published 6 days ago
ReDit: Reward Dithering for Improved LLM Policy Optimization • Paper • 2506.18631 • Published 7 days ago
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization • Paper • 2506.18880 • Published 6 days ago
RLPR: Extrapolating RLVR to General Domains without Verifiers • Paper • 2506.18254 • Published 7 days ago
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models • Paper • 2506.05176 • Published 25 days ago
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • Paper • 2505.24863 • Published about 1 month ago
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • Paper • 2505.24864 • Published about 1 month ago
AbsenceBench: Language Models Can't Tell What's Missing • Paper • 2506.11440 • Published 17 days ago
VerIF • Collection • RL-trained models and datasets for instruction-following • 7 items • Updated 18 days ago
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following • Paper • 2506.09942 • Published 18 days ago
General-Reasoner • Collection • Advancing LLMs' general reasoning capabilities • 9 items • Updated 5 days ago
Ming-Omni: A Unified Multimodal Model for Perception and Generation • Paper • 2506.09344 • Published 19 days ago
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation • Paper • 2506.09790 • Published 19 days ago
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion • Paper • 2506.08009 • Published 20 days ago
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction • Paper • 2505.23416 • Published May 29