SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 101
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 42
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval Paper • 2412.15443 • Published Dec 19, 2024 • 10
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 27
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published Jan 3 • 19
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published Jan 5 • 11
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6 • 14
Evaluating Sample Utility for Data Selection by Mimicking Model Weights Paper • 2501.06708 • Published Jan 12 • 5
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25 • 56
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following Paper • 2502.14494 • Published Feb 20 • 15
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26 • 22
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26 • 28
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 57
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 29
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity Paper • 2503.01506 • Published Mar 3 • 9
General Reasoning Requires Learning to Reason from the Get-go Paper • 2502.19402 • Published Feb 26 • 5
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition Paper • 2503.00735 • Published Mar 2 • 21
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6 • 21
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles Paper • 2502.18968 • Published Feb 26 • 3
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Paper • 2503.10602 • Published Mar 13 • 4
Temporal Consistency for LLM Reasoning Process Error Identification Paper • 2503.14495 • Published Mar 18 • 9
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees Paper • 2503.08893 • Published Mar 11 • 5
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base Paper • 2503.23361 • Published 24 days ago • 6
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization Paper • 2503.20286 • Published 28 days ago • 4
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published 22 days ago • 40
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published 9 days ago • 32
Efficient Process Reward Model Training via Active Learning Paper • 2504.10559 • Published 9 days ago • 13
AI-University: An LLM-based platform for instructional alignment to scientific classrooms Paper • 2504.08846 • Published 12 days ago • 8
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published 1 day ago • 24