SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information Paper • 2505.13237 • Published 5 days ago • 2
TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering Paper • 2504.20114 • Published 27 days ago • 5
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17 • 39
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published Apr 18 • 17
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models Paper • 2504.15133 • Published Apr 21 • 21
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs Paper • 2504.14655 • Published Apr 20 • 19
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8 • 110
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility Paper • 2504.07086 • Published Apr 9 • 21
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published Apr 9 • 39
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9 • 73
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 85
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs Paper • 2504.08192 • Published Apr 11 • 4
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published Apr 7 • 11
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11 • 55
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published Apr 14 • 40
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 46
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning Paper • 2504.09081 • Published Apr 12 • 17