CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
Efficient Generative Model Training via Embedded Representation Warmup
Paper • 2504.10188 • Published • 12
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 18
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
Paper • 2504.11427 • Published • 19
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 60
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper • 2504.09454 • Published • 12
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper • 2504.08672 • Published • 55
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Paper • 2504.13169 • Published • 39
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
Paper • 2504.12364 • Published • 21
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper • 2504.09643 • Published • 34
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper • 2504.11393 • Published • 18
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper • 2504.09566 • Published • 10
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Paper • 2504.10326 • Published • 26
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
Paper • 2504.12395 • Published • 17
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper • 2504.13055 • Published • 19
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 12
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper • 2504.05303 • Published • 5
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Paper • 2504.06827 • Published
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
Paper • 2504.14899 • Published • 21
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
Paper • 2504.13816 • Published • 17
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Paper • 2504.13203 • Published • 32
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Paper • 2504.13835 • Published • 38
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
Paper • 2504.14655 • Published • 19
OTC: Optimal Tool Calls via Reinforcement Learning
Paper • 2504.14870 • Published • 33
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper • 2504.15257 • Published • 46
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper • 2504.15521 • Published • 64
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper • 2504.16078 • Published • 20
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper • 2504.15681 • Published • 15
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Paper • 2504.16080 • Published • 15
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Paper • 2504.15843 • Published • 18
I-Con: A Unifying Framework for Representation Learning
Paper • 2504.16929 • Published • 30
Tina: Tiny Reasoning Models via LoRA
Paper • 2504.15777 • Published • 55
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Paper • 2504.15279 • Published • 75
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Paper • 2504.17207 • Published • 29
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Paper • 2504.17502 • Published • 56
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Paper • 2504.17789 • Published • 23
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Paper • 2504.17768 • Published • 13
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Paper • 2504.18415 • Published • 44
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
Paper • 2504.15716 • Published • 10
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper • 2504.16427 • Published • 17
RepText: Rendering Visual Text via Replicating
Paper • 2504.19724 • Published • 30
YoChameleon: Personalized Vision and Language Generation
Paper • 2504.20998 • Published • 11
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
Paper • 2504.20734 • Published • 62
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Paper • 2504.21776 • Published • 57
Sadeed: Advancing Arabic Diacritization Through Small Language Model
Paper • 2504.21635 • Published • 59
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Paper • 2504.21233 • Published • 47
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Paper • 2504.18904 • Published • 9
DeepCritic: Deliberate Critique with Large Language Models
Paper • 2505.00662 • Published • 53
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Paper • 2505.00703 • Published • 43
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Paper • 2505.00234 • Published • 26
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Paper • 2504.21855 • Published • 12
Improving Editability in Image Generation with Layer-wise Memory
Paper • 2505.01079 • Published • 28
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Paper • 2504.21117 • Published • 25
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Paper • 2505.02471 • Published • 12
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Paper • 2505.01441 • Published • 38
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Paper • 2505.02625 • Published • 22
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Paper • 2505.01658 • Published • 36
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 176
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Paper • 2505.02922 • Published • 27
Benchmarking LLMs' Swarm intelligence
Paper • 2505.04364 • Published • 19
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Paper • 2505.05467 • Published • 13
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Paper • 2505.00358 • Published • 25
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
Paper • 2505.03570 • Published • 7
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
Paper • 2505.04253 • Published • 12
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Paper • 2505.02847 • Published • 27
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Paper • 2505.03981 • Published • 14
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Paper • 2505.04842 • Published • 12
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper • 2505.04601 • Published • 26
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
Paper • 2504.21650 • Published • 15
PrimeIntellect/INTELLECT-2
33B • Updated • 1.85k • 196
Unified Continuous Generative Models
Paper • 2505.07447 • Published • 44
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Paper • 2505.07916 • Published • 125
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
Paper • 2505.09358 • Published • 25
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Paper • 2505.10557 • Published • 46
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 22
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Paper • 2505.08617 • Published • 42
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 119
Parallel Scaling Law for Language Models
Paper • 2505.10475 • Published • 81
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Paper • 2505.10046 • Published • 9
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
Paper • 2505.10610 • Published • 53
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Paper • 2505.07675 • Published • 19
AdaptThink: Reasoning Models Can Learn When to Think
Paper • 2505.13417 • Published • 79
Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published • 119
Thinkless: LLM Learns When to Think
Paper • 2505.13379 • Published • 50
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
Paper • 2505.13427 • Published • 25
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
Paper • 2505.12504 • Published • 23
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Paper • 2505.11480 • Published • 8
Visual Agentic Reinforcement Fine-Tuning
Paper • 2505.14246 • Published • 31
Paper • 2505.14674 • Published • 35
MMaDA: Multimodal Large Diffusion Language Models
Paper • 2505.15809 • Published • 89
Emerging Properties in Unified Multimodal Pretraining
Paper • 2505.14683 • Published • 130
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Paper • 2505.14357 • Published • 26
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Paper • 2505.13227 • Published • 45
Think Only When You Need with Large Hybrid-Reasoning Models
Paper • 2505.14631 • Published • 19
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 54
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Paper • 2505.15656 • Published • 14
RLVR-World: Training World Models with Reinforcement Learning
Paper • 2505.13934 • Published • 14
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Paper • 2505.14810 • Published • 61
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 21
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
Paper • 2505.16938 • Published • 118
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Paper • 2505.16175 • Published • 40
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
Paper • 2505.17015 • Published • 8
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper • 2505.15966 • Published • 51
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 78
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 59
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 87
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Paper • 2505.17225 • Published • 64
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Paper • 2505.19147 • Published • 145
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Paper • 2505.19914 • Published • 42
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Paper • 2505.20256 • Published • 17
Synthetic Data RL: Task Definition Is All You Need
Paper • 2505.17063 • Published • 10
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Paper • 2505.19640 • Published • 13
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Paper • 2505.19297 • Published • 78
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper • 2505.14146 • Published • 17
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
Paper • 2505.17399 • Published • 14
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Paper • 2505.18943 • Published • 24
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Paper • 2505.20322 • Published • 14
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
Paper • 2505.21473 • Published • 15
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Paper • 2505.21333 • Published • 39
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Paper • 2505.17813 • Published • 56
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Paper • 2505.20355 • Published • 36
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
Paper • 2505.19000 • Published • 42
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 45
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
Paper • 2505.20289 • Published • 10
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Paper • 2505.19897 • Published • 102
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
Paper • 2505.21297 • Published • 29
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper • 2505.23762 • Published • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 23
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Paper • 2505.23747 • Published • 67
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Paper • 2505.23606 • Published • 14
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 42
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 132
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Paper • 2506.01049 • Published • 36
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 14
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Paper • 2506.02096 • Published • 51
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
Paper • 2506.03136 • Published • 23
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Paper • 2506.00123 • Published • 34
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
Paper • 2506.00411 • Published • 30
DINGO: Constrained Inference for Diffusion LLMs
Paper • 2505.23061 • Published • 30
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
Paper • 2506.01413 • Published • 15
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Paper • 2506.02397 • Published • 36
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper • 2505.24086 • Published • 4
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
Paper • 2506.03065 • Published • 27
From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval
Paper • 2505.23059 • Published • 13
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers
Paper • 2505.21541 • Published • 7
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
Paper • 2505.24120 • Published • 49
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Paper • 2505.23656 • Published • 24
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Paper • 2506.04734 • Published • 19
Image Editing As Programs with Diffusion Models
Paper • 2506.04158 • Published • 24
Search Arena: Analyzing Search-Augmented LLMs
Paper • 2506.05334 • Published • 17
Aligning Latent Spaces with Flow Priors
Paper • 2506.05240 • Published • 25
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
Paper • 2506.02454 • Published • 5
FlexPainter: Flexible and Multi-View Consistent Texture Generation
Paper • 2506.02620 • Published • 14
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Paper • 2506.01111 • Published • 29
Audio-Aware Large Language Models as Judges for Speaking Styles
Paper • 2506.05984 • Published • 14
Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data
Paper • 2506.04120 • Published • 7
ConfQA: Answer Only If You Are Confident
Paper • 2506.07309 • Published • 10
Through the Valley: Path to Effective Long CoT Training for Small Language Models
Paper • 2506.07712 • Published • 18
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
Paper • 2506.05573 • Published • 68
Vision Transformers Don't Need Trained Registers
Paper • 2506.08010 • Published • 19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Paper • 2506.07177 • Published • 22
Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor
Paper • 2506.07932 • Published • 12
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Paper • 2506.09790 • Published • 52
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Paper • 2506.09937 • Published • 9
Ming-Omni: A Unified Multimodal Model for Perception and Generation
Paper • 2506.09344 • Published • 26
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper • 2506.09513 • Published • 93
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper • 2506.10974 • Published • 18
Paper • 2506.10910 • Published • 60
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper • 2506.09250 • Published • 27
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Paper • 2506.10540 • Published • 37
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
Paper • 2506.11924 • Published • 32
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression
Paper • 2506.09482 • Published • 46
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Paper • 2506.14245 • Published • 35
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 28
Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 58
Reasoning with Exploration: An Entropy Perspective
Paper • 2506.14758 • Published • 26
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Paper • 2506.14028 • Published • 88
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Paper • 2506.19290 • Published • 48
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Paper • 2506.18945 • Published • 38
Learning to Skip the Middle Layers of Transformers
Paper • 2506.21103 • Published • 10