FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published May 2, 2024 • 29
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23, 2024 • 41
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 54
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29, 2024 • 12
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 64
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 71
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17, 2024 • 15
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26, 2024 • 16
DiffusionPDE: Generative PDE-Solving Under Partial Observation Paper • 2406.17763 • Published Jun 25, 2024 • 25
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data Paper • 2406.18790 • Published Jun 26, 2024 • 35
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps Paper • 2407.07071 • Published Jul 9, 2024 • 12
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published Aug 20, 2024 • 60
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published Aug 28, 2024 • 20
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 48
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 139
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19, 2024 • 23
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24, 2024 • 14
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024 • 57
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 52
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 51
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published Dec 11, 2024 • 39
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published Dec 11, 2024 • 54
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 45
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published Dec 16, 2024 • 37
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published Dec 15, 2024 • 29
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published Dec 19, 2024 • 89
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 58
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models Paper • 2501.00874 • Published Jan 1 • 13
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published Jan 4 • 32
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 99
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7 • 23
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published Jan 8 • 37
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 52
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11 • 11
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16 • 37
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation Paper • 2501.08617 • Published Jan 15 • 10
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 41
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published Jan 15 • 20
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution Paper • 2501.10045 • Published Jan 17 • 9
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published Jan 21 • 46
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 91
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 59
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 38
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published Jan 30 • 20
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 20
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31 • 39
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 152
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning Paper • 2502.04689 • Published Feb 7 • 7
Generating Symbolic World Models via Test-time Scaling of Large Language Models Paper • 2502.04728 • Published Feb 7 • 19
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents Paper • 2502.05957 • Published Feb 9 • 16
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 140
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters! Paper • 2502.07374 • Published Feb 11 • 41
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon Paper • 2502.07445 • Published Feb 11 • 11
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling Paper • 2502.07737 • Published Feb 11 • 9
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging Paper • 2502.05664 • Published Feb 8 • 23
Retrieval-augmented Large Language Models for Financial Time Series Forecasting Paper • 2502.05878 • Published Feb 9 • 41
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training Paper • 2502.06589 • Published Feb 10 • 18
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning Paper • 2502.06060 • Published Feb 9 • 38
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13 • 36
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems Paper • 2502.11098 • Published Feb 16 • 13
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Paper • 2502.12146 • Published Feb 17 • 16
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12 • 36
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published Feb 17 • 8
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published Feb 17 • 19
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading Paper • 2502.11433 • Published Feb 17 • 36
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity Paper • 2502.11901 • Published Feb 17 • 6
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 28
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation Paper • 2502.12638 • Published Feb 18 • 8
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18 • 42
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models Paper • 2502.13533 • Published Feb 19 • 11
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962 • Published Feb 19 • 29
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published Feb 18 • 15
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18 • 29
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published Feb 20 • 12
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published Feb 20 • 7
SurveyX: Academic Survey Automation via Large Language Models Paper • 2502.14776 • Published Feb 20 • 100
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published Feb 24 • 29
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24 • 53
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25 • 26
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23 • 27
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 87
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Paper • 2503.05179 • Published Mar 7 • 46
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published Mar 7 • 37
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7 • 27
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM Paper • 2503.04504 • Published Mar 6 • 3
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models Paper • 2503.05638 • Published Mar 7 • 19
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? Paper • 2503.02199 • Published Mar 4 • 8
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 50
Autoregressive Image Generation with Randomized Parallel Decoding Paper • 2503.10568 • Published Mar 13 • 8
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published Mar 12 • 36
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Paper • 2503.10437 • Published Mar 13 • 32
Learning from Failures in Multi-Attempt Reinforcement Learning Paper • 2503.04808 • Published Mar 4 • 18
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 29
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 66
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 125
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning Paper • 2503.15265 • Published Mar 19 • 47
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20 • 27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 73
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20 • 48
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Paper • 2503.16057 • Published Mar 20 • 14
ELTEX: A Framework for Domain-Driven Synthetic Data Generation Paper • 2503.15055 • Published Mar 19 • 6
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20 • 24
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 54
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published Mar 21 • 36
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24 • 118
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published Mar 26 • 46
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Paper • 2503.19470 • Published Mar 25 • 17
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 62
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Paper • 2503.21749 • Published Mar 27 • 26
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Paper • 2503.22194 • Published Mar 28 • 24
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31 • 53
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31 • 62
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published Mar 31 • 38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 36
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1 • 41
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Paper • 2504.02949 • Published Apr 3 • 20
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published Apr 4 • 16
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Paper • 2504.05594 • Published Apr 8 • 12
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published about 1 month ago • 11
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8 • 13
A Unified Agentic Framework for Evaluating Conditional Image Generation Paper • 2504.07046 • Published about 1 month ago • 30
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance Paper • 2504.06232 • Published Apr 8 • 14
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 83
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 25 days ago • 255
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability Paper • 2504.08003 • Published about 1 month ago • 49
How new data permeates LLM knowledge and how to dilute it Paper • 2504.09522 • Published 26 days ago • 7
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published 28 days ago • 28
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models Paper • 2504.05303 • Published Apr 7 • 5
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance Paper • 2504.08716 • Published 28 days ago • 10
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published 28 days ago • 54
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published 25 days ago • 12
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published 26 days ago • 34
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published 18 days ago • 15
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published 17 days ago • 20
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published 21 days ago • 17
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 17 days ago • 19
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Paper • 2504.16427 • Published 17 days ago • 17
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 9 days ago • 43
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 8 days ago • 39
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published 9 days ago • 21
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published 10 days ago • 24
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Paper • 2505.02625 • Published 4 days ago • 17
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published 10 days ago • 73
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published 3 days ago • 81
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published 8 days ago • 26
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents Paper • 2505.02156 • Published 5 days ago • 17
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published 4 days ago • 60
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published 7 days ago • 30
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey Paper • 2505.03418 • Published 3 days ago • 5
Multi-Agent System for Comprehensive Soccer Understanding Paper • 2505.03735 • Published 3 days ago • 16
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer Paper • 2505.04622 • Published 2 days ago • 18