Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Abstract
Agent Lightning is a flexible RL framework for training the LLMs inside any agent: it decouples agent execution from training and uses a hierarchical RL algorithm to handle complex interaction logic.
We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with the agent or rely on sequence concatenation with masking, Agent Lightning fully decouples agent execution from training, allowing seamless integration with existing agents developed in diverse ways (e.g., with frameworks such as LangChain, OpenAI Agents SDK, and AutoGen, or built from scratch) with almost ZERO code modifications. By formulating agent execution as a Markov decision process, we define a unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module that lets us decompose trajectories generated by ANY agent into training transitions. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. On the system side, we introduce a Training-Agent Disaggregation architecture and bring agent observability frameworks into the agent runtime, providing a standardized interface for agent fine-tuning. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.
Community
Agent Lightning: a flexible and extensible framework that enables seamless optimization of any agent
GitHub Repository
Paper on arXiv
Reddit Discussion (implementation details)
Additional Experiments (not in paper)
Agent Lightning is a framework that fully decouples agents from RL training, enabling flexible and extensible agent learning. This decoupling allows for:
🔌 Plug-and-Play with Diverse Agents
- Supports various agent implementations (e.g., LangChain, OpenAI Agents SDK, AutoGen, CrewAI, etc.), or even WITHOUT an agent framework (plain Python calling the OpenAI API). You name it!
- Almost ZERO code change required on the agent side (see the sketch below)
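To make "almost ZERO code change" concrete, here is a hedged sketch: the agent is plain Python calling an OpenAI-compatible endpoint, and training only needs a thin rollout wrapper around it. The `QAAgent` class, the `training_rollout` method, and the `resources` dictionary are illustrative assumptions, not a verbatim copy of the Agent Lightning API.

```python
from openai import OpenAI

def answer(question: str, base_url: str, model: str) -> str:
    """An existing agent: a single LLM call, written without any framework."""
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# The agent logic above stays untouched; a thin wrapper runs one rollout
# per task and returns a scalar reward for RL. Names here are hypothetical.
class QAAgent:
    def training_rollout(self, task: dict, resources: dict) -> float:
        prediction = answer(
            task["question"],
            base_url=resources["llm_endpoint"],  # endpoint served by the training side
            model=resources["model_name"],
        )
        return float(prediction.strip() == task["answer"])  # exact-match reward
```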
🤖 Multi-Agent Training
- Train multiple agents simultaneously
- Freely select which agents to train
🛠️ Additional Optimizations
- Supports prompt tuning. More algorithms are coming!
🔧 Design for Full Decoupling
To make the framework truly decoupled, we introduce the following key components:
1. Unified Data Interface (Based on Agent MDP)
- A general interface that works for any agent
- Data is organized at the transition level (see the sketch below)
- Credit assignment is done before single-turn model updates
- No accumulation of context across turns → no masking needed
- Highly flexible context (e.g., prompt, instruction, summary)
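As a rough illustration of this transition-level interface, the sketch below stores each LLM call as its own record (state = the prompt actually sent, action = the generated text) and broadcasts an episode-level reward over the trajectory as the simplest possible credit assignment. Field names and the uniform credit rule are assumptions for illustration, not LightningRL's actual schema or algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: str             # full LLM input at this call (prompt, instruction, summary, ...)
    action: str            # text generated by the LLM
    reward: float = 0.0    # filled in by credit assignment before the single-turn update
    info: dict = field(default_factory=dict)  # e.g., agent name, tool results

def assign_credit(transitions: list[Transition], episode_reward: float) -> list[Transition]:
    """Placeholder credit assignment: give every LLM call the episode-level reward."""
    for t in transitions:
        t.reward = episode_reward
    return transitions
```

Because every transition already carries its own context and reward, a standard single-turn RL update can be applied to each (state, action, reward) triple, no matter which framework produced the trajectory.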
2. Training-Agent Disaggregation Architecture
- Implements a server–client architecture
- Uses observability tools like OpenTelemetry during runtime (see the sketch below)
- Enables real-time monitoring and error handling
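As a hedged sketch of the runtime side, the snippet below shows how an agent process could emit prompt/completion spans with the standard OpenTelemetry Python SDK. The span name, attribute keys, and console exporter are placeholder choices; a real setup would export to whatever collector the training side consumes.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Wire up tracing once at agent startup (console exporter used here for illustration).
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("agent-runtime")

def call_llm(prompt: str) -> str:
    # Each LLM call becomes a span carrying the data the trainer needs.
    with tracer.start_as_current_span("llm_call") as span:
        span.set_attribute("llm.prompt", prompt)
        completion = "..."  # the actual model call would happen here
        span.set_attribute("llm.completion", completion)
        return completion
```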
✅ Case Studies
We applied Agent Lightning in the following scenarios, all showing stable reward improvement:
- Text-to-SQL via LangChain
- Retrieval-Augmented Generation via OpenAI Agents SDK
- Math QA with Tool Usage via AutoGen
We hope Agent Lightning can serve as a bridge across domains in the agent training ecosystem.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents (2025)
- L0: Reinforcement Learning to Become General Agents (2025)
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning (2025)
- MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning (2025)
- A Technical Survey of Reinforcement Learning Techniques for Large Language Models (2025)
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents (2025)
- Agentic Reinforced Policy Optimization (2025)