arxiv:2503.23383

ToRL: Scaling Tool-Integrated RL

Published on Mar 30

Abstract

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3% accuracy on AIME 24, surpassing reinforcement learning without tool integration by 14% and the best existing Tool-Integrated Reasoning (TIR) model by 17%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.
