Abstract
How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-efficiency. Notably, Tina demonstrates that substantial reasoning performance can be developed with only minimal resources by applying parameter-efficient updates, via low-rank adaptation (LoRA), during reinforcement learning (RL) to an already tiny 1.5B-parameter base model. This minimalist approach produces models whose reasoning performance is competitive with, and sometimes surpasses, that of SOTA RL reasoning models built on the same base model, at a tiny fraction of the computational post-training cost. In fact, the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24 at only $9 USD of post-training and evaluation cost (an estimated 260x cost reduction). Our work reveals the surprising effectiveness of efficient RL reasoning via LoRA, which we validate across multiple open-source reasoning datasets and various ablation settings, starting from a single fixed set of hyperparameters. Furthermore, we hypothesize that this effectiveness and efficiency stem from LoRA rapidly adapting the model to the structural format of reasoning rewarded by RL while largely preserving the base model's underlying knowledge. In service of accessibility and open research, we fully open-source all code, training logs, and model weights & checkpoints.
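To make the recipe concrete, below is a minimal sketch of LoRA-adapted RL post-training on the 1.5B base model, assuming the Hugging Face `trl` (GRPO-style trainer) and `peft` libraries. The dataset, reward function, and hyperparameters are illustrative placeholders, not Tina's exact configuration.

```python
# Sketch: attach LoRA adapters to DeepSeek-R1-Distill-Qwen-1.5B and update only
# those adapters with GRPO-style RL. Assumes `trl` and `peft`; values are
# illustrative, not the paper's exact setup.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy prompt set; the paper trains on open-source reasoning datasets.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Solve: what is 12 * 13? Show your reasoning, then give the final answer in \\boxed{}.",
        "Solve: how many primes are below 20? Show your reasoning, then give the final answer in \\boxed{}.",
        "Solve: what is 7^2 - 3^2? Show your reasoning, then give the final answer in \\boxed{}.",
        "Solve: what is the sum of 1 through 10? Show your reasoning, then give the final answer in \\boxed{}.",
    ]
})

# Rule-based reward that pays for the structural reasoning format (a final boxed
# answer), which the abstract hypothesizes is what LoRA adapts to quickly.
def format_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

peft_config = LoraConfig(  # low-rank adapters instead of full-parameter updates
    r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM"
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=format_reward,
    args=GRPOConfig(
        output_dir="tina-lora-grpo-sketch",
        per_device_train_batch_size=4,
        num_generations=4,
        max_completion_length=512,
        learning_rate=1e-5,
    ),
    train_dataset=train_dataset,
    peft_config=peft_config,  # only the LoRA parameters receive gradient updates
)
trainer.train()
```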
Community
Want strong language model reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! 🚀 We release the Tina family of models, created by post-training the DeepSeek-R1-Distill-Qwen-1.5B base model using low-rank adaptation (LoRA) during reinforcement learning (RL).
🤩 Our Tina models compete with, and sometimes surpass, SOTA models built on the same base model, with surprisingly high cost efficiency. With minimal post-training compute, the best Tina checkpoint achieves a >20% performance increase over the base model and 43% Pass@1 on AIME24. It costs only $9 to reproduce the best Tina checkpoint, and $526 to reproduce all of our experiments from scratch!
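For reference, Pass@1 here is the usual fraction-of-problems-solved score. The tiny sketch below assumes several samples per problem are averaged (common for AIME-style evaluation); this is an illustrative assumption, not necessarily the exact evaluation protocol.

```python
# Sketch of Pass@1: per-problem accuracy of sampled answers, averaged over problems.
def pass_at_1(samples_per_problem, references):
    # samples_per_problem: list of lists of model answers, one inner list per problem
    # references: list of gold answers, aligned with samples_per_problem
    per_problem = [
        sum(ans == ref for ans in answers) / len(answers)
        for answers, ref in zip(samples_per_problem, references)
    ]
    return sum(per_problem) / len(per_problem)

# Example: 2 problems, 4 samples each -> (3/4 + 2/4) / 2 = 0.625
print(pass_at_1([["42", "42", "41", "42"], ["9", "9", "7", "7"]], ["42", "7"]))
```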
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't (2025)
- MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning (2025)
- NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (2025)
- FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models (2025)
- GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models (2025)
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (2025)
- How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study (2025)