RL is where the real action is right now: it’s the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To help you stay up to date with what’s happening in RL, here are some fresh materials:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/ It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/ Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265 Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, as well as RL+LLMs, RL+inference, and other topics
I’m fine-tuning Qwen 2.5-0.5B to be extremely good at math, using high-quality datasets and some smart training strategies. The logs are looking really promising so far!
Expected release: Tomorrow morning? I’ll post as soon as it’s ready — stay tuned.
If you want faster updates or just wanna chat about it, come join my Discord: https://discord.gg/EXsug2Ux29 (Heads up: we might ask a couple quick questions when you join — just making sure we keep the server safe.)
This project is also helping shape the future of IntellIte. The insights and techniques we’re developing here — better dataset curation, fine-tuning tricks, and evaluation methods — will directly contribute to making IntellIte even sharper, faster, and more reliable, especially for math and reasoning tasks.
Big progress ahead. Can’t wait to share it with you all!