Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL Paper • 2403.03950 • Published Mar 6
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches Paper • 2403.02709 • Published Mar 5