readme?

#1
by protobutter - opened

curious what this model is?

You can find the original model card at https://huggingface.co/eth-nlped/TutorRL-7B, the official GitHub repository at https://github.com/eth-lre/PedagogicalRL, and the paper at https://arxiv.org/abs/2505.15607. In case you wonder what it is, here are some quotes from their model card:

TutorRL-7B is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, trained to act as a math tutor rather than a solver. It is aligned to pedagogical principles using reinforcement learning (GRPO) in a synthetic multi-turn classroom setting, without requiring any human-labeled data.

This model was developed as part of the research project From Problem-Solving to Teaching Problem-Solving, which proposes a scalable, annotation-free approach to training LLMs as educational tutors. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide through Socratic questioning, and withhold final solutions when beneficial for learning.
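
In case GRPO in the quote above is unfamiliar: as far as I understand it, several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. Here is a minimal Python sketch of just that normalization step. It is not the authors' training code; the function name, the `eps` constant, and the reward values are made up for illustration, and I use the population standard deviation where real implementations may differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper for illustration; not taken from the TutorRL code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see note above
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative rewards for four sampled tutor replies to the same student prompt.
rewards = [0.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Replies that scored above the group average get a positive advantage and are reinforced; those below get a negative one.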

Here is a copy of the picture they posted on their GitHub:

diagram.png

So nice to see a model from ETH Zürich. I once was at this exact university long before the current AI boom.

I was close to going to study there an even longer time ago :)
