
Open RL Leaderboard
AI & ML interests
None defined yet.
open-rl-leaderboard's activity

Aurelien-Morgan posted an update 4 days ago

clefourrier posted an update 27 days ago
Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨
If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!
A high-signal eval actually tells you precisely, during training, how well & what your model is learning, allowing you to discard the bad runs/bad samplings/...!
The blog covers in depth prompt choice, metrics, and datasets, across languages/capabilities, and my fave section is "which properties should evals have"👌
(so you know how to select the best evals for your use case)
Blog: HuggingFaceFW/blogpost-fine-tasks

Aurelien-Morgan posted an update about 1 month ago
Hey, I'll be presenting @retrain-pipelines and almighty function-calling at the Hugging Face Paris HQ, you guys.
Monday evening. Lightning-talk style. With AI Tinkerers.
Come hang!
https://paris.aitinkerers.org/p/ai-tinkerers-paris-ai21-labs-takeover-on-may-19th
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller

Aurelien-Morgan posted an update about 2 months ago
The Almighty function-caller
How would you like to build smart GenAI infrastructure?
Give extensive tools memory to your edge agentic system,
and optimize the resources it takes to run a high-performance set of agents?
We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use cases.
Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller

Aurelien-Morgan posted an update about 2 months ago
retrain-pipelines 0.1.2 finally dropped. It comes with a hot Hugging Face Hub integration. Go check it out. We have 2 articles about it coming up; one is already fully written, so be on the lookout! @retrain-pipelines
Also, I'll be volunteering at GOSIM AI Paris 2025. If you're interested in chatting, hmu.

ClementRomac authored 4 papers 2 months ago
Meta Automatic Curriculum Learning
Paper • 2011.08463 • Published
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
Paper • 2410.12481 • Published
MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
Paper • 2502.07709 • Published
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Paper • 2410.19920 • Published

Aurelien-Morgan posted an update 3 months ago

clefourrier posted an update 3 months ago
Gemma3 family is out! I was reading the tech report, and this section was really interesting to me from a methods/scientific-fairness pov.
Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say.)
For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (since some users interact sub-optimally with models).
It also contains a cool section (6) on training-data memorization rate! Important to see whether your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.

clefourrier authored a paper 4 months ago

qgallouedec updated a dataset 6 months ago

clefourrier authored a paper 6 months ago

Aurelien-Morgan posted an update 7 months ago
I just shipped retrain-pipelines 0.1.1 today. The doc is also pimped compared to the previous release. That was clearly not mature then.
I'll have to focus on another project for the next couple of weeks but anyone, feel free to open issues on the GitHub repo and discuss any interest you'd have there, if you will (please?)!
In the meantime, you may enjoy retrying this:
https://huggingface.co/blog/Aurelien-Morgan/stateful-metaflow-on-colab

Aurelien-Morgan posted an update 8 months ago
I just published the first article in a pair. I could make it a longer-tailed series, in case you liked 'em. This one dives into self-hosting Metaflow without needing S3, illustrated with a version tailored for Google Colab.
Find it @ https://huggingface.co/blog/Aurelien-Morgan/stateful-metaflow-on-colab

clefourrier authored 2 papers 12 months ago

ClementRomac authored a paper about 1 year ago

clefourrier posted an update about 1 year ago
In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸
It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.
This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.
openlifescienceai/open_medical_llm_leaderboard
Congrats to @aaditya and @pminervini !
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm