Javad Taghia PRO
telcom
AI & ML interests
text-to-image, image-to-image
Unlearning, training models, model evaluation and safety/alignment benchmarking
PhD, UNSW Sydney (alumni)
Recent Activity
posted an update about 7 hours ago
MAD-GRPO: https://huggingface.co/blog/telcom/mad-grpo
In R1-Zero-like training*, Dr. GRPO addresses GRPO's optimization bias by dropping the std normalization, but that often comes with a hidden side effect: length-weighted updates that can nudge the model toward verbosity.
MAD-GRPO instead uses a robust scale (MAD + epsilon) with per-token normalization, aiming for stability without the verbosity bias.
*https://huggingface.co/papers/2503.20783
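To make the scaling choice concrete, here is a minimal sketch of group-relative advantages under three scale options: std (GRPO-style), no scale (Dr. GRPO-style), and MAD + epsilon. The function name `group_advantages`, the mean centering, and the default `eps` are assumptions made for illustration and are not taken from the MAD-GRPO post; the per-token loss normalization is not shown.

```python
from statistics import mean, median, pstdev


def group_advantages(rewards, scale="mad", eps=1e-6):
    """Advantages for one group of sampled responses to the same prompt.

    Hypothetical helper: centering on the group mean and the exact form of
    the MAD scale are assumptions, not the published MAD-GRPO definition.
    """
    center = mean(rewards)
    centered = [r - center for r in rewards]

    if scale == "none":
        # Dr. GRPO-style: centered rewards only, no scale division.
        return centered
    if scale == "std":
        # GRPO-style: divide by the group standard deviation.
        denom = pstdev(rewards) + eps
    elif scale == "mad":
        # Robust scale: median absolute deviation around the group median.
        med = median(rewards)
        denom = median([abs(r - med) for r in rewards]) + eps
    else:
        raise ValueError(f"unknown scale: {scale!r}")

    return [c / denom for c in centered]


if __name__ == "__main__":
    rewards = [0.0, 1.0, 2.0, 3.0, 10.0]  # one outlier reward
    for option in ("std", "none", "mad"):
        print(option, group_advantages(rewards, scale=option))
```

The epsilon term keeps the denominator nonzero when the MAD is 0, which happens whenever more than half of the rewards in a group are identical.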
commented on a paper about 7 hours ago
Understanding R1-Zero-Like Training: A Critical Perspective