Abstract
GuardReasoner-VL enhances VLM safety through a reasoning-based guard model trained with SFT and online RL, using diverse datasets and a length-aware safety reward.
To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to incentivize the guard model to reason deliberatively before making moderation decisions via online RL. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. Building on this corpus, we cold-start our model's reasoning ability via SFT, then further strengthen its moderation reasoning through online RL. Concretely, to increase the diversity and difficulty of samples, we conduct rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. Besides, we use a dynamic clipping parameter to encourage exploration in early stages and exploitation in later stages. To balance performance and token efficiency, we design a length-aware safety reward that integrates accuracy, format, and token cost. Extensive experiments demonstrate the superiority of our model. Remarkably, it surpasses the runner-up by 19.27% F1 score on average. We release the data, code, and models (3B/7B) of GuardReasoner-VL at https://github.com/yueliu1999/GuardReasoner-VL/
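The abstract mentions two RL components: a dynamic clipping parameter and a length-aware safety reward combining accuracy, format, and token cost. The paper's exact formulas are not given here, so the sketch below is purely illustrative; the function names, the linear annealing schedule, and all weights (`alpha`, `beta`, `gamma`, `max_tokens`) are assumptions, not the authors' actual design.

```python
def length_aware_safety_reward(correct: bool, well_formatted: bool,
                               n_tokens: int, max_tokens: int = 1024,
                               alpha: float = 1.0, beta: float = 0.1,
                               gamma: float = 0.2) -> float:
    """Illustrative length-aware safety reward (weights hypothetical).

    Rewards a correct moderation verdict and a well-formed output,
    and penalizes token cost so that longer reasoning traces only
    pay off when they improve accuracy.
    """
    reward = alpha * (1.0 if correct else 0.0)        # accuracy term
    reward += beta * (1.0 if well_formatted else 0.0)  # format term
    reward -= gamma * min(n_tokens / max_tokens, 1.0)  # token-cost penalty
    return reward


def dynamic_clip_epsilon(step: int, total_steps: int,
                         eps_start: float = 0.3, eps_end: float = 0.2) -> float:
    """Illustrative dynamic clipping schedule (values hypothetical).

    A wider PPO-style clip range early in training permits larger policy
    updates (exploration); a narrower range later favors exploitation.
    Here the range is annealed linearly, one of many possible schedules.
    """
    frac = min(step / total_steps, 1.0)
    return eps_start + (eps_end - eps_start) * frac
```

Under this toy parameterization, a correct, well-formatted, zero-length answer scores 1.1, and spending the full token budget reduces that to 0.9, so accuracy dominates while verbosity is mildly discouraged.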
Community
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Librarian Bot (automated): The following similar papers were recommended by the Semantic Scholar API.
- VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization (2025)
- MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning (2025)
- Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning (2025)
- SaRO: Enhancing LLM Safety through Reasoning-based Alignment (2025)
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning (2025)
- Video-R1: Reinforcing Video Reasoning in MLLMs (2025)
- VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (2025)
Models citing this paper: 4