Hakim
h4c5
AI & ML interests: None yet
Recent Activity
Liked a dataset (about 2 months ago): walledai/AdvBench
Updated a collection (about 2 months ago): moderation-prompts
Liked a dataset (about 2 months ago): HuggingFaceH4/ultrachat_200k
Organizations
Collections (4)
- mmathys/openai-moderation-api-evaluation (dataset)
- Anthropic/hh-rlhf (dataset)
- WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs (paper, arXiv 2406.18495)
- ShieldGemma: Generative AI Content Moderation Based on Gemma (paper, arXiv 2407.21772)
Models (2)
Datasets (0): None public yet