LionGuard 2 🦁 ✌️
LionGuard 2 is a multilingual content moderation classifier tuned for English/Singlish, Chinese, Malay, and Tamil in the Singapore context.
It uses OpenAI’s `text-embedding-3-large` embeddings with a multi-head classifier to return fine-grained scores for the following categories:
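- `binary`
- Hate: `hateful_l1`, `hateful_l2`
- Insults: `insults`
- Sexual: `sexual_l1`, `sexual_l2`
- Physical Violence: `physical_violence`
- Self-Harm: `self_harm_l1`, `self_harm_l2`
- All Other Misconduct: `all_other_misconduct_l1`, `all_other_misconduct_l2`
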
| S/N | Category | Level | Description | Example |
|---|---|---|---|---|
| 1 | Hate | 1 (Discriminatory) | Derogatory or generalized negative statements targeting a protected group. | “All Chinese people are so scheming.” “Malays are always late and lazy.” “Indians are too noisy; I avoid them whenever possible.” |
| | | 2 (Hate Speech) | Explicit calls for harm or violence against a protected group. | “We should kick all Chinese people out of the country.” “Burn down their temple…” “Kill the Indians in Little India; they don’t belong here.” |
| 2 | Insults | no sub-levels | Demeans, humiliates, or mocks without referencing a protected trait. | “You’re so dumb, you probably failed your PSLE.” “Your singing is worse than karaoke at a dodgy KTV.” |
| 3 | Sexual | 1 (Not appropriate for minors) | Mild-to-moderate sexual content. | “Let’s talk about how condoms work.” “We had a one-night stand last weekend.” |
| | | 2 (Not appropriate for all ages) | Explicit or graphic sexual content. | “Always think they very yandao … but actually is just one ** bin.” |
| 4 | Physical Violence | no sub-levels | Glorification of violence or threats of physical harm. | “If I see him at Clarke Quay, I’m going to whack him.” “Someone should plant a bomb at that office…” |
| 5 | Self-Harm | 1 (Ideation) | Expressions of suicidal thoughts or encouragement of self-harm. | “I’m so stressed … I feel like ending it all.” “Failing my poly exams made me want to cut myself.” |
| | | 2 (Action / Suicide) | Descriptions of ongoing or imminent self-harm behavior. | “I’ve locked myself in my room and taken a bunch of pills.” “I’m on the rooftop at Toa Payoh, ready to jump.” |
| 6 | All Other Misconduct | 1 (Generally not socially accepted) | Unethical or immoral behavior not necessarily illegal. | “Let’s spread fake rumours about her …” “How to secretly record someone’s private conversation?” |
| | | 2 (Illegal activities) | Instructions or credible threats of serious harm; facilitation of crimes. | “Anyone know where to buy illegal knives in Geylang?” “Let’s hack that e-commerce site to get credit card details.” |

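
To make the "embedding plus multi-head classifier" idea concrete, here is a toy sketch in NumPy. It is not LionGuard 2's actual architecture or weights: the class `MultiHeadClassifier`, the one-linear-head-per-score layout, the tiny embedding size, and the random parameters are all assumptions made for illustration. Only the score names come from the category list.

```python
import numpy as np

# Score names taken from the category list; treating each one as its own
# independent head is a simplifying assumption, not the published design.
SCORE_NAMES = [
    "binary",
    "hateful_l1", "hateful_l2",
    "insults",
    "sexual_l1", "sexual_l2",
    "physical_violence",
    "self_harm_l1", "self_harm_l2",
    "all_other_misconduct_l1", "all_other_misconduct_l2",
]


def sigmoid(x):
    """Map real-valued logits to scores in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class MultiHeadClassifier:
    """Toy classifier: one linear head per score over a shared embedding."""

    def __init__(self, embedding_dim, score_names, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for trained parameters.
        self.weights = {
            name: rng.normal(scale=0.01, size=embedding_dim)
            for name in score_names
        }
        self.biases = {name: 0.0 for name in score_names}

    def predict(self, embeddings):
        # embeddings has shape (n_texts, embedding_dim); each head returns
        # one score per text.
        return {
            name: sigmoid(embeddings @ self.weights[name] + self.biases[name])
            for name in self.weights
        }


# Two fake "embeddings" of size 8, just to exercise the code path.
toy_model = MultiHeadClassifier(embedding_dim=8, score_names=SCORE_NAMES)
toy_scores = toy_model.predict(np.ones((2, 8)))
```

Every head reads the same embedding, so one embedding call per text is enough to score all categories.
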
```python
import os

import numpy as np
from openai import OpenAI
from transformers import AutoModel

# Load the model directly from the Hugging Face Hub
model = AutoModel.from_pretrained(
    "govtech/lionguard-2",
    trust_remote_code=True,
)

# Get OpenAI embeddings (supply your own OpenAI API key)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.embeddings.create(
    input="Hello, world!",  # replace with your own text
    model="text-embedding-3-large",
    dimensions=1536,  # dimensions of the embedding
)
embeddings = np.array([data.embedding for data in response.data])

# Run LionGuard 2
results = model.predict(embeddings)
```
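
The snippet above does not show what `results` looks like. As a hedged sketch, suppose it can be treated as a mapping from each category name to one score per input text; that structure is an assumption, so inspect the actual return value before relying on it. The helper `flag_categories` and the `0.5` threshold are hypothetical, and the scores below are made up rather than real model output.

```python
def flag_categories(scores, threshold=0.5):
    """Return, for each input text, the category names scoring at or above the threshold."""
    n_texts = len(next(iter(scores.values())))
    return [
        [name for name, values in scores.items() if values[i] >= threshold]
        for i in range(n_texts)
    ]


# Made-up scores for two texts (assumed structure, not real model output).
example_scores = {
    "binary": [0.02, 0.91],
    "insults": [0.01, 0.88],
    "hateful_l1": [0.00, 0.12],
}

flags = flag_categories(example_scores)
print(flags)  # [[], ['binary', 'insults']]
```

A single threshold is the simplest policy; in practice each category may warrant its own cut-off depending on how costly false positives and false negatives are for the application.
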