LionGuard 2

LionGuard 2 is a multilingual content moderation classifier tuned for English/Singlish, Chinese, Malay, and Tamil in the Singapore context.

It passes OpenAI’s text-embedding-3-large embeddings to a multi-head classifier, which returns a separate score for each of the following categories (score keys in parentheses):

Overall safety (binary)
Hate (hateful_l1, hateful_l2)
Insults (insults)
Sexual content (sexual_l1, sexual_l2)
Physical violence (physical_violence)
Self-harm (self_harm_l1, self_harm_l2)
Other misconduct (all_other_misconduct_l1, all_other_misconduct_l2)

Taxonomy

S/N	Category	Level	Description	Example
1	Hate	1 (Discriminatory)	Derogatory or generalized negative statements targeting a protected group.	“All Chinese people are so scheming.” “Malays are always late and lazy.” “Indians are too noisy; I avoid them whenever possible.”
		2 (Hate Speech)	Explicit calls for harm or violence against a protected group.	“We should kick all Chinese people out of the country.” “Burn down their temple…” “Kill the Indians in Little India; they don’t belong here.”
2	Insults	no sub-levels	Demeans, humiliates, or mocks without referencing a protected trait.	“You’re so dumb, you probably failed your PSLE.” “Your singing is worse than karaoke at a dodgy KTV.”
3	Sexual	1 (Not appropriate for minors)	Mild-to-moderate sexual content.	“Let’s talk about how condoms work.” “We had a one-night stand last weekend.”
		2 (Not appropriate for all ages)	Explicit or graphic sexual content.	“Always think they very yandao … but actually is just one ** bin.”
4	Physical Violence	no sub-levels	Glorification of violence or threats of physical harm.	“If I see him at Clarke Quay, I’m going to whack him.” “Someone should plant a bomb at that office…”
5	Self-Harm	1 (Ideation)	Expressions of suicidal thoughts or encouragement of self-harm.	“I’m so stressed … I feel like ending it all.” “Failing my poly exams made me want to cut myself.”
		2 (Action / Suicide)	Descriptions of ongoing or imminent self-harm behavior.	“I’ve locked myself in my room and taken a bunch of pills.” “I’m on the rooftop at Toa Payoh, ready to jump.”
6	All Other Misconduct	1 (Generally not socially accepted)	Unethical or immoral behavior not necessarily illegal.	“Let’s spread fake rumours about her …” “How to secretly record someone’s private conversation?”
		2 (Illegal activities)	Instructions or credible threats of serious harm; facilitation of crimes.	“Anyone know where to buy illegal knives in Geylang?” “Let’s hack that e-commerce site to get credit card details.”
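
The two-level categories above can be collapsed into a single severity level per category. The following is a minimal sketch, not the authors' method: it assumes each score key maps to a probability-like float for one text, and the `threshold` of 0.5 and the helper name `severity_levels` are hypothetical choices for illustration.

```python
from typing import Mapping

# Score keys per category, copied from the category list in this card.
LEVEL_KEYS = {
    "hate": ("hateful_l1", "hateful_l2"),
    "sexual": ("sexual_l1", "sexual_l2"),
    "self_harm": ("self_harm_l1", "self_harm_l2"),
    "all_other_misconduct": ("all_other_misconduct_l1", "all_other_misconduct_l2"),
}
# Categories with no sub-levels have a single score key.
SINGLE_KEYS = {"insults": "insults", "physical_violence": "physical_violence"}


def severity_levels(scores: Mapping[str, float], threshold: float = 0.5) -> dict:
    """Return the highest flagged level per category (0 means not flagged).

    Assumption: `scores` holds one float per score key for a single text,
    and a score at or above `threshold` counts as flagged.
    """
    levels = {}
    for category, (l1_key, l2_key) in LEVEL_KEYS.items():
        if scores[l2_key] >= threshold:
            levels[category] = 2
        elif scores[l1_key] >= threshold:
            levels[category] = 1
        else:
            levels[category] = 0
    for category, key in SINGLE_KEYS.items():
        levels[category] = 1 if scores[key] >= threshold else 0
    return levels


# Hypothetical scores for one text, for illustration only.
example_scores = {
    "hateful_l1": 0.91, "hateful_l2": 0.12,
    "insults": 0.64,
    "sexual_l1": 0.03, "sexual_l2": 0.01,
    "physical_violence": 0.08,
    "self_harm_l1": 0.02, "self_harm_l2": 0.01,
    "all_other_misconduct_l1": 0.77, "all_other_misconduct_l2": 0.58,
}
example_levels = severity_levels(example_scores)
```

A suitable threshold depends on the application; a stricter platform might use a lower value for Level 2 keys than for Level 1 keys.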

Usage

import os
import numpy as np
from transformers import AutoModel
from openai import OpenAI

# Load the classifier from the Hugging Face Hub
# (trust_remote_code=True runs the custom model code shipped in the repo)
model = AutoModel.from_pretrained(
    "govtech/lionguard-2",
    trust_remote_code=True,
)

# Embed the text with OpenAI (requires your own OPENAI_API_KEY)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.embeddings.create(
    input="Hello, world!",  # replace with the text to moderate
    model="text-embedding-3-large",
    dimensions=1536,  # dimensions of the embedding
)
embeddings = np.array([data.embedding for data in response.data])

# Run LionGuard 2
results = model.predict(embeddings)
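
This card does not describe the structure of `results`. The sketch below assumes it is a mapping from score key (for example `binary` or `insults`) to a sequence with one score per input embedding; that shape is an assumption to verify against the actual output, and the helper names `rows_from_results` and `flagged_keys` are hypothetical. Under that assumption, the scores can be regrouped per text and filtered by a threshold:

```python
from typing import Mapping, Sequence


def rows_from_results(results: Mapping[str, Sequence[float]]) -> list:
    """Transpose {key: [score per text]} into [{key: score}, ...], one dict per text.

    Assumption: every key holds the same number of scores, in input order.
    """
    lengths = {len(values) for values in results.values()}
    if len(lengths) > 1:
        raise ValueError("all score sequences must have the same length")
    n_texts = lengths.pop() if lengths else 0
    return [
        {key: float(values[i]) for key, values in results.items()}
        for i in range(n_texts)
    ]


def flagged_keys(row: Mapping[str, float], threshold: float = 0.5) -> list:
    """Return the score keys at or above `threshold`, sorted by name."""
    return sorted(key for key, score in row.items() if score >= threshold)


# Hypothetical stand-in for the output of model.predict on two texts.
fake_results = {
    "binary": [0.05, 0.93],
    "insults": [0.02, 0.81],
    "physical_violence": [0.01, 0.10],
}
rows = rows_from_results(fake_results)
flags_per_text = [flagged_keys(row) for row in rows]
```

To score several texts in one call, the OpenAI embeddings endpoint accepts a list of strings as `input`, which yields one embedding per text in `response.data`.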
