πŸ—£οΈ QuoteCaster: Speaker-Aware Quote Encoder

QuoteCaster is a fine-tuned BERT-based model that encodes dialogue quotes together with their surrounding context to identify or group quotes by speaker, even in stories the model has never seen before.

This encoder powers unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles (and their contexts) to nearby points in embedding space, which makes it well suited to clustering and nearest-neighbor speaker inference.


📦 Model Details

  • Base model: bert-base-uncased
  • Trained with: Triplet Margin Loss
  • Objective: Pull quotes from the same speaker together and push quotes from different speakers apart (see the training sketch below)
  • Input: context [SEP] quote
  • Output: [CLS] embedding as a 768-dimensional vector
  • Model size: ~109M parameters (float32, Safetensors)
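
The training code itself is not included in this card. The following is a minimal sketch of how the triplet objective above can be set up with PyTorch's torch.nn.TripletMarginLoss on [CLS] embeddings of context [SEP] quote inputs; the example triplets, the margin, the learning rate, and the cls_embedding helper name are illustrative assumptions, not the actual fine-tuning script.

import torch
from torch.nn import TripletMarginLoss
from transformers import AutoModel, AutoTokenizer

# Hypothetical fine-tuning sketch: starts from the base model and applies
# a triplet margin loss over [CLS] embeddings (hyperparameters assumed).
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.train()
loss_fn = TripletMarginLoss(margin=1.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def cls_embedding(context, quote):
    # Same "context [SEP] quote" input format described above
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    return model(**inputs).last_hidden_state[:, 0, :]  # (1, 768) [CLS] vector

# One illustrative step on a single (anchor, positive, negative) triplet:
# anchor and positive are quotes by the same speaker, negative by a different one.
anchor   = cls_embedding("Mary turned to him.", "I won't go back there.")
positive = cls_embedding("Mary shook her head.", "You can't make me.")
negative = cls_embedding("John laughed.", "Suit yourself, then.")

loss = loss_fn(anchor, positive, negative)
loss.backward()
optimizer.step()
optimizer.zero_grad()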

📊 Use Cases

QuoteCaster is ideal for:

  • 🧠 Clustering quotes by speaker using KMeans or Agglomerative Clustering
  • πŸ” Zero-shot speaker inference on unseen stories
  • πŸ§ͺ Dialogue structure analysis in novels, scripts, or plays

🚀 Example: Inference with QuoteCaster

import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder and its tokenizer
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()

# Encode a quote with its surrounding context as "context [SEP] quote"
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, 768)
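
Once quotes are embedded, grouping or attributing them is a standard clustering and nearest-neighbor problem. Below is a minimal sketch of the use cases above using scikit-learn's KMeans and cosine similarity; the (context, quote) pairs, the assumed number of speakers, and the printed output are made up for illustration.

import torch
from sklearn.cluster import KMeans

# Made-up (context, quote) pairs for illustration
samples = [
    ("Mary turned to him.", "I won't go back there."),
    ("Mary shook her head.", "You can't make me."),
    ("John laughed.", "Suit yourself, then."),
    ("John shrugged.", "I was only asking."),
]

# Stack the [CLS] embeddings into an (n_quotes, 768) matrix
embeddings = torch.cat([encode_quote(c, q) for c, q in samples]).numpy()

# Cluster quotes by speaker (number of speakers assumed known here)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)
print(labels)  # e.g. [0 0 1 1] if the two speakers separate cleanly

# Nearest-neighbor speaker inference against a few labeled reference quotes
query = encode_quote("She stood up.", "There's nothing left to say.")
sims = torch.nn.functional.cosine_similarity(query, torch.from_numpy(embeddings))
print("closest reference quote:", samples[int(sims.argmax())])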