πŸ—£οΈ QuoteCaster: Speaker-Aware Quote Encoder

QuoteCaster is a fine-tuned BERT-based model that encodes dialogue quotes together with their surrounding context to identify or group quotes by speaker, even in stories the model has never seen before.

This encoder powers unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles (and their contexts) to nearby points in embedding space, which makes it well suited to clustering and nearest-neighbor speaker inference.


📦 Model Details

  • Base model: bert-base-uncased
  • Trained with: Triplet Margin Loss
  • Objective: Pull quotes from the same speaker together and push quotes from different speakers apart (see the training sketch below)
  • Input: context [SEP] quote
  • Output: [CLS] embedding as a 768-dimensional vector
  • Model size: ~109M parameters (float32, Safetensors)
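
The training code itself is not included in this card. The following is a minimal sketch of how the triplet objective above can be set up with PyTorch's torch.nn.TripletMarginLoss on [CLS] embeddings of context [SEP] quote inputs; the example triplets, the margin, the learning rate, and the cls_embedding helper name are illustrative assumptions, not the actual fine-tuning script.

import torch
from torch.nn import TripletMarginLoss
from transformers import AutoModel, AutoTokenizer

# Hypothetical fine-tuning sketch: starts from the base model and applies
# a triplet margin loss over [CLS] embeddings (hyperparameters assumed).
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.train()
loss_fn = TripletMarginLoss(margin=1.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def cls_embedding(context, quote):
    # Same "context [SEP] quote" input format described above
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    return model(**inputs).last_hidden_state[:, 0, :]  # (1, 768) [CLS] vector

# One illustrative step on a single (anchor, positive, negative) triplet:
# anchor and positive are quotes by the same speaker, negative by a different one.
anchor   = cls_embedding("Mary turned to him.", "I won't go back there.")
positive = cls_embedding("Mary shook her head.", "You can't make me.")
negative = cls_embedding("John laughed.", "Suit yourself, then.")

loss = loss_fn(anchor, positive, negative)
loss.backward()
optimizer.step()
optimizer.zero_grad()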

📊 Use Cases

QuoteCaster is ideal for:

  • 🧠 Clustering quotes by speaker using KMeans or Agglomerative Clustering
  • πŸ” Zero-shot speaker inference on unseen stories
  • πŸ§ͺ Dialogue structure analysis in novels, scripts, or plays

🚀 Example: Inference with QuoteCaster

import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder and its tokenizer
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()

# Encode a quote with its surrounding context as "context [SEP] quote"
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, 768)
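
Once quotes are embedded, grouping or attributing them is a standard clustering and nearest-neighbor problem. Below is a minimal sketch of the use cases above using scikit-learn's KMeans and cosine similarity; the (context, quote) pairs, the assumed number of speakers, and the printed output are made up for illustration.

import torch
from sklearn.cluster import KMeans

# Made-up (context, quote) pairs for illustration
samples = [
    ("Mary turned to him.", "I won't go back there."),
    ("Mary shook her head.", "You can't make me."),
    ("John laughed.", "Suit yourself, then."),
    ("John shrugged.", "I was only asking."),
]

# Stack the [CLS] embeddings into an (n_quotes, 768) matrix
embeddings = torch.cat([encode_quote(c, q) for c, q in samples]).numpy()

# Cluster quotes by speaker (number of speakers assumed known here)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)
print(labels)  # e.g. [0 0 1 1] if the two speakers separate cleanly

# Nearest-neighbor speaker inference against a few labeled reference quotes
query = encode_quote("She stood up.", "There's nothing left to say.")
sims = torch.nn.functional.cosine_similarity(query, torch.from_numpy(embeddings))
print("closest reference quote:", samples[int(sims.argmax())])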