QuoteCaster: Speaker-Aware Quote Encoder
QuoteCaster is a fine-tuned BERT-based encoder that embeds dialogue quotes together with their surrounding context so that quotes can be identified or grouped by speaker, even in stories the model has never seen before.
The encoder supports unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles (and their context) to nearby points in embedding space, which makes it well suited to clustering and nearest-neighbor speaker inference.
Model Details
- Base model: `bert-base-uncased`
- Trained with: Triplet Margin Loss
- Objective: pull quotes from the same speaker together, push quotes from different speakers apart
- Input: `context [SEP] quote`
- Output: `[CLS]` embedding as a 768-dimensional vector
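For intuition, the triplet objective above can be sketched with PyTorch's `torch.nn.TripletMarginLoss`. This is only an illustrative sketch of the loss setup, not the released training code; the random tensors stand in for real `[CLS]` embeddings.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the triplet objective (not the released training code).
# anchor/positive: [CLS] embeddings of quotes from the same speaker
# negative:        [CLS] embedding of a quote from a different speaker
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(8, 768, requires_grad=True)    # placeholder batch of quote embeddings
positive = torch.randn(8, 768, requires_grad=True)  # same-speaker quotes
negative = torch.randn(8, 768, requires_grad=True)  # different-speaker quotes

loss = triplet_loss(anchor, positive, negative)
loss.backward()  # gradients pull same-speaker embeddings together, push others apart
```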
Use Case
QuoteCaster is ideal for:
- Clustering quotes by speaker using KMeans or Agglomerative Clustering (see the sketch after this list)
- Zero-shot speaker inference on unseen stories
- Dialogue structure analysis in novels, scripts, or plays
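A minimal clustering sketch, assuming `embeddings` is an `(N, 768)` NumPy array of quote embeddings produced by the encoder (the random array below is only a placeholder; see the `encode_quote` helper in the example section that follows):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for real quote embeddings produced by QuoteCaster
# (see the encode_quote helper in the example below).
embeddings = np.random.randn(20, 768)

# Group quotes into a guessed number of speakers
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
speaker_ids = kmeans.fit_predict(embeddings)
print(speaker_ids)  # one cluster index per quote = putative speaker
```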
Example: Inference with QuoteCaster
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder and its tokenizer
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()

# Encode a quote with its surrounding context
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, 768)
```
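As a follow-up usage sketch for zero-shot speaker inference, building on the `encode_quote` helper above: a new quote can be attributed to the nearest labelled embedding by cosine similarity. The speaker names and quotes below are purely hypothetical.

```python
import torch.nn.functional as F

# Hypothetical reference quotes: one embedding per known speaker
known = {
    "Alice": encode_quote("Alice smiled at the window.", "I suppose we could try."),
    "Bob": encode_quote("Bob slammed the door.", "Not a chance!"),
}

# Attribute an unseen quote to the most similar known speaker
query = encode_quote("Someone whispered in the dark.", "Perhaps we could try after all.")
scores = {name: F.cosine_similarity(query, emb).item() for name, emb in known.items()}
predicted_speaker = max(scores, key=scores.get)
print(predicted_speaker, scores)
```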