# chiliground-base-modernbert-v1
A sentence classification model for extracting relevant spans from documents based on a question.
## Model Details
- Base model: answerdotai/ModernBERT-base
- Hidden dimension: 768
- Number of labels: 2 (relevant / not relevant)
- Best validation F1: 0.7038
- Saved on: 2025-03-29 19:17:14
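The card does not state exactly how the validation F1 was computed. One plausible reading, assumed here, is binary F1 over sentence-level predictions with "relevant" (label 1) as the positive class. The hypothetical `binary_f1` helper below sketches that computation on toy labels, not on real validation data.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed: 1 = relevant) over aligned sentence labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for five sentences (illustrative only).
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 1, 0, 0]
print(round(binary_f1(gold, pred), 4))  # → 0.6667
```

Here two relevant sentences are found, one irrelevant sentence is wrongly kept and one relevant sentence is missed, so precision and recall are both 2/3.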
## Usage
```python
from verbatim_rag.document import Document
from verbatim_rag.extractors import ModelSpanExtractor

# Initialize the extractor with a relevance threshold of 0.5
extractor = ModelSpanExtractor(
    model_path="chiliground-base-modernbert-v1",
    threshold=0.5,
)

# Create documents
documents = [
    Document(
        content="Climate change is a significant issue. Rising sea levels threaten coastal areas.",
        metadata={"source": "example"},
    )
]

# Extract relevant spans for the question
question = "What are the effects of climate change?"
results = extractor.extract_spans(question, documents)

# Print the spans extracted from each document
for doc_content, spans in results.items():
    for span in spans:
        print(f"- {span}")
```
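The `threshold=0.5` argument suggests that a sentence is kept when its predicted probability of being relevant reaches the threshold. The sketch below illustrates that rule under stated assumptions: the two output labels are ordered `[not relevant, relevant]`, and a softmax turns each sentence's two logits into probabilities. `select_sentences` and the logit values are hypothetical and are not part of the `verbatim_rag` API.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sentence's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_sentences(
    sentences: Sequence[str],
    logits: Sequence[Tuple[float, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Keep sentences whose P(relevant) >= threshold.

    Assumes label index 1 means "relevant" (hypothetical ordering).
    """
    kept = []
    for sentence, pair in zip(sentences, logits):
        p_relevant = softmax(pair)[1]
        if p_relevant >= threshold:
            kept.append(sentence)
    return kept


sentences = [
    "Climate change is a significant issue.",
    "Rising sea levels threaten coastal areas.",
]
# Made-up logits for illustration only.
logits = [(0.2, -0.4), (-1.0, 2.0)]
print(select_sentences(sentences, logits, threshold=0.5))
# → ['Rising sea levels threaten coastal areas.']
print(select_sentences(sentences, logits, threshold=0.99))
# → []
```

Under these assumptions, raising the threshold keeps fewer sentences (higher precision, lower recall), and lowering it does the opposite.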
## Training Data
This model was trained on a QA dataset to classify sentences as relevant or not relevant to a given question.
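The card does not describe how sentence labels were derived from the QA dataset. Purely as an illustration, one simple construction turns each QA example into `(question, sentence, label)` pairs, marking a sentence relevant if it contains a gold answer string. The `Example` type, the `make_examples` helper and the containment rule are all hypothetical, and the data reuses the usage snippet rather than the real training set.

```python
from typing import List, NamedTuple, Sequence


class Example(NamedTuple):
    question: str
    sentence: str
    label: int  # 1 = relevant, 0 = not relevant


def make_examples(
    question: str, sentences: Sequence[str], answers: Sequence[str]
) -> List[Example]:
    """Label a sentence relevant if it contains any gold answer string (assumed rule)."""
    return [
        Example(question, s, int(any(a in s for a in answers)))
        for s in sentences
    ]


examples = make_examples(
    "What are the effects of climate change?",
    ["Climate change is a significant issue.", "Rising sea levels threaten coastal areas."],
    ["Rising sea levels"],
)
for ex in examples:
    print(ex.label, ex.sentence)
# → 0 Climate change is a significant issue.
# → 1 Rising sea levels threaten coastal areas.
```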
## Limitations
- The model works at the sentence level and may miss relevant spans that cross sentence boundaries
- Performance depends on the quality and relevance of the training data
- The model is designed for English text only
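To see why a span that crosses a sentence boundary is a problem, the sketch below splits text with a naive regex splitter (an assumption; the splitter the model actually uses is not documented here) and shows that such a span falls into two separately classified units. A sentence-level model can return the two full sentences, but not the exact cross-boundary span. The `split_sentences` and `fragments` helpers are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets


def split_sentences(text: str) -> List[Span]:
    """Naive splitter: a sentence ends at ., ! or ? followed by whitespace or end of text."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S[^.!?]*[.!?]+(?=\s|$)", text)]


def fragments(text: str, span: Span) -> List[str]:
    """Return the piece of `span` that falls inside each sentence it touches."""
    start, end = span
    pieces = []
    for s_start, s_end in split_sentences(text):
        lo, hi = max(start, s_start), min(end, s_end)
        if lo < hi:  # the span overlaps this sentence
            pieces.append(text[lo:hi])
    return pieces


text = "Warming oceans expand. As a result, sea levels rise."
answer = "oceans expand. As a result"
start = text.index(answer)
print(fragments(text, (start, start + len(answer))))
# → ['oceans expand.', 'As a result']
```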