chiliground-base-modernbert-v1

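A sentence-classification model that extracts the spans of a document relevant to a given question. Each sentence is labeled as relevant or not relevant, and the relevant sentences are returned as spans.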

Model Details

  • Base model: answerdotai/ModernBERT-base
  • Hidden dimension: 768
  • Number of labels: 2
  • Best validation F1: 0.7038
  • Saved on: 2025-03-29 19:17:14
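
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that computation for binary sentence labels, assuming the score is measured on the positive ("relevant") class. The function name `binary_f1` and the toy labels are illustrative only; they are not the evaluation code or data behind the 0.7038 figure.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 on the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = relevant sentence, 0 = not relevant); made up for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(binary_f1(y_true, y_pred), 4))  # 0.6667
```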

Usage

# Only the extractor and the document class are needed for this example
from verbatim_rag.extractors import ModelSpanExtractor
from verbatim_rag.document import Document

# Initialize the extractor
extractor = ModelSpanExtractor(
    model_path="chiliground-base-modernbert-v1",
    threshold=0.5
)

# Create documents
documents = [
    Document(
        content="Climate change is a significant issue. Rising sea levels threaten coastal areas.",
        metadata={"source": "example"}
    )
]

# Extract relevant spans
question = "What are the effects of climate change?"
results = extractor.extract_spans(question, documents)

# Print the results
for doc_content, spans in results.items():
    for span in spans:
        print(f"- {span}")
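
The `threshold` argument presumably sets how confident the classifier must be before a sentence is kept. The sketch below illustrates that idea in isolation. `select_sentences` and the scores are hypothetical and not part of `verbatim_rag`, whose internals (including whether the comparison is `>=` or `>`) may differ.

```python
def select_sentences(sentences, scores, threshold=0.5):
    """Keep sentences whose relevance probability is at least `threshold`."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    return [s for s, p in zip(sentences, scores) if p >= threshold]


sentences = [
    "Climate change is a significant issue.",
    "Rising sea levels threaten coastal areas.",
]
# Made-up relevance probabilities, one per sentence (not real model output).
scores = [0.31, 0.87]

print(select_sentences(sentences, scores, threshold=0.5))
```

Under this reading, raising the threshold trades recall for precision: fewer sentences are returned, but each one is more likely to be relevant.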

Training Data

This model was trained on a QA dataset to classify sentences as relevant or not relevant to a given question.

Limitations

  • The model works at the sentence level and may miss relevant spans that cross sentence boundaries
  • Performance depends on the quality and relevance of the training data
  • The model is designed for English text only
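
To make the first limitation concrete, here is a minimal sketch, assuming a naive regex sentence splitter. The splitter actually used by the extractor is not documented here, and `split_sentences` and `covered_by_one_sentence` are hypothetical helpers.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def covered_by_one_sentence(span, sentences):
    """True if the span lies entirely inside a single sentence."""
    return any(span in s for s in sentences)


text = "Sea levels are rising. This threatens coastal areas."
sentences = split_sentences(text)

within = "threatens coastal areas"   # inside the second sentence
crossing = "rising. This threatens"  # straddles the sentence boundary

print(covered_by_one_sentence(within, sentences))    # True
print(covered_by_one_sentence(crossing, sentences))  # False
```

A sentence-level classifier can recover the second span only if it marks both neighbouring sentences as relevant.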