---
{}
---

# chiliground-base-modernbert-v1

A sentence classification model that extracts question-relevant spans from documents by classifying each sentence as relevant or not relevant to the question.

## Model Details

- Base model: answerdotai/ModernBERT-base
- Hidden dimension: 768
- Number of labels: 2
- Best validation F1: 0.7038
- Saved on: 2025-03-29 19:17:14

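The card reports a best validation F1 of 0.7038 but does not state how it was computed. As a hedged reference only, the sketch below computes binary F1 over sentence labels, assuming label `1` means "relevant" and is the positive class (macro- or micro-averaging would give different numbers). The function name `binary_f1` is hypothetical and is not part of `verbatim_rag`.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here: 1 = 'relevant')."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Five sentences: gold vs. predicted relevance labels (made-up values)
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 0, 1, 0]
print(round(binary_f1(gold, pred), 4))  # → 0.6667
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and F1 is their harmonic mean.
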
## Usage

```python
from verbatim_rag.document import Document
from verbatim_rag.extractors import ModelSpanExtractor

# Initialize the extractor with the trained model
extractor = ModelSpanExtractor(
    model_path="chiliground-base-modernbert-v1",
    threshold=0.5,  # relevance threshold for keeping a sentence
)

# Create documents
documents = [
    Document(
        content="Climate change is a significant issue. Rising sea levels threaten coastal areas.",
        metadata={"source": "example"},
    )
]

# Extract relevant spans
question = "What are the effects of climate change?"
results = extractor.extract_spans(question, documents)

# Print the results: `results` maps each document's content to its extracted spans
for doc_content, spans in results.items():
    for span in spans:
        print(f"- {span}")
```

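The `threshold=0.5` argument suggests that a sentence is kept when its relevance score reaches the threshold, but the card does not document the exact rule. The following is a minimal sketch under stated assumptions: the model emits two logits per sentence (matching "Number of labels: 2"), label 1 is "relevant", and the score is the softmax probability of that label compared with `>=`. The helpers `softmax_relevant` and `select_relevant` are hypothetical and do not reproduce the `ModelSpanExtractor` implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax_relevant(logits: Tuple[float, float]) -> float:
    """Probability of label 1 (assumed 'relevant') from two class logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def select_relevant(
    sentences: Sequence[str],
    logits: Sequence[Tuple[float, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Keep sentences whose relevance probability is at least `threshold`."""
    if len(sentences) != len(logits):
        raise ValueError("need one pair of logits per sentence")
    return [s for s, lg in zip(sentences, logits) if softmax_relevant(lg) >= threshold]


sentences = [
    "Climate change is a significant issue.",
    "Rising sea levels threaten coastal areas.",
]
# Made-up logits for illustration only; real values would come from the model.
logits = [(1.2, -0.3), (-0.8, 2.1)]
print(select_relevant(sentences, logits, threshold=0.5))
# → ['Rising sea levels threaten coastal areas.']
```

Under these assumptions, raising the threshold keeps fewer sentences (favoring precision), while lowering it keeps more (favoring recall).
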
## Training Data

This model was trained on a question-answering (QA) dataset to classify each sentence as relevant or not relevant to a given question.

## Limitations

- The model works at the sentence level and may miss relevant spans that cross sentence boundaries.
- Performance depends on the quality and relevance of the training data.
- The model is designed for English text only.

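To illustrate the sentence-level limitation, the sketch below uses a naive regex splitter. The splitter is an assumption for illustration; the card does not say which sentence splitter `verbatim_rag` uses. Because each sentence is scored independently, a span that starts in one sentence and ends in the next is not contained in any single unit, so it can only be returned as two whole sentences (or not at all), never as the exact span.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (illustrative assumption): break after . ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "Climate change is a significant issue. Rising sea levels threaten coastal areas."
span = "issue. Rising sea levels"  # hypothetical span crossing the sentence boundary

sentences = split_sentences(text)
print(sentences)
# → ['Climate change is a significant issue.', 'Rising sea levels threaten coastal areas.']

# The span occurs in the text but inside no single sentence, so a
# sentence classifier cannot return it as one unit.
print(span in text, any(span in s for s in sentences))  # → True False
```
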
|