---
license: mit
datasets:
- rungalileo/ragbench
language:
- en
metrics:
- f1
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# ChiliGround - A verbatim RAG framework

A sentence classification model that extracts the spans of a document that are relevant to a given question.

## Model Details

- Base model: answerdotai/ModernBERT-base
- Hidden dimension: 768
- Number of labels: 2

## Usage

```python
from verbatim_rag.extractors import ModelSpanExtractor
from verbatim_rag.document import Document

# Initialize the extractor
extractor = ModelSpanExtractor(
    model_path="KRLabsOrg/chiliground-base-modernbert-v1",
    threshold=0.5,
)

# Create documents
documents = [
    Document(
        content="""
        Climate change is a significant and lasting change in the statistical distribution of weather patterns.
        Global warming is the observed increase in the average temperature of the Earth's atmosphere and oceans.
        Greenhouse gases include water vapor, carbon dioxide, methane, nitrous oxide, and ozone.
        Human activities since the beginning of the Industrial Revolution have increased greenhouse gas levels.
        """,
        metadata={"source": "example_doc_1", "id": "climate_1"},
    ),
    Document(
        content="""
        Renewable energy comes from sources that are naturally replenished on a human timescale.
        Solar power is the conversion of energy from sunlight into electricity.
        Wind power is the use of wind to provide mechanical power or electricity.
        Hydropower is electricity generated from the energy of falling water.
        """,
        metadata={"source": "example_doc_2", "id": "energy_1"},
    ),
]

# Extract relevant spans
question = "What causes climate change?"
results = extractor.extract_spans(question, documents)

# Print the results
for doc_content, spans in results.items():
    for span in spans:
        print(span)
```
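
The `threshold=0.5` argument suggests that each sentence receives a relevance probability and is kept when that probability clears the cutoff. The sketch below only illustrates that idea and is not the `verbatim_rag` implementation. It assumes two logits per sentence (matching the two labels), with index 1 as the "relevant" class and an inclusive comparison; `softmax`, `select_sentences`, and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_sentences(sentences, logits_per_sentence, threshold=0.5):
    """Keep sentences whose assumed 'relevant' probability reaches the threshold."""
    selected = []
    for sentence, logits in zip(sentences, logits_per_sentence):
        prob_relevant = softmax(logits)[1]  # assumption: index 1 = relevant
        if prob_relevant >= threshold:
            selected.append((sentence, prob_relevant))
    return selected


# Made-up logits for illustration; a real model would produce these.
sentences = [
    "Greenhouse gases include water vapor, carbon dioxide, methane, nitrous oxide, and ozone.",
    "Solar power is the conversion of energy from sunlight into electricity.",
]
toy_logits = [[-1.0, 2.0], [1.5, -0.5]]

for sentence, prob in select_sentences(sentences, toy_logits):
    print(f"{prob:.2f}  {sentence}")
```

Under these assumptions, raising the threshold keeps fewer, higher-confidence sentences, while lowering it favors recall.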

## Training Data

This model was trained on a QA dataset (the model card metadata lists `rungalileo/ragbench`) to classify each sentence as relevant or not relevant to a given question.
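
To make the sentence-level framing concrete, here is a minimal sketch of how one QA example might be turned into (question, sentence, label) pairs. The naive regex splitter, the `build_examples` helper, and the relevance annotation are illustrative assumptions, not the preprocessing actually used for this model.

```python
import re


def split_sentences(text):
    """Naive splitter on ., !, ? followed by whitespace (illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_examples(question, document, relevant_sentences):
    """Return (question, sentence, label) triples; label 1 = relevant, 0 = not."""
    relevant = set(relevant_sentences)
    return [
        (question, sentence, 1 if sentence in relevant else 0)
        for sentence in split_sentences(document)
    ]


document = (
    "Solar power is the conversion of energy from sunlight into electricity. "
    "Wind power is the use of wind to provide mechanical power or electricity."
)
examples = build_examples(
    "What is solar power?",
    document,
    relevant_sentences=[
        "Solar power is the conversion of energy from sunlight into electricity."
    ],
)
for question, sentence, label in examples:
    print(label, sentence)
```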

## Limitations

- The model works at the sentence level and may miss relevant spans that cross sentence boundaries.
- Performance depends on the quality and relevance of the training data.
- The model is designed for English text only.
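
The first limitation can be illustrated with a toy example: when an answer span straddles two sentences, a sentence-level classifier has to select both sentences independently to recover it. The regex-based `sentence_offsets` and `overlapping_sentences` helpers and the toy text below are hypothetical and only serve to show the effect.

```python
import re


def sentence_offsets(text):
    """Return (start, end, sentence) tuples using a naive end-of-sentence regex."""
    return [
        (m.start(), m.end(), m.group())
        for m in re.finditer(r"\S[^.!?]*[.!?]", text)
    ]


def overlapping_sentences(text, span_start, span_end):
    """Return the sentences that overlap the character span [span_start, span_end)."""
    return [
        sentence
        for start, end, sentence in sentence_offsets(text)
        if start < span_end and end > span_start
    ]


text = "The device overheated. This caused the shutdown."
answer = "overheated. This caused the shutdown"
span_start = text.index(answer)
span_end = span_start + len(answer)

# The answer span touches both sentences, so both must be classified as relevant.
hits = overlapping_sentences(text, span_start, span_end)
print(hits)
```

If the classifier scores the second sentence on its own, "This" has no referent, which is one way such a span can be missed.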