---
library_name: transformers
license: apache-2.0
language:
- de
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
---
|
|
|
# C-EBERT
|
|
|
C-EBERT is a multi-task model fine-tuned from EuroBERT to extract causal attributions from German text.
|
|
|
## Model details

- **Model architecture**: EuroBERT-210m + token & relation heads
- **Fine-tuned on**: environmental causal attribution corpus (German)
- **Tasks** (an illustrative head sketch follows this list):
  1. Token classification (BIO tags for INDICATOR / ENTITY)
  2. Relation classification (CAUSE, EFFECT, INTERDEPENDENCY)
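The actual heads are defined in the [causalbert](https://github.com/padjohn/causalbert) repository; the sketch below is only a rough illustration of how a token head and a relation head could sit on top of the EuroBERT encoder. The class name, label counts, and first-token pooling are assumptions, not the real implementation.

```python
import torch.nn as nn
from transformers import AutoModel


class CausalHeadsSketch(nn.Module):
    """Rough sketch: EuroBERT encoder with a token head and a relation head.

    Label counts (5 BIO tags, 3 relation labels) are assumptions derived from
    the tag sets listed above, not the actual causalbert implementation.
    """

    def __init__(self, num_bio_tags: int = 5, num_relation_labels: int = 3):
        super().__init__()
        # EuroBERT ships custom modeling code, hence trust_remote_code=True.
        self.encoder = AutoModel.from_pretrained(
            "EuroBERT/EuroBERT-210m", trust_remote_code=True
        )
        hidden = self.encoder.config.hidden_size
        self.token_head = nn.Linear(hidden, num_bio_tags)            # BIO tags for INDICATOR / ENTITY
        self.relation_head = nn.Linear(hidden, num_relation_labels)  # CAUSE / EFFECT / INTERDEPENDENCY

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        token_logits = self.token_head(hidden_states)                # one BIO prediction per token
        relation_logits = self.relation_head(hidden_states[:, 0])    # first-token pooling (simplification)
        return token_logits, relation_logits
```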
|
|
|
## Usage

Inference requires the custom [causalbert](https://github.com/padjohn/causalbert) library. Once it is installed, run inference like so:
|
```python
from causalbert.infer import load_model, analyze_sentence_with_confidence

# Load the fine-tuned model, tokenizer, and config from the Hugging Face Hub.
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# Tag causal indicators/entities and classify their relation for one sentence.
result = analyze_sentence_with_confidence(
    model, tokenizer, config, "Autoverkehr verursacht Bienensterben.", []
)
```
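To analyze several sentences, the same call can be reused in a loop. The sketch below uses only the functions already imported above; the example sentences are arbitrary, and the structure of each `result` is defined by the causalbert library.

```python
sentences = [
    "Autoverkehr verursacht Bienensterben.",
    "Pestizide schädigen Insektenpopulationen.",
]

# Reuse the already loaded model/tokenizer/config for each sentence.
results = [
    analyze_sentence_with_confidence(model, tokenizer, config, sentence, [])
    for sentence in sentences
]

for sentence, result in zip(sentences, results):
    print(sentence)
    print(result)
```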
|
|
|
## Training

- **Base model**: `EuroBERT/EuroBERT-210m`
- **Epochs**: 3, **Learning rate**: 2e-5, **Batch size**: 8
- See [train.py](https://github.com/padjohn/causalbert/blob/main/causalbert/train.py) for details; an illustrative configuration sketch follows below.
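For reference, the listed hyperparameters map onto a standard Hugging Face `TrainingArguments` object roughly as follows. This is only an illustration with a placeholder output directory; the authoritative setup is the linked `train.py`.

```python
from transformers import TrainingArguments

# Placeholder output_dir; the real training loop lives in causalbert/train.py.
training_args = TrainingArguments(
    output_dir="c-ebert",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
)
```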
|
|
|
## Limitations

- German only.
- Sentence-level; doesn't handle cross-sentence causality.
- Relation classification depends on detected spans, so errors in token tagging propagate.