---
library_name: transformers
license: apache-2.0
language:
- de
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
---

# C-EBERT
A multi-task model to extract **causal attribution** from German texts.

## Model details
- **Model architecture**: [EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) with two custom classification heads (one for token-level span labeling and one for relation classification).
- **Fine-tuned on**: A custom corpus focused on environmental causal attribution in German.

| Task | Output Type | Labels / Classes |
| :--- | :--- | :--- |
| **1. Token Classification** | Sequence Labeling (BIO) | **5 Span Labels** (O, B-INDICATOR, I-INDICATOR, B-ENTITY, I-ENTITY) |
| **2. Relation Classification** | Sentence-Pair Classification | **14 Relation Labels** (e.g., MONO\_POS\_CAUSE, DIST\_NEG\_EFFECT, INTERDEPENDENCY, NO\_RELATION) |
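
To illustrate how a single encoder can serve both tasks, here is a minimal sketch of a two-head setup on top of EuroBERT. This is illustrative only; the actual head implementation lives in the cbert repository, and the class and pooling choice below are assumptions.

```python
# Illustrative sketch only -- the real heads are defined in the cbert library.
import torch.nn as nn
from transformers import AutoModel

SPAN_LABELS = ["O", "B-INDICATOR", "I-INDICATOR", "B-ENTITY", "I-ENTITY"]
NUM_RELATION_LABELS = 14  # e.g. MONO_POS_CAUSE, DIST_NEG_EFFECT, ..., NO_RELATION

class TwoHeadCausalModel(nn.Module):
    def __init__(self, base_model="EuroBERT/EuroBERT-210m"):
        super().__init__()
        # EuroBERT ships custom modeling code, hence trust_remote_code=True.
        self.encoder = AutoModel.from_pretrained(base_model, trust_remote_code=True)
        hidden = self.encoder.config.hidden_size
        self.span_head = nn.Linear(hidden, len(SPAN_LABELS))         # per-token BIO logits
        self.relation_head = nn.Linear(hidden, NUM_RELATION_LABELS)  # per-sentence relation logits

    def forward(self, input_ids, attention_mask=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        span_logits = self.span_head(states)                # (batch, seq_len, 5)
        relation_logits = self.relation_head(states[:, 0])  # first-token pooling -> (batch, 14)
        return span_logits, relation_logits
```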

## Usage
Install the custom [library](https://github.com/padjohn/cbert) from GitHub. Once installed, run inference like so:
```python
from causalbert.infer import load_model, sentence_analysis

# NOTE: The model path accepts either a local directory or a Hugging Face Hub ID.
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# Analyze a batch of sentences
sentences = ["Autoverkehr verursacht Bienensterben.", "Lärm ist der Grund für Stress."]

all_results = sentence_analysis(
    model, 
    tokenizer, 
    config, 
    sentences, 
    batch_size=8
)

# The result is a list of dictionaries containing token_predictions and derived_relations.
print(all_results[0]['derived_relations'])
# Example Output:
# [(['Autoverkehr', 'verursacht'], ['Bienensterben']), {'label': 'MONO_POS_CAUSE', 'confidence': 0.954}]
```
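
Each result dictionary exposes both output fields, so the per-token BIO labels can be inspected alongside the derived relations. A short follow-up on the example above (the exact field layout is defined by the cbert library):

```python
# Inspect both output fields for every analyzed sentence.
for sentence, result in zip(sentences, all_results):
    print(sentence)
    print("  spans:    ", result['token_predictions'])   # per-token BIO labels
    print("  relations:", result['derived_relations'])   # (cause span, effect span) + relation label
```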

## Training
- Base model: EuroBERT/EuroBERT-210m
- Training parameters:
  - Epochs: 10
  - Learning rate: 1e-4
  - Batch size: 32
  - PEFT/LoRA: enabled with r = 16

See [train.py](https://github.com/padjohn/cbert/blob/main/causalbert/train.py) for the full configuration details.
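
For orientation, the listed hyperparameters roughly correspond to a PEFT/LoRA configuration like the sketch below. This is a hedged approximation: `lora_alpha`, `target_modules`, `lora_dropout`, and the output directory are assumptions, not values taken from train.py.

```python
# Hedged sketch of the listed hyperparameters -- lora_alpha, target_modules, and
# lora_dropout below are illustrative assumptions, not taken from train.py.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                  # rank as stated in the model card
    lora_alpha=32,                         # assumption; not stated in the card
    target_modules=["q_proj", "v_proj"],   # assumption; depends on EuroBERT's module names
    lora_dropout=0.05,                     # assumption
)

training_args = TrainingArguments(
    output_dir="c-ebert-finetune",         # hypothetical path
    num_train_epochs=10,
    learning_rate=1e-4,
    per_device_train_batch_size=32,
)

# model = get_peft_model(base_model, lora_config)  # wrap the two-head model before training
```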