pdjohn committed · Commit 4c7688b · verified · 1 Parent(s): 121336b

Update README.md

Files changed (1): README.md (+31 −21)
README.md CHANGED
@@ -9,36 +9,46 @@ pipeline_tag: token-classification
 ---
 
 # C-EBERT
-
-C-EBERT is a multi-task fine-tuned German EuroBERT to extract causal attribution.
 
 ## Model details
-- **Model architecture**: EuroBERT-210m + token & relation heads
-- **Fine-tuned on**: environmental causal attribution corpus (German)
-- **Tasks**:
-  1. Token classification (BIO tags for INDICATOR / ENTITY)
-  2. Relation classification (CAUSE, EFFECT, INTERDEPENDENCY)
 
 ## Usage
 Find the custom [library](https://github.com/padjohn/causalbert). Once installed, run inference like so:
 ```python
-from transformers import AutoTokenizer
-from causalbert.infer import load_model, analyze_sentence_with_confidence
 
 model, tokenizer, config, device = load_model("pdjohn/C-EBERT")
-result = analyze_sentence_with_confidence(
-    model, tokenizer, config, "Autoverkehr verursacht Bienensterben.", []
-)
-```
 
-## Training
-- **Base model**: `EuroBERT/EuroBERT-210m`
-- **Epochs**: 3, **LR**: 2e-5, **Batch size**: 8
-- See [train.py](https://github.com/padjohn/causalbert/blob/main/causalbert/train.py) for details.
 
-## Limitations
-- German only.
-- Sentence-level; doesn’t handle cross-sentence causality.
-- Relation classification depends on detected spans — errors in token tagging propagate.
 ---
 
 # C-EBERT
+A multi-task model to extract **causal attribution** from German texts.
 
 
 ## Model details
+- **Model architecture**: [EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) with two custom classification heads (one for token spans and one for relations).
+- **Fine-tuned on**: a custom corpus focused on environmental causal attribution in German.
+| Task | Output Type | Labels / Classes |
+| :--- | :--- | :--- |
+| **1. Token Classification** | Sequence Labeling (BIO) | **5 Span Labels** (O, B-INDICATOR, I-INDICATOR, B-ENTITY, I-ENTITY) |
+| **2. Relation Classification** | Sentence-Pair Classification | **14 Relation Labels** (e.g., MONO\_POS\_CAUSE, DIST\_NEG\_EFFECT, INTERDEPENDENCY, NO\_RELATION) |
 
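The five BIO span labels determine how predicted tags are grouped into INDICATOR and ENTITY spans. A minimal decoding sketch (a hypothetical helper for illustration, not part of the causalbert API):

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into (label, tokens) spans."""
    spans = []
    current = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = (tag[2:], [token])   # start a new span
            spans.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(token)       # continue the open span
        else:
            current = None                 # "O" or a dangling I- tag closes the span
    return spans

tokens = ["Autoverkehr", "verursacht", "Bienensterben", "."]
tags = ["B-ENTITY", "B-INDICATOR", "B-ENTITY", "O"]
print(bio_to_spans(tokens, tags))
# → [('ENTITY', ['Autoverkehr']), ('INDICATOR', ['verursacht']), ('ENTITY', ['Bienensterben'])]
```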
 ## Usage
 Find the custom [library](https://github.com/padjohn/causalbert). Once installed, run inference like so:
 ```python
+from causalbert.infer import load_model, sentence_analysis
+
+# NOTE: The model path accepts either a local directory or a Hugging Face Hub ID.
 model, tokenizer, config, device = load_model("pdjohn/C-EBERT")
+
+# Analyze a batch of sentences
+sentences = ["Autoverkehr verursacht Bienensterben.", "Lärm ist der Grund für Stress."]
+
+all_results = sentence_analysis(
+    model,
+    tokenizer,
+    config,
+    sentences,
+    batch_size=8
+)
+
+# The result is a list of dictionaries containing token_predictions and derived_relations.
+print(all_results[0]['derived_relations'])
+# Example output:
+# [(['Autoverkehr', 'verursacht'], ['Bienensterben']), {'label': 'MONO_POS_CAUSE', 'confidence': 0.954}]
+```
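Downstream code will typically filter `derived_relations` by confidence. A minimal sketch, assuming each relation pairs its spans with the label/confidence dict shown in the example output above (that pairing shape is an assumption, not a documented guarantee):

```python
def confident_relations(derived_relations, threshold=0.9):
    """Keep only relations whose confidence clears the threshold."""
    kept = []
    for spans, info in derived_relations:
        if info["confidence"] >= threshold:
            kept.append((spans, info["label"]))
    return kept

# Sample data mirroring the example output above (second entry is invented).
sample = [
    ((["Autoverkehr", "verursacht"], ["Bienensterben"]),
     {"label": "MONO_POS_CAUSE", "confidence": 0.954}),
    ((["Lärm"], ["Stress"]),
     {"label": "MONO_POS_CAUSE", "confidence": 0.41}),
]
print(confident_relations(sample))
# → [((['Autoverkehr', 'verursacht'], ['Bienensterben']), 'MONO_POS_CAUSE')]
```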
 
+## Training
+- Base model: EuroBERT/EuroBERT-210m
+- Training parameters (approx.):
+  - Epochs: 8
+  - Learning rate: 1e-4
+  - Batch size: 32
+  - PEFT/LoRA: enabled with r = 16
+See [train.py](https://github.com/padjohn/causalbert/blob/main/causalbert/train.py) for the full configuration details.
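The LoRA line above could be expressed with Hugging Face's PEFT library roughly as follows. This is a sketch: only `r = 16` comes from this README, while `lora_alpha`, `lora_dropout`, and `target_modules` are illustrative assumptions that depend on the base model's actual layer names (see train.py for the real configuration):

```python
from peft import LoraConfig, get_peft_model

# Sketch of a LoRA setup matching the parameters listed above.
lora_config = LoraConfig(
    r=16,                                  # stated in this README
    lora_alpha=32,                         # assumption
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption: attention projections
)
# Wrap the base model so only the low-rank adapter weights are trained:
# model = get_peft_model(base_model, lora_config)
```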