---
library_name: transformers
license: apache-2.0
language:
- de
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
---
# C-EBERT

C-EBERT is a multi-task fine-tuned German EuroBERT model for extracting causal attributions.
## Model details

- Model architecture: EuroBERT-210m + token & relation heads
- Fine-tuned on: environmental causal attribution corpus (German)
- Tasks:
  - Token classification (BIO tags for INDICATOR / ENTITY)
  - Relation classification (CAUSE, EFFECT, INTERDEPENDENCY)
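The token head's BIO output can be illustrated with a small decoding sketch that groups tags into labeled spans, which the relation head would then classify. The tag names follow the list above; the `decode_bio` helper itself is illustrative and not part of the model's API.

```python
# Hypothetical BIO-decoding sketch; tag names follow the model card,
# the function is an illustration, not part of the causalbert library.
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])  # start a new span
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(token)      # continue the open span
        else:  # "O" tag or inconsistent I- tag closes any open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

tokens = ["Autoverkehr", "verursacht", "Bienensterben", "."]
tags = ["B-ENTITY", "B-INDICATOR", "B-ENTITY", "O"]
print(decode_bio(tokens, tags))
# [('ENTITY', 'Autoverkehr'), ('INDICATOR', 'verursacht'), ('ENTITY', 'Bienensterben')]
```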
## Usage

Inference requires the custom `causalbert` library. Once installed, run inference like so:

```python
from causalbert.infer import load_model, analyze_sentence_with_confidence

# load_model returns the fine-tuned model, its tokenizer, config, and device
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")
result = analyze_sentence_with_confidence(
    model, tokenizer, config, "Autoverkehr verursacht Bienensterben.", []
)
```
## Training

- Base model: EuroBERT/EuroBERT-210m
- Epochs: 3, LR: 2e-5, Batch size: 8
- See `train.py` for details.
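The listed hyperparameters map onto a standard `transformers` fine-tuning configuration. A minimal sketch follows; the output directory is an assumption, and the actual `Trainer` wiring (multi-task heads, data collation) lives in `train.py`.

```python
from transformers import TrainingArguments

# Hyperparameters from the model card; output_dir is an assumption.
args = TrainingArguments(
    output_dir="c-ebert-checkpoints",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
)
```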
## Limitations

- German only.
- Sentence-level; does not handle cross-sentence causality.
- Relation classification depends on detected spans, so errors in token tagging propagate.