---
library_name: transformers
license: apache-2.0
language:
- de
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
---
|
|
|
# C-EBERT
|
|
|
C-EBERT is a multi-task model fine-tuned from EuroBERT to extract causal attributions from German text.
|
|
|
## Model details

- **Model architecture**: EuroBERT-210m + token & relation heads
- **Fine-tuned on**: environmental causal attribution corpus (German)
- **Tasks** (an illustrative head sketch follows this list):
  1. Token classification (BIO tags for INDICATOR / ENTITY)
  2. Relation classification (CAUSE, EFFECT, INTERDEPENDENCY)
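The actual heads are defined in the [causalbert](https://github.com/padjohn/causalbert) repository; the sketch below is only a rough illustration of how a token head and a relation head could sit on top of the EuroBERT encoder. The class name, label counts, and first-token pooling are assumptions, not the real implementation.

```python
import torch.nn as nn
from transformers import AutoModel


class CausalHeadsSketch(nn.Module):
    """Rough sketch: EuroBERT encoder with a token head and a relation head.

    Label counts (5 BIO tags, 3 relation labels) are assumptions derived from
    the tag sets listed above, not the actual causalbert implementation.
    """

    def __init__(self, num_bio_tags: int = 5, num_relation_labels: int = 3):
        super().__init__()
        # EuroBERT ships custom modeling code, hence trust_remote_code=True.
        self.encoder = AutoModel.from_pretrained(
            "EuroBERT/EuroBERT-210m", trust_remote_code=True
        )
        hidden = self.encoder.config.hidden_size
        self.token_head = nn.Linear(hidden, num_bio_tags)            # BIO tags for INDICATOR / ENTITY
        self.relation_head = nn.Linear(hidden, num_relation_labels)  # CAUSE / EFFECT / INTERDEPENDENCY

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        token_logits = self.token_head(hidden_states)                # one BIO prediction per token
        relation_logits = self.relation_head(hidden_states[:, 0])    # first-token pooling (simplification)
        return token_logits, relation_logits
```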
|
|
|
## Usage

Inference requires the custom [causalbert](https://github.com/padjohn/causalbert) library. Once it is installed, run inference like so:
|
```python
from causalbert.infer import load_model, analyze_sentence_with_confidence

# Load the fine-tuned model, tokenizer, and config from the Hugging Face Hub.
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# Tag causal indicators/entities and classify their relation for one sentence.
result = analyze_sentence_with_confidence(
    model, tokenizer, config, "Autoverkehr verursacht Bienensterben.", []
)
```
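To analyze several sentences, the same call can be reused in a loop. The sketch below uses only the functions already imported above; the example sentences are arbitrary, and the structure of each `result` is defined by the causalbert library.

```python
sentences = [
    "Autoverkehr verursacht Bienensterben.",
    "Pestizide schädigen Insektenpopulationen.",
]

# Reuse the already loaded model/tokenizer/config for each sentence.
results = [
    analyze_sentence_with_confidence(model, tokenizer, config, sentence, [])
    for sentence in sentences
]

for sentence, result in zip(sentences, results):
    print(sentence)
    print(result)
```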
|
|
|
## Training

- **Base model**: `EuroBERT/EuroBERT-210m`
- **Epochs**: 3, **Learning rate**: 2e-5, **Batch size**: 8
- See [train.py](https://github.com/padjohn/causalbert/blob/main/causalbert/train.py) for details; an illustrative configuration sketch follows below.
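For reference, the listed hyperparameters map onto a standard Hugging Face `TrainingArguments` object roughly as follows. This is only an illustration with a placeholder output directory; the authoritative setup is the linked `train.py`.

```python
from transformers import TrainingArguments

# Placeholder output_dir; the real training loop lives in causalbert/train.py.
training_args = TrainingArguments(
    output_dir="c-ebert",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
)
```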
|
|
|
## Limitations

- German only.
- Sentence-level; doesn't handle cross-sentence causality.
- Relation classification depends on detected spans, so errors in token tagging propagate.