---

# C-EBERT

A multi-task model to extract **causal attribution** from German texts.

## Model details

- **Model architecture**: [EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) with two custom classification heads (one for token spans and one for relations).
- **Fine-tuned on**: a custom corpus focused on environmental causal attribution in German.

| Task | Output Type | Labels / Classes |
| :--- | :--- | :--- |
| **1. Token Classification** | Sequence Labeling (BIO) | **5 Span Labels** (O, B-INDICATOR, I-INDICATOR, B-ENTITY, I-ENTITY) |
| **2. Relation Classification** | Sentence-Pair Classification | **14 Relation Labels** (e.g., MONO_POS_CAUSE, DIST_NEG_EFFECT, INTERDEPENDENCY, NO_RELATION) |
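
Both heads sit on a shared encoder: the token head scores each token against the 5 BIO labels, while the relation head scores the whole sentence against the 14 relation labels. A minimal PyTorch sketch of that layout (illustrative only: the class name, first-token pooling, and head placement are assumptions, not the actual C-EBERT code):

```python
import torch.nn as nn

class DualHeadModel(nn.Module):
    """Illustrative layout only -- not the actual C-EBERT implementation."""

    def __init__(self, encoder, hidden_size, num_span_labels=5, num_relations=14):
        super().__init__()
        self.encoder = encoder                                       # shared EuroBERT encoder
        self.span_head = nn.Linear(hidden_size, num_span_labels)     # per-token BIO logits
        self.relation_head = nn.Linear(hidden_size, num_relations)   # sentence-level logits

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        span_logits = self.span_head(hidden)                 # (batch, seq_len, 5)
        relation_logits = self.relation_head(hidden[:, 0])   # first-token pooling (assumed)
        return span_logits, relation_logits
```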

## Usage

Install the custom [causalbert](https://github.com/padjohn/causalbert) library, then run inference like so:
```python
from causalbert.infer import load_model, sentence_analysis

# NOTE: the model path accepts either a local directory or a Hugging Face Hub ID.
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# Analyze a batch of sentences.
# ("Autoverkehr verursacht Bienensterben." = "Car traffic causes bee die-off.";
#  "Lärm ist der Grund für Stress." = "Noise is the cause of stress.")
sentences = ["Autoverkehr verursacht Bienensterben.", "Lärm ist der Grund für Stress."]

all_results = sentence_analysis(
    model,
    tokenizer,
    config,
    sentences,
    batch_size=8,
)

# The result is a list of dictionaries, one per sentence, each containing
# token_predictions and derived_relations.
print(all_results[0]['derived_relations'])
# Example output:
# [(['Autoverkehr', 'verursacht'], ['Bienensterben']), {'label': 'MONO_POS_CAUSE', 'confidence': 0.954}]
```
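
The relations in `derived_relations` are assembled from the BIO predictions in `token_predictions`. As a rough illustration of how B-/I- tags group tokens into ENTITY and INDICATOR spans (a hypothetical helper, not part of the causalbert API; the tags below are made up):

```python
# Hypothetical helper, not part of causalbert: groups BIO tags into typed spans.
def bio_to_spans(tokens, tags):
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = (tag[2:], [token])    # open a new span of this type
            spans.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)        # extend the open span
        else:
            current = None                  # "O" (or a stray I- tag) closes the span
    return spans

print(bio_to_spans(
    ["Autoverkehr", "verursacht", "Bienensterben", "."],
    ["B-ENTITY", "B-INDICATOR", "B-ENTITY", "O"],  # made-up tags for illustration
))
# [('ENTITY', ['Autoverkehr']), ('INDICATOR', ['verursacht']), ('ENTITY', ['Bienensterben'])]
```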

## Training

- Base model: EuroBERT/EuroBERT-210m
- Approximate training parameters (sketched below):
  - Epochs: 8
  - Learning rate: 1e-4
  - Batch size: 32
  - PEFT/LoRA: enabled with r = 16

See [train.py](https://github.com/padjohn/cbert/blob/main/causalbert/train.py) for the full configuration.
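
As a rough sketch of what those settings look like with the Hugging Face `peft` and `transformers` APIs (illustrative only: the `target_modules` choice, the `trust_remote_code` flag, and the output path are assumptions; the actual configuration is in the linked `train.py`):

```python
# Illustrative only: the card's approximate hyperparameters expressed with the
# Hugging Face peft/transformers APIs. The real setup lives in train.py.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, TrainingArguments

base = AutoModel.from_pretrained("EuroBERT/EuroBERT-210m", trust_remote_code=True)

# target_modules is an assumption; the actual adapter placement is defined in train.py.
lora = LoraConfig(r=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)

args = TrainingArguments(
    output_dir="c-ebert",
    num_train_epochs=8,               # Epochs: 8
    learning_rate=1e-4,               # Learning rate: 1e-4
    per_device_train_batch_size=32,   # Batch size: 32
)
```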
