---
license: mit
datasets:
- nyu-mll/multi_nli
- stanfordnlp/snli
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- mnli
- snli
---
# ModernBERT Cross-Encoder: Natural Language Inference (NLI)

This cross-encoder performs sequence classification over sentence pairs, producing contradiction/neutral/entailment labels.

I trained this model by initializing the ModernBERT-base weights from the brilliant `tasksource/ModernBERT-base-nli`
zero-shot classification model. I then trained it with a batch size of 64 on the `sentence-transformers` AllNLI
dataset.

---

## Features
- **High performance:** Achieves 90.34% accuracy on the MNLI mismatched set and 90.25% on the SNLI test set.
- **Efficient architecture:** Based on the ModernBERT-base design (149M parameters), offering faster inference speeds.
- **Extended context length:** Processes sequences up to 8192 tokens, great for evaluating LLM outputs (see the loading sketch below).

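To take advantage of the long context window on lengthy inputs, you can raise the tokenizer cap at load time via the `sentence-transformers` `max_length` argument. A minimal sketch; the input file and hypothesis text are illustrative only:

```python
from sentence_transformers import CrossEncoder

# Allow inputs up to the full 8192-token window (defaults can be shorter)
model = CrossEncoder("dleemiller/ModernCE-base-nli", max_length=8192)

# e.g., score a long LLM response against a short claim
long_response = open("llm_output.txt").read()  # hypothetical file
scores = model.predict([(long_response, "The response answers the user's question.")])
```
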
---

## Performance

| Model               | MNLI Mismatched | SNLI Test | Context Length |
|---------------------|-----------------|-----------|----------------|
| `ModernCE-base-nli` | 0.9034          | 0.9025    | 8192           |
| `deberta-v3-large`  | 0.9049          | 0.9220    | 512            |
| `deberta-v3-base`   | 0.9004          | 0.9234    | 512            |

---

## Usage

To use ModernCE for NLI tasks, load the model with the Hugging Face `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load ModernCE model
model = CrossEncoder("dleemiller/ModernCE-base-nli")

scores = model.predict([
    ('A man is eating pizza', 'A man eats something'),
    ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')
])

# Convert scores to labels
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
```
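
If you prefer per-class probabilities over raw scores, you can apply a softmax across the three classes. A small sketch continuing from the snippet above, assuming `scores` holds unnormalized logits of shape `(n_pairs, 3)`:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

probs = softmax(np.asarray(scores))  # each row sums to 1 over the three labels
print(probs.round(3))
```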
---

## Training Details

### Pretraining
We initialize the model from the `tasksource/ModernBERT-base-nli` weights.

Details:
- Batch size: 64
- Learning rate: 3e-4
- Attention dropout: 0.1

### Fine-Tuning
Fine-tuning was performed on the SBERT AllNLI.tsv.gz dataset.

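For reference, here is a minimal sketch of what such a run can look like using the classic `sentence-transformers` cross-encoder training loop. This is not the author's exact script: the epoch count and warmup steps are assumptions (only the batch size and learning rate are reported above), and the column layout follows the standard SBERT AllNLI file:

```python
import csv
import gzip

from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Label order must match the model's output heads
label2int = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Read training pairs from the SBERT AllNLI distribution
train_samples = []
with gzip.open("AllNLI.tsv.gz", "rt", encoding="utf8") as f:
    for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["split"] == "train":
            train_samples.append(InputExample(
                texts=[row["sentence1"], row["sentence2"]],
                label=label2int[row["label"]],
            ))

# Start from the tasksource NLI checkpoint, as described above
model = CrossEncoder("tasksource/ModernBERT-base-nli", num_labels=3)
loader = DataLoader(train_samples, shuffle=True, batch_size=64)
model.fit(
    train_dataloader=loader,
    epochs=1,                          # assumed
    warmup_steps=1000,                 # assumed
    optimizer_params={"lr": 3e-4},     # learning rate reported above
)
```
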
### Validation Results
The model achieved the following test set performance after fine-tuning:
- **MNLI Mismatched:** 0.9034
- **SNLI Test:** 0.9025

---

## Model Card

- **Architecture:** ModernBERT-base
- **Fine-Tuning Data:** AllNLI.tsv.gz from `sentence-transformers`

---

## Thank You

Thanks to the AnswerAI team for providing the ModernBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
We also thank the tasksource team for their work on zero-shot encoder models.

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{moderncenli2025,
  author = {Miller, D. Lee},
  title = {ModernCE NLI: An NLI cross-encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/ModernCE-base-nli},
}
```

---

## License

This model is licensed under the [MIT License](LICENSE).