---
license: mit
datasets:
- nyu-mll/multi_nli
- stanfordnlp/snli
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- mnli
- snli
---
# ModernBERT Cross-Encoder: Natural Language Inference (NLI)

This cross-encoder performs sequence classification over sentence pairs, producing contradiction/neutral/entailment labels.

I trained this model by initializing the ModernBERT-base weights from the brilliant `tasksource/ModernBERT-base-nli`
zero-shot classification model. I then trained it with a batch size of 64 on the `sentence-transformers` AllNLI
dataset.

---

## Features
- **High performance:** Achieves 90.34% accuracy on the MNLI mismatched set and 90.25% on the SNLI test set.
- **Efficient architecture:** Based on the ModernBERT-base design (149M parameters), offering faster inference speeds.
- **Extended context length:** Processes sequences up to 8192 tokens, great for evaluating LLM outputs (see the loading sketch below).

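To take advantage of the long context window on lengthy inputs, you can raise the tokenizer cap at load time via the `sentence-transformers` `max_length` argument. A minimal sketch; the input file and hypothesis text are illustrative only:

```python
from sentence_transformers import CrossEncoder

# Allow inputs up to the full 8192-token window (defaults can be shorter)
model = CrossEncoder("dleemiller/ModernCE-base-nli", max_length=8192)

# e.g., score a long LLM response against a short claim
long_response = open("llm_output.txt").read()  # hypothetical file
scores = model.predict([(long_response, "The response answers the user's question.")])
```
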
---

## Performance

| Model               | MNLI Mismatched | SNLI Test | Context Length |
|---------------------|-----------------|-----------|----------------|
| `ModernCE-base-nli` | 0.9034          | 0.9025    | 8192           |
| `deberta-v3-large`  | 0.9049          | 0.9220    | 512            |
| `deberta-v3-base`   | 0.9004          | 0.9234    | 512            |

---

## Usage

To use ModernCE for NLI tasks, load the model with the Hugging Face `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load ModernCE model
model = CrossEncoder("dleemiller/ModernCE-base-nli")

scores = model.predict([
    ('A man is eating pizza', 'A man eats something'),
    ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')
])

# Convert scores to labels
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
```
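
If you prefer per-class probabilities over raw scores, you can apply a softmax across the three classes. A small sketch continuing from the snippet above, assuming `scores` holds unnormalized logits of shape `(n_pairs, 3)`:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

probs = softmax(np.asarray(scores))  # each row sums to 1 over the three labels
print(probs.round(3))
```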
---

## Training Details

### Pretraining
We initialize the model from the `tasksource/ModernBERT-base-nli` weights.

Details:
- Batch size: 64
- Learning rate: 3e-4
- Attention dropout: 0.1

### Fine-Tuning
Fine-tuning was performed on the SBERT AllNLI.tsv.gz dataset.

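For reference, here is a minimal sketch of what such a run can look like using the classic `sentence-transformers` cross-encoder training loop. This is not the author's exact script: the epoch count and warmup steps are assumptions (only the batch size and learning rate are reported above), and the column layout follows the standard SBERT AllNLI file:

```python
import csv
import gzip

from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Label order must match the model's output heads
label2int = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Read training pairs from the SBERT AllNLI distribution
train_samples = []
with gzip.open("AllNLI.tsv.gz", "rt", encoding="utf8") as f:
    for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["split"] == "train":
            train_samples.append(InputExample(
                texts=[row["sentence1"], row["sentence2"]],
                label=label2int[row["label"]],
            ))

# Start from the tasksource NLI checkpoint, as described above
model = CrossEncoder("tasksource/ModernBERT-base-nli", num_labels=3)
loader = DataLoader(train_samples, shuffle=True, batch_size=64)
model.fit(
    train_dataloader=loader,
    epochs=1,                          # assumed
    warmup_steps=1000,                 # assumed
    optimizer_params={"lr": 3e-4},     # learning rate reported above
)
```
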
### Validation Results
The model achieved the following test set performance after fine-tuning:
- **MNLI Mismatched:** 0.9034
- **SNLI Test:** 0.9025

---

## Model Card

- **Architecture:** ModernBERT-base
- **Fine-Tuning Data:** AllNLI.tsv.gz from `sentence-transformers`

---

## Thank You

Thanks to the AnswerAI team for providing the ModernBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
We also thank the tasksource team for their work on zero-shot encoder models.

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{moderncenli2025,
  author = {Miller, D. Lee},
  title = {ModernCE NLI: An NLI cross-encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/ModernCE-base-nli},
}
```

---

## License

This model is licensed under the [MIT License](LICENSE).