Update README.md
README.md CHANGED
@@ -108,6 +108,20 @@ model-index:
This is a [sentence-transformers](https://www.SBERT.net) model trained on the [nli](https://huggingface.co/datasets/sentence-transformers/all-nli), [quora](https://huggingface.co/datasets/sentence-transformers/quora-duplicates), [natural_questions](https://huggingface.co/datasets/sentence-transformers/natural-questions), [stsb](https://huggingface.co/datasets/sentence-transformers/stsb), [sentence_compression](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [simple_wiki](https://huggingface.co/datasets/sentence-transformers/simple-wiki), [altlex](https://huggingface.co/datasets/sentence-transformers/altlex), [coco_captions](https://huggingface.co/datasets/sentence-transformers/coco-captions), [flickr30k_captions](https://huggingface.co/datasets/sentence-transformers/flickr30k-captions), [yahoo_answers](https://huggingface.co/datasets/sentence-transformers/yahoo-answers) and [stack_exchange](https://huggingface.co/datasets/sentence-transformers/stackexchange-duplicates) datasets. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
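A minimal usage sketch (assuming `sentence-transformers` v3 or later is installed; the model ID below is a hypothetical placeholder for this repository's actual path):

```python
from sentence_transformers import SentenceTransformer

# Hypothetical repository ID -- substitute the actual path of this model
model = SentenceTransformer("your-username/modernbert-small-embeddings")

sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384): one 384-dimensional vector per sentence

# Pairwise cosine similarities between the three embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```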
This model is based on the wide architecture of [johnnyboycurtis/ModernBERT-small](https://huggingface.co/johnnyboycurtis/ModernBERT-small), configured as follows:
```python
from transformers import ModernBertConfig, ModernBertModel

small_modernbert_config = ModernBertConfig(
    hidden_size=384,               # a common dimension for small embedding models
    num_hidden_layers=12,          # significantly fewer layers than the base's 22
    num_attention_heads=6,         # must divide hidden_size evenly (384 / 6 = 64)
    intermediate_size=1536,        # 4 * hidden_size -- very wide for this scale
    max_position_embeddings=1024,  # max sequence length; originally 8192
)

model = ModernBertModel(small_modernbert_config)
```
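As a quick sanity check on the resulting size (a sketch; the exact count can vary slightly across `transformers` versions):

```python
# Count the parameters of the freshly initialized model
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```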
## Model Details
### Model Description