akhooli committed on
Commit
068bc33
•
1 Parent(s): 6ee92c9

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +87 -36
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,7 +1,7 @@
1
  ---
2
  base_model: aubmindlab/bert-base-arabertv02
3
  datasets: []
4
- language: [ar]
5
  library_name: sentence-transformers
6
  pipeline_tag: sentence-similarity
7
  tags:
@@ -9,19 +9,47 @@ tags:
9
  - sentence-similarity
10
  - feature-extraction
11
  - generated_from_trainer
12
- - dataset_size:10000
13
  - loss:MatryoshkaLoss
14
  - loss:MultipleNegativesRankingLoss
15
  ---
16
 
17
- # Arabic SBERT
18
 
19
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02).
20
- It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search,
21
- paraphrase mining, text classification, clustering, and more.
22
-
23
- The model is based on a sample from the `akhooli/arabic-triplets-1m-curated-sims-len` dataset. This is an early test version. Do not use it while the model
24
- name contains the word `test`.
25
 
26
  ## Model Details
27
 
@@ -68,9 +96,9 @@ from sentence_transformers import SentenceTransformer
68
  model = SentenceTransformer("sentence_transformers_model_id")
69
  # Run inference
70
  sentences = [
71
- 'ุฃุณุจุงุจ ูƒุซุฑุฉ ุงู„ุชุจูˆู„',
72
- 'ุฃุณุจุงุจ ูƒุซุฑุฉ ุงู„ุชุจูˆู„. ูŠู…ูƒู† ุฃู† ูŠูƒูˆู† ุงู„ุชุจูˆู„ ุงู„ู…ุชูƒุฑุฑ ุฃุญุฏ ุฃุนุฑุงุถ ุงู„ุนุฏูŠุฏ ู…ู† ุงู„ู…ุดุงูƒู„ ุงู„ู…ุฎุชู„ูุฉ ู…ู† ุฃู…ุฑุงุถ ุงู„ูƒู„ู‰ ุฅู„ู‰ ู…ุฌุฑุฏ ุดุฑุจ ุงู„ูƒุซูŠุฑ ู…ู† ุงู„ุณูˆุงุฆู„. ุนู†ุฏู…ุง ูŠูƒูˆู† ุงู„ุชุจูˆู„ ุงู„ู…ุชูƒุฑุฑ ู…ุตุญูˆุจู‹ุง ุจุงู„ุญู…ู‰ ุŒ ูˆุงู„ุญุงุฌุฉ ุงู„ู…ู„ุญุฉ ู„ู„ุชุจูˆู„ ุŒ ูˆุงู„ุฃู„ู… ุฃูˆ ุนุฏู… ุงู„ุฑุงุญุฉ ููŠ ุงู„ุจุทู† ุŒ ูู‚ุฏ ูŠูƒูˆู† ู„ุฏูŠูƒ ุงู„ุชู‡ุงุจ ููŠ ุงู„ู…ุณุงู„ูƒ ุงู„ุจูˆู„ูŠุฉ.',
73
- 'ู…ู† ุงู„ุทุจูŠุนูŠ ุฃู† ูŠุชุจูˆู„ ุงู„ุจุงู„ุบูˆู† ุณุจุน ู…ุฑุงุช ุฎู„ุงู„ ุงู„ูŠูˆู…. ููŠ ุจุนุถ ุงู„ุญูŠูˆุงู†ุงุช ุŒ ุจุงู„ุฅุถุงูุฉ ุฅู„ู‰ ุทุฑุฏ ุงู„ู†ูุงูŠุงุช ุŒ ูŠู…ูƒู† ุฃู† ูŠุคุฏูŠ ุงู„ุชุจูˆู„ ุฅู„ู‰ ุชุญุฏูŠุฏ ุงู„ู…ู†ุทู‚ุฉ ุฃูˆ ุงู„ุชุนุจูŠุฑ ุนู† ุงู„ุฎุถูˆุน. ู…ู† ุงู„ู†ุงุญูŠุฉ ุงู„ูุณูŠูˆู„ูˆุฌูŠุฉ ุŒ ูŠุชุถู…ู† ุงู„ุชุจูˆู„ ุงู„ุชู†ุณูŠู‚ ุจูŠู† ุงู„ุฌู‡ุงุฒ ุงู„ุนุตุจูŠ ุงู„ู…ุฑูƒุฒูŠ ูˆุงู„ุฌู‡ุงุฒ ุงู„๏ฟฝ๏ฟฝุตุจูŠ ุงู„ู„ุงุฅุฑุงุฏูŠ ูˆุงู„ุฌุณุฏูŠ.',
74
  ]
75
  embeddings = model.encode(sentences)
76
  print(embeddings.shape)
@@ -125,19 +153,19 @@ You can finetune this model on your own dataset.
125
  #### Unnamed Dataset
126
 
127
 
128
- * Size: 10,000 training samples
129
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
130
  * Approximate statistics based on the first 1000 samples:
131
- | | anchor | positive | negative |
132
- |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
133
- | type | string | string | string |
134
- | details | <ul><li>min: 4 tokens</li><li>mean: 8.78 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 67.32 tokens</li><li>max: 187 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 67.49 tokens</li><li>max: 220 tokens</li></ul> |
135
  * Samples:
136
- | anchor | positive | negative |
137
- |:----------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
138
- | <code>ุงู„ู†ุธุฑูŠุฉ ุงู„ุฃุณุงุณูŠุฉ ู„ู„ุชุนุฑูŠู ุงู„ุญุณุงุจูŠ</code> | <code>ุงู„ู†ุธุฑูŠุฉ ุงู„ุฃุณุงุณูŠุฉ ููŠ ุงู„ุญุณุงุจ. ู…ู† ูˆูŠูƒูŠุจูŠุฏูŠุงุŒ ุงู„ู…ูˆุณูˆุนุฉ ุงู„ุญุฑุฉ. ุงู„ู†ุธุฑูŠุฉ ุงู„ุฃุณุงุณูŠุฉ ู„ู„ุฃุฎู„ุงู‚ ุงู„ุญุณุงุจูŠุฉ (ูˆุชุณู…ู‰ ุฃูŠุถู‹ุง ู†ุธุฑูŠุฉ ุงู„ุนูˆุงู…ู„ ุงู„ูุฑูŠุฏุฉ) ู‡ูŠ ู†ุธุฑูŠุฉ ู†ุธุฑูŠุฉ ุงู„ุฃุนุฏุงุฏ. ุชู‚ูˆู„ ุงู„ู†ุธุฑูŠุฉ ุฃู† ูƒู„ ุนุฏุฏ ุตุญูŠุญ ู…ูˆุฌุจ ุฃูƒุจุฑ ู…ู† 1 ูŠู…ูƒู† ูƒุชุงุจุชู‡ ูƒู…ู†ุชุฌ ู„ู„ุฃุนุฏุงุฏ ุงู„ุฃูˆู„ูŠุฉ (ุฃูˆ ุฃู† ุงู„ุนุฏุฏ ุงู„ุตุญูŠุญ ู‡ูˆ ู†ูุณู‡ ุนุฏุฏ ุฃูˆู„ูŠ).</code> | <code>ูŠุชู… ุชุนุฑูŠู ุงู„ุฃุณุงุณูŠ ุนู„ู‰ ุฃู†ู‡ ุดูŠุก ุฃุณุงุณูŠ ุฃูˆ ุฃุณุงุณูŠ. ุงู„ุญู‚ูŠู‚ุฉ ุงู„ุฃุณุงุณูŠุฉ ู„ู„ุฏูŠู† ู‡ูŠ ู…ุซุงู„ ู„ุญู‚ูŠู‚ุฉ ุฃุณุงุณูŠุฉ. ุชุนุฑูŠู ุงู„ุฃุณุงุณูŠ ู‡ูˆ ุญู‚ูŠู‚ุฉ ุฃุณุงุณูŠุฉ ุฃูˆ ู‚ุงู†ูˆู†. ุงู„ุญุฑูŠุฉ ู‡ูŠ ู…ุซุงู„ ุฃุณุงุณูŠ ู„ู„ู…ุซู„ ุงู„ุฃุนู„ู‰ ุงู„ุฃู…ุฑูŠูƒูŠ.</code> |
139
- | <code>ูƒูŠู ูŠุชู… ุชุดุฎูŠุต ุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ</code> | <code>ุชุดุฎูŠุต ุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ. ููŠ ุงู„ุญุงู„ุงุช ุงู„ู…ุดุชุจู‡ ููŠู‡ุง ู…ู† ุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ ุŒ ูŠุดุชู…ู„ ุงู„ุชุดุฎูŠุต ุนุงุฏุฉู‹ ุนู„ู‰ ู…ุฑุงุฌุนุฉ ุงู„ุชุงุฑูŠุฎ ุงู„ุทุจูŠ ู„ู„ู…ุฑูŠุถ ุŒ ูˆูุญุตู‹ุง ุจุฏู†ูŠู‹ุง ุŒ ูˆ (ููŠ ุจุนุถ ุงู„ุญุงู„ุงุช) ุงุฎุชุจุงุฑุงุช ู…ุนูŠู†ุฉ. ูƒุฌุฒุก ู…ู† ุชุดุฎูŠุต ุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ (ุงู„ู…ุนุฑูˆู ุฃูŠุถู‹ุง ุจุงุณู… ุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ) ุŒ ุณูŠุณุชุจุนุฏ ุงู„ุทุจูŠุจ ุฃูŠุถู‹ุง ุงู„ุฃู…ุฑุงุถ ุงู„ุฃุฎุฑู‰ ุŒ ู…ุซู„ ู†ุฒู„ุงุช ุงู„ุจุฑุฏ ูˆุงู„ุฅู†ูู„ูˆู†ุฒุง ูˆุงู„ุชู‡ุงุจ ุงู„ุดุนุจ ุงู„ู‡ูˆุงุฆูŠุฉ.</code> | <code>ุจู…ุฌุฑุฏ ุฅุตุงุจุชูƒ ุจุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ ุŒ ูŠุณุชุบุฑู‚ ุธู‡ูˆุฑ ุงู„ุนู„ุงู…ุงุช ูˆุงู„ุฃุนุฑุงุถ ู…ู† ุณุจุนุฉ ุฅู„ู‰ ุนุดุฑุฉ ุฃูŠุงู… ุŒ ุนู„ู‰ ุงู„ุฑุบู… ู…ู† ุฃู†ู‡ุง ู‚ุฏ ุชุณุชุบุฑู‚ ูˆู‚ุชู‹ุง ุฃุทูˆู„ ููŠ ุจุนุถ ุงู„ุฃุญูŠุงู†. ุจุนุฏ ุฃุณุจูˆุน ุฃูˆ ุฃุณุจูˆุนูŠู† ุŒ ุณุงุกุช ุงู„ุนู„ุงู…ุงุช ูˆุงู„ุฃุนุฑุงุถ. ูŠุชุฑุงูƒู… ุงู„ู…ุฎุงุท ุงู„ุณู…ูŠูƒ ุฏุงุฎู„ ุงู„ู…ู…ุฑุงุช ุงู„ู‡ูˆุงุฆูŠุฉ ุŒ ู…ู…ุง ูŠุณุจุจ ุณุนุงู„ู‹ุง ู„ุง ูŠู…ูƒู† ุงู„ุณูŠุทุฑุฉ ุนู„ูŠู‡. ูˆู…ุน ุฐู„ูƒ ุŒ ูุฅู† ุงู„ูƒุซูŠุฑ ู…ู† ุงู„ู†ุงุณ ู„ุง ูŠุทูˆุฑูˆู† ู‡ุฐู‡ ุงู„ุฎุงุตูŠุฉ ุงู„ู…ู…ูŠุฒุฉ. ููŠ ุจุนุถ ุงู„ุฃุญูŠุงู† ุŒ ูŠูƒูˆู† ุงู„ุณุนุงู„ ุงู„ู…ุชู‚ุทุน ู‡ูˆ ุงู„ุนู„ุงู…ุฉ ุงู„ูˆุญูŠุฏุฉ ุนู„ู‰ ุฅุตุงุจุฉ ุงู„ู…ุฑุงู‡ู‚ ุฃูˆ ุงู„ุจุงู„ุบ ุจุงู„ุณุนุงู„ ุงู„ุฏูŠูƒูŠ. ู‚ุฏ ู„ุง ูŠุณุนู„ ุงู„ุฃุทูุงู„ ุนู„ู‰ ุงู„ุฅุทู„ุงู‚. ุจุฏู„ุงู‹ ู…ู† ุฐู„ูƒ ุŒ ู‚ุฏ ูŠูƒุงูุญูˆู† ู…ู† ุฃุฌู„ ุงู„ุชู†ูุณ ุŒ ุฃูˆ ู‚ุฏ ูŠุชูˆู‚ููˆู† ู…ุคู‚ุชู‹ุง ุนู† ุงู„ุชู†ูุณ.</code> |
140
- | <code>ู…ุง ู‡ูˆ ู…ุชูˆุณุท โ€‹โ€‹ุถุบุท ุงู„ู…ุงุก ู„ู„ู…ู†ุฒู„</code> | <code>ุถุบุท ุงู„ู…ุงุก ู‡ูˆ ู…ู‚ุฏุงุฑ ุงู„ู‚ูˆุฉ ู…ู† ุงู„ู…ุงุก ุงู„ุฑุฆูŠุณูŠ ุฅู„ู‰ ู…ู†ุฒู„ูƒ. ูŠู‚ุงุณ ุถุบุท ุงู„ู…ุงุก ุจุงู„ุฌู†ูŠู‡ ู„ูƒู„ ุจูˆุตุฉ ู…ุฑุจุนุฉ (PSI) ุŒ ูˆุถุบุท ุงู„ู…ุงุก ุงู„ุนุงุฏูŠ ุนุงุฏุฉ ู…ุง ุจูŠู† 30 ูˆ 80 ุฑุทู„ ู„ูƒู„ ุจูˆุตุฉ ู…ุฑุจุนุฉ ุŒ ุงู„ุชุฏูู‚ ุงู„ูˆุธูŠููŠ ู‡ูˆ ุญุฌู… ุงู„ู…ูŠุงู‡ ุงู„ู…ุชุฏูู‚ุฉ ุนุจุฑ ุงู„ุฃู†ุงุจูŠุจ ุงู„ุฎุงุตุฉ ุจูƒ ูˆุชุตู„ ุฅู„ู‰ ุงู„ุชุฑูƒูŠุจุงุช ุงู„ูุฑุฏูŠุฉ ุŒ ูˆู‡ูˆ ุฌู‡ุงุฒ ุนู„ู‰ ุดูƒู„ ุฌุฑุณ ูŠู‚ู„ู„ ู…ู† ุถุบุท ุงู„ู…ุงุก. ูŠุฌุจ ุฃู† ูŠูƒูˆู† ุถุบุท ุงู„ู…ุงุก 60-70 ุฑุทู„ ู„ูƒู„ ุจูˆุตุฉ ู…ุฑุจุนุฉ. ุฅุฐุง ูƒุงู† ุถุบุท ุงู„ู…ู†ุฒู„ ู…ู†ุฎูุถู‹ุง ุŒ ูุฃู†ุช ุชุฑูŠุฏ ุฃูˆู„ุงู‹ ุชุญุฏูŠุฏ ู…ุง ุฅุฐุง ูƒุงู† ุงู„ู…ู†ุฒู„ ูŠุนู…ู„ ุจู†ุธุงู… ุฅู…ุฏุงุฏ ุงู„ู…ูŠุงู‡ ุงู„ุนุงู… ุฃูˆ ู†ุธุงู… ุงู„ุขุจุงุฑ ุงู„ุฎุงุต.</code> | <code>ุงู„ุถุบุท ุงู„ู…ุญูŠุท ููŠ ุงู„ู…ุงุก ุฐูŠ ุงู„ุณุทุญ ุงู„ุญุฑ ู‡ูˆ ู…ุฒูŠุฌ ู…ู† ุงู„ุถุบุท ุงู„ู‡ูŠุฏุฑูˆุณุชุงุชูŠูƒูŠ ุงู„ู†ุงุชุฌ ุนู† ูˆุฒู† ุนู…ูˆุฏ ุงู„ู…ุงุก ูˆุงู„ุถุบุท ุงู„ุฌูˆูŠ ุนู„ู‰ ุงู„ุณุทุญ ุงู„ุญุฑ ุŒ ูˆุงู„ุถุบุท ุงู„ู…ุญูŠุท ุนู„ู‰ ุงู„ุฌุณู… ู‡ูˆ ุถุบุท ุงู„ูˆุณุท ุงู„ู…ุญูŠุท ุŒ ู…ุซู„ ุงู„ุบุงุฒ ุฃูˆ ุงู„ุณุงุฆู„ ุงู„ุฐูŠ ูŠู„ุงู…ุณ ุงู„ุฌุณู…. ู…ุญุชูˆูŠุงุช.</code> |
141
  * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
142
  ```json
143
  {
@@ -165,19 +193,19 @@ You can finetune this model on your own dataset.
165
  #### Unnamed Dataset
166
 
167
 
168
- * Size: 3,273 evaluation samples
169
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
170
  * Approximate statistics based on the first 1000 samples:
171
- | | anchor | positive | negative |
172
- |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
173
- | type | string | string | string |
174
- | details | <ul><li>min: 4 tokens</li><li>mean: 8.86 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 66.56 tokens</li><li>max: 191 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 69.1 tokens</li><li>max: 198 tokens</li></ul> |
175
  * Samples:
176
- | anchor | positive | negative |
177
- |:-------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
178
- | <code>ู…ุง ู‡ูŠ ุงู„ููˆุงุฆุฏ ุงู„ุตุญูŠุฉ ู„ู„ุฌูˆุฒ</code> | <code>11 ููˆุงุฆุฏ ู„ุง ุชุตุฏู‚ ู„ู„ุฌูˆุฒ. ุชุดู…ู„ ุงู„ููˆุงุฆุฏ ุงู„ุตุญูŠุฉ ู„ู„ุฌูˆุฒ ุงู„ุญุฏ ู…ู† ุงู„ูƒูˆู„ูŠุณุชุฑูˆู„ ุงู„ุณูŠุฆ ููŠ ุงู„ุฌุณู… ุŒ ูˆุชุญุณูŠู† ุงู„ุชู…ุซูŠู„ ุงู„ุบุฐุงุฆูŠ ุŒ ูˆุงู„ุณูŠุทุฑุฉ ุนู„ู‰ ู…ุฑุถ ุงู„ุณูƒุฑูŠ. ุชู†ุจุน ุงู„ููˆุงุฆุฏ ุงู„ุตุญูŠุฉ ุงู„ู…ู‡ู…ุฉ ุงู„ุฃุฎุฑู‰ ู„ู„ุฌูˆุฒ ู…ู† ุญู‚ูŠู‚ุฉ ุฃู† ู‡ุฐู‡ ุงู„ู…ูƒุณุฑุงุช ุชู…ุชู„ูƒ ุฎุตุงุฆุต ู…ุถุงุฏุฉ ู„ู„ุงู„ุชู‡ุงุจุงุช ุŒ ูˆุชุณุงุนุฏ ููŠ ุฅุฏุงุฑุฉ ุงู„ูˆุฒู† ุŒ ูˆุชุณุงุนุฏ ููŠ ุชู‚ูˆูŠุฉ ุงู„ุญุงู„ุฉ ุงู„ู…ุฒุงุฌูŠุฉ.</code> | <code>ู„ุง ูŠุดุชุฑุท ู‚ุงู†ูˆู† ุฅู„ูŠู†ูˆูŠ ุนู„ู‰ ุฃุตุญุงุจ ุงู„ุนู…ู„ ุชู‚ุฏูŠู… ู…ุฒุงูŠุง ุตุญูŠุฉ ู„ู…ูˆุธููŠู‡ู… ุฃูˆ ุนุงุฆู„ุงุชู‡ู…. ูˆู…ุน ุฐู„ูƒ ุŒ ุฅุฐุง ูƒู†ุช ู…ุดู…ูˆู„ุงู‹ ุจุงู„ู…ุฒุงูŠุง ุงู„ุตุญูŠุฉ ู„ุตุงุญุจ ุงู„ุนู…ู„ ุŒ ูู‚ุฏ ูŠูƒูˆู† ูู‚ุฏุงู† ุงู„ุชุบุทูŠุฉ ู…ุฏู…ุฑู‹ุง.</code> |
179
- | <code>ุฃูุถู„ ุนู†ุงูŠุฉ ุจุงู„ุจุดุฑุฉ ู„ู„ุงุญู…ุฑุงุฑ</code> | <code>ุชู… ุชุตู…ูŠู… ุฎุท ุงู„ุนู†ุงูŠุฉ ุจุงู„ุจุดุฑุฉ ู„ุนู„ุงุฌ ุงู„ุงุญู…ุฑุงุฑ ู…ู† ู…ุฑุงุฏ ู„ุชู„ุจูŠุฉ ุงุญุชูŠุงุฌุงุช ุงู„ุนู†ุงูŠุฉ ุจุงู„ุจุดุฑุฉ ู„ู„ุฃูุฑุงุฏ ุฐูˆูŠ ุงู„ุจุดุฑุฉ ุงู„ุญุณุงุณุฉ ุงู„ู…ุนุฑุถุฉ ู„ู„ุงุญู…ุฑุงุฑ ูˆุงู„ุชู‡ูŠุฌ. ูŠุดุชู…ู„ ุงู„ู†ุธุงู… ุงู„ู…ูƒูˆู† ู…ู† ุซู„ุงุซุฉ ุฃุฌุฒุงุก ุนู„ู‰ ู…ู†ุธู ูˆุฌู„ ู…ุนุงู„ุฌ ูˆู…ุฑุทุจ ู…ุตุญุญ ู…ุน ูˆุงู‚ูŠ ู…ู† ุงู„ุดู…ุณ.</code> | <code>ุงู„ุฃูˆุตุงู. ูŠุณุชุฎุฏู… ู‡ูŠุฏุฑูˆูƒูˆุฑุชูŠุฒูˆู† ูุงู„ูŠุฑุงุช ุงู„ู…ูˆุถุนูŠ ู„ู„ู…ุณุงุนุฏุฉ ููŠ ุชุฎููŠู ุงู„ุงุญู…ุฑุงุฑ ุฃูˆ ุงู„ุญูƒุฉ ุฃูˆ ุงู„ุชูˆุฑู… ุฃูˆ ุบูŠุฑ ุฐู„ูƒ ู…ู† ุงู„ุงู†ุฒุนุงุฌ ุงู„ู†ุงุฌู… ุนู† ุงู„ุฃู…ุฑุงุถ ุงู„ุฌู„ุฏูŠุฉ. ู‡ุฐุง ุงู„ุฏูˆุงุก ุนุจุงุฑุฉ ุนู† ูƒูˆุฑุชูŠูƒูˆุณุชูŠุฑูˆูŠุฏ (ุฏูˆุงุก ุดุจูŠู‡ ุจุงู„ูƒูˆุฑุชูŠุฒูˆู† ุฃูˆ ุงู„ุณุชูŠุฑูˆูŠุฏ) ุŒ ูˆู„ุง ูŠุชูˆูุฑ ู‡ุฐุง ุงู„ุฏูˆุงุก ุฅู„ุง ุจูˆุตูุฉ ุทุจูŠุฉ ุŒ ูˆูŠุณุชุฎุฏู… ู‡ูŠุฏุฑูˆูƒูˆุฑุชูŠุฒูˆู† ูุงู„ูŠุฑุงุช ุงู„ู…ูˆุถุนูŠ ู„ู„ู…ุณุงุนุฏุฉ ููŠ ุชุฎููŠู ุงู„ุงุญู…ุฑุงุฑ ุฃูˆ ุงู„ุญูƒุฉ ุฃูˆ ุงู„ุชูˆุฑู… ุฃูˆ ุบูŠุฑ ุฐู„ูƒ ู…ู† ุงู„ุงู†ุฒุนุงุฌ ุงู„ู†ุงุฌู… ุนู† ุงู„ุฃู…ุฑุงุถ ุงู„ุฌู„ุฏูŠุฉ. ู‡ุฐุง ุงู„ุฏูˆุงุก ุนุจุงุฑุฉ ุนู† ูƒูˆุฑุชูŠูƒูˆุณุชูŠุฑูˆูŠุฏ (ุฏูˆุงุก ูŠุดุจู‡ ุงู„ูƒูˆุฑุชูŠุฒูˆู† ุฃูˆ ุงู„ุณุชูŠุฑูˆูŠุฏ).</code> |
180
- | <code>ู…ุชูˆุณุท โ€‹โ€‹ุงู„ุทู‚ุณ ููŠ ู…ูŠู†ูŠุงุจูˆู„ูŠุณ ููŠ ู…ุงูŠูˆ</code> | <code>ู…ุชูˆุณุท โ€‹โ€‹ุญุงู„ุฉ ุงู„ุทู‚ุณ ููŠ ู…ุงูŠูˆ ููŠ ู…ูŠู†ูŠุงุจูˆู„ูŠุณ ู…ูŠู†ูŠุณูˆุชุงุŒ ุงู„ูˆู„ุงูŠุงุช ุงู„ู…ุชุญุฏุฉ. ููŠ ู…ูŠู†ูŠุงุจูˆู„ูŠุณ ุŒ ูŠุชู…ูŠุฒ ุดู‡ุฑ ู…ุงูŠูˆ ุจุงู„ุงุฑุชูุงุน ุงู„ุณุฑูŠุน ู„ุฏุฑุฌุงุช ุงู„ุญุฑุงุฑุฉ ุงู„ูŠูˆู…ูŠุฉ ุงู„ู…ุฑุชูุนุฉ ุŒ ู…ุน ุงุฑุชูุงุน ุฏุฑุฌุงุช ุงู„ุญุฑุงุฑุฉ ุงู„ูŠูˆู…ูŠุฉ ุจู…ู‚ุฏุงุฑ 10 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ุŒ ู…ู† 64 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ุฅู„ู‰ 74 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ุนู„ู‰ ู…ุฏุงุฑ ุดู‡ุฑู‹ุง ุŒ ูˆู†ุงุฏุฑู‹ุง ู…ุง ุชุชุฌุงูˆุฒ 85 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ุฃูˆ ุชู†ุฎูุถ ุฅู„ู‰ ุฃู‚ู„ ู…ู† 51 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช.</code> | <code>ุจูˆู„ุฏู† ุŒ ุฃุฑูŠุฒูˆู†ุง ุงู„ุทู‚ุณ. ูŠุจู„ุบ ู…ุชูˆุณุท โ€‹โ€‹ุฏุฑุฌุฉ ุญุฑุงุฑุฉ ุจูˆู„ุฏู† 55.67 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ุŒ ูˆู‡ูˆ ุฃู‚ู„ ุจูƒุซูŠุฑ ู…ู† ู…ุชูˆุณุท โ€‹โ€‹ุฏุฑุฌุฉ ุงู„ุญุฑุงุฑุฉ ููŠ ุฃุฑูŠุฒูˆู†ุง ุงู„ุจุงู„ุบ 65.97 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช ูˆุฃุนู„ู‰ ู…ู† ู…ุชูˆุณุท โ€‹โ€‹ุฏุฑุฌุฉ ุงู„ุญุฑุงุฑุฉ ุงู„ูˆุทู†ูŠุฉ ุงู„ุจุงู„ุบ 54.45 ุฏุฑุฌุฉ ูู‡ุฑู†ู‡ุงูŠุช . ุงู„ุทู‚ุณ ุงู„ุชุงุฑูŠุฎูŠ.</code> |
181
  * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
182
  ```json
183
  {
@@ -207,6 +235,7 @@ You can finetune this model on your own dataset.
207
  - `per_device_train_batch_size`: 16
208
  - `per_device_eval_batch_size`: 16
209
  - `learning_rate`: 2e-05
 
210
  - `warmup_ratio`: 0.1
211
  - `fp16`: True
212
  - `batch_sampler`: no_duplicates
@@ -230,7 +259,7 @@ You can finetune this model on your own dataset.
230
  - `adam_beta2`: 0.999
231
  - `adam_epsilon`: 1e-08
232
  - `max_grad_norm`: 1.0
233
- - `num_train_epochs`: 3
234
  - `max_steps`: -1
235
  - `lr_scheduler_type`: linear
236
  - `lr_scheduler_kwargs`: {}
@@ -327,9 +356,31 @@ You can finetune this model on your own dataset.
327
  </details>
328
 
329
  ### Training Logs
330
- | Epoch | Step | Training Loss | loss |
331
- |:------:|:----:|:-------------:|:------:|
332
- | 1.5974 | 500 | 0.7182 | 0.2672 |
333
 
334
 
335
  ### Framework Versions
 
1
  ---
2
  base_model: aubmindlab/bert-base-arabertv02
3
  datasets: []
4
+ language: []
5
  library_name: sentence-transformers
6
  pipeline_tag: sentence-similarity
7
  tags:
 
9
  - sentence-similarity
10
  - feature-extraction
11
  - generated_from_trainer
12
+ - dataset_size:75000
13
  - loss:MatryoshkaLoss
14
  - loss:MultipleNegativesRankingLoss
15
+ widget:
16
+ - source_sentence: ุฑุฌู„ ูŠู†ุธุฑ ุฅู„ู‰ ู…ุง ูŠุจุฏูˆ ุฃู†ู‡ ู‚ุทุน ู…ู† ุงู„ูˆุฑู‚ ุงู„ู…ู‚ูˆู‰ ู„ุงู…ุฑุฃุฉ ููŠ ุงู„ู…ุทุจุฎ.
17
+ sentences:
18
+ - ุฒูˆุฌ ูˆุฒูˆุฌุชู‡ ูŠุชุฒู„ุฌุงู† ุนู„ู‰ ุงู„ุฌุจุงู„ ุงู„ุณูˆูŠุณุฑูŠุฉ
19
+ - ู…ุง ู‡ูˆ ุงู„ูƒุชุงุจ ุงู„ุฌูŠุฏ ู„ู„ู‚ุฑุงุกุฉุŸ
20
+ - ุฑุฌู„ ูŠุญุฏู‚ ููŠ ุงู…ุฑุฃุฉ ููŠ ุงู„ู…ุทุจุฎ
21
+ - source_sentence: ุงู„ูƒู„ุจ ุงู„ุฑู…ุงุฏูŠ ูŠุฑูƒุถ ุนู„ู‰ ุฌุงู†ุจ ุจุฑูƒุฉ ุจูŠู†ู…ุง ุงู„ูƒู„ุจ ุงู„ุฃุตูุฑ ูŠู‚ูุฒ ุฅู„ู‰ ุงู„ุจุฑูƒุฉ.
22
+ sentences:
23
+ - ุงู„ูƒู„ุงุจ ุชุฃูƒู„ ุนุดุงุฆู‡ุง ุงู„ู„ูŠู„ูŠ
24
+ - ู‡ู†ุงูƒ ูƒู„ุจุงู† ุจุงู„ุฎุงุฑุฌ ุจุงู„ู‚ุฑุจ ู…ู† ุญู…ุงู… ุงู„ุณุจุงุญุฉ
25
+ - ูƒูŠู ุชุตู†ุน ุฒุฌุงุฌ ุจูŠุฑูŠูƒุณุŸ
26
+ - source_sentence: ูƒูŠู ูŠู…ูƒู†ู†ุง ูƒุณุจ ุงู„ู…ุงู„ ู…ู† ูŠูˆุชูŠูˆุจุŸ
27
+ sentences:
28
+ - ูƒูŠู ูŠู…ูƒู†ู†ูŠ ูƒุณุจ ุงู„ู…ุงู„ ู…ู† ุฎู„ุงู„ ุงู„ูŠูˆุชูŠูˆุจุŸ
29
+ - ูุชู‰ ูŠุฑู…ูŠ ุญู‚ูŠุจุฉ.
30
+ - ู‡ู„ ูŠู…ูƒู† ู„ุดุฎุต ู…ุชุญูˆู„ ุฌู†ุณูŠุงู‹ ุฃู† ูŠุนูˆุฏ ุฅู„ู‰ ุฌู†ุณู‡ ุงู„ุณุงุจู‚ ุจุนุฏ ุฌุฑุงุญุฉ ุชุบูŠูŠุฑ ุงู„ุฌู†ุณุŸ
31
+ - source_sentence: ูƒูŠู ูŠุญุตู„ ุงู„ู…ุฑุก ุนู„ู‰ ุฑู‚ู… ู‡ุงุชู ูุชุงุฉ ุจุณุฑุนุฉุŸ
32
+ sentences:
33
+ - ุงู…ุฑุฃุฉ ุชุชุณูˆู‚ ููŠ ุณูˆู‚ ุงู„ู…ุฒุงุฑุนูŠู†
34
+ - ูƒูŠู ุชุญุตู„ ุนู„ู‰ ุฑู‚ู… ู‡ุงุชู ูุชุงุฉุŸ
35
+ - ูƒูŠู ูŠู…ูƒู†ู†ูŠ ุงู„ุชุฎู„ุต ู…ู† ุญุจ ุงู„ุดุจุงุจุŸ
36
+ - source_sentence: ู…ุง ู‡ูˆ ู†ูˆุน ุงู„ุฏู‡ูˆู† ุงู„ู…ูˆุฌูˆุฏุฉ ููŠ ุงู„ุฃููˆูƒุงุฏูˆ
37
+ sentences:
38
+ - ุญูˆุงู„ูŠ 15 ููŠ ุงู„ู…ุงุฆุฉ ู…ู† ุงู„ุฏู‡ูˆู† ููŠ ุงู„ุฃููˆูƒุงุฏูˆ ู…ุดุจุนุฉ ุŒ ู…ุน ูƒู„ ูƒูˆุจ ูˆุงุญุฏ ู…ู† ุงู„ุฃููˆูƒุงุฏูˆ
39
+ ุงู„ู…ูุฑูˆู… ูŠุญุชูˆูŠ ุนู„ู‰ 3.2 ุฌุฑุงู… ู…ู† ุงู„ุฏู‡ูˆู† ุงู„ู…ุดุจุนุฉ ุŒ ูˆู‡ูˆ ู…ุง ูŠู…ุซู„ 16 ููŠ ุงู„ู…ุงุฆุฉ ู…ู† DV
40
+ ุงู„ุจุงู„ุบ 20 ุฌุฑุงู…ู‹ุง. ุชุญุชูˆูŠ ุงู„ุฃููˆูƒุงุฏูˆ ููŠ ุงู„ุบุงู„ุจ ุนู„ู‰ ุฏู‡ูˆู† ุฃุญุงุฏูŠุฉ ุบูŠุฑ ู…ุดุจุนุฉ ุŒ ู…ุน 67
41
+ ููŠ ุงู„ู…ุงุฆุฉ ู…ู† ุฅุฌู…ุงู„ูŠ ุงู„ุฏู‡ูˆู† ุŒ ุฃูˆ 14.7 ุฌุฑุงู…ู‹ุง ู„ูƒู„ ูƒูˆุจ ู…ูุฑูˆู… ุŒ ูˆูŠุชูƒูˆู† ู…ู† ู‡ุฐุง ุงู„ู†ูˆุน
42
+ ู…ู† ุงู„ุฏู‡ูˆู†.
43
+ - ุงู…ุฑุฃุฉ ุชุณุชู…ุชุน ุจุฑุงุฆุญุฉ ุดุงูŠู‡ุง ููŠ ุงู„ู‡ูˆุงุก ุงู„ุทู„ู‚.
44
+ - ูŠู…ูƒู† ุฃู† ูŠุคุฏูŠ ุงุฑุชูุงุน ู…ุณุชูˆู‰ ุงู„ุฏู‡ูˆู† ุงู„ุซู„ุงุซูŠุฉ ุŒ ูˆู‡ูŠ ู†ูˆุน ู…ู† ุงู„ุฏู‡ูˆู† (ุงู„ุฏู‡ูˆู†) ููŠ ุงู„ุฏู…
45
+ ุŒ ุฅู„ู‰ ุฒูŠุงุฏุฉ ุฎุทุฑ ุงู„ุฅุตุงุจุฉ ุจุฃู…ุฑุงุถ ุงู„ู‚ู„ุจ ุŒ ูˆูŠู…ูƒู† ุฃู† ูŠุคุฏูŠ ุชูˆููŠุฑ ู…ุณุชูˆู‰ ู…ุฑุชูุน ู…ู† ุงู„ุฏู‡ูˆู†
46
+ ุงู„ุซู„ุงุซูŠุฉ ุŒ ูˆู‡ูŠ ู†ูˆุน ู…ู† ุงู„ุฏู‡ูˆู† (ุงู„ุฏู‡ูˆู†) ููŠ ุงู„ุฏู… ุŒ ุฅู„ู‰ ุฒูŠุงุฏุฉ ุฎุทุฑ ุงู„ุฅุตุงุจุฉ ุจุฃู…ุฑุงุถ ุงู„ู‚ู„ุจ.
47
+ ู…ุฑุถ.
48
  ---
49
 
50
+ # SentenceTransformer based on aubmindlab/bert-base-arabertv02
51
 
52
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
53
 
54
  ## Model Details
55
 
 
96
  model = SentenceTransformer("sentence_transformers_model_id")
97
  # Run inference
98
  sentences = [
99
+ 'ู…ุง ู‡ูˆ ู†ูˆุน ุงู„ุฏู‡ูˆู† ุงู„ู…ูˆุฌูˆุฏุฉ ููŠ ุงู„ุฃููˆูƒุงุฏูˆ',
100
+ 'ุญูˆุงู„ูŠ 15 ููŠ ุงู„ู…ุงุฆุฉ ู…ู† ุงู„ุฏู‡ูˆู† ููŠ ุงู„ุฃููˆูƒุงุฏูˆ ู…ุดุจุนุฉ ุŒ ู…ุน ูƒู„ ูƒูˆุจ ูˆุงุญุฏ ู…ู† ุงู„ุฃููˆูƒุงุฏูˆ ุงู„ู…ูุฑูˆู… ูŠุญุชูˆูŠ ุนู„ู‰ 3.2 ุฌุฑุงู… ู…ู† ุงู„ุฏู‡ูˆู† ุงู„ู…ุดุจุนุฉ ุŒ ูˆู‡ูˆ ู…ุง ูŠู…ุซู„ 16 ููŠ ุงู„ู…ุงุฆุฉ ู…ู† DV ุงู„ุจุงู„ุบ 20 ุฌุฑุงู…ู‹ุง. ุชุญุชูˆูŠ ุงู„ุฃููˆูƒุงุฏูˆ ููŠ ุงู„ุบุงู„ุจ ุนู„ู‰ ุฏู‡ูˆู† ุฃุญุงุฏูŠุฉ ุบูŠุฑ ู…ุดุจุนุฉ ุŒ ู…ุน 67 ููŠ ุงู„ู…ุงุฆุฉ ู…ู† ุฅุฌู…ุงู„ูŠ ุงู„ุฏู‡ูˆู† ุŒ ุฃูˆ 14.7 ุฌุฑุงู…ู‹ุง ู„ูƒู„ ูƒูˆุจ ู…ูุฑูˆู… ุŒ ูˆูŠุชูƒูˆู† ู…ู† ู‡ุฐุง ุงู„ู†ูˆุน ู…ู† ุงู„ุฏู‡ูˆู†.',
101
+ 'ูŠู…ูƒู† ุฃู† ูŠุคุฏูŠ ุงุฑุชูุงุน ู…ุณุชูˆู‰ ุงู„ุฏู‡ูˆู† ุงู„ุซู„ุงุซูŠุฉ ุŒ ูˆู‡ูŠ ู†ูˆุน ู…ู† ุงู„ุฏู‡ูˆู† (ุงู„ุฏู‡ูˆู†) ููŠ ุงู„ุฏู… ุŒ ุฅู„ู‰ ุฒูŠุงุฏุฉ ุฎุทุฑ ุงู„ุฅุตุงุจุฉ ุจุฃู…ุฑุงุถ ุงู„ู‚ู„ุจ ุŒ ูˆูŠู…ูƒู† ุฃู† ูŠุคุฏูŠ ุชูˆููŠุฑ ู…ุณุชูˆู‰ ู…ุฑุชูุน ู…ู† ุงู„ุฏู‡ูˆู† ุงู„ุซู„ุงุซูŠุฉ ุŒ ูˆู‡ูŠ ู†ูˆุน ู…ู† ุงู„ุฏู‡ูˆู† (ุงู„ุฏู‡ูˆู†) ููŠ ุงู„ุฏู… ุŒ ุฅู„ู‰ ุฒูŠุงุฏุฉ ุฎุทุฑ ุงู„ุฅุตุงุจุฉ ุจุฃู…ุฑุงุถ ุงู„ู‚ู„ุจ. ู…ุฑุถ.',
102
  ]
103
  embeddings = model.encode(sentences)
104
  print(embeddings.shape)
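
The card's snippet stops at the embedding shape; a minimal sketch of how those embeddings could then be scored against each other is shown below. It is not part of the original card: it assumes sentence-transformers v3 or later, where `SentenceTransformer.similarity` returns cosine similarities by default, and it keeps the placeholder model id and uses placeholder sentences.

```python
from sentence_transformers import SentenceTransformer

# Placeholder id from the card; replace with the actual repo id.
model = SentenceTransformer("sentence_transformers_model_id")

# Placeholder sentences standing in for the Arabic examples above.
sentences = [
    "anchor question",
    "relevant passage",
    "unrelated passage",
]
embeddings = model.encode(sentences)             # shape: (3, 768)

# Cosine-similarity matrix between all pairs of sentences.
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)                        # torch.Size([3, 3])
print(similarities[0])                           # anchor vs. the two passages
```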
 
153
  #### Unnamed Dataset
154
 
155
 
156
+ * Size: 75,000 training samples
157
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
158
  * Approximate statistics based on the first 1000 samples:
159
+ | | anchor | positive | negative |
160
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
161
+ | type | string | string | string |
162
+ | details | <ul><li>min: 4 tokens</li><li>mean: 12.88 tokens</li><li>max: 58 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 13.74 tokens</li><li>max: 126 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 13.38 tokens</li><li>max: 146 tokens</li></ul> |
163
  * Samples:
164
+ | anchor | positive | negative |
165
+ |:------------------------------------------------------------------------------------------|:--------------------------------------------------------------|:--------------------------------------------------|
166
+ | <code>ู‡ู„ ุชุดุงุฌุฑ (ุณูŠ ุฅุณ ู„ูˆูŠุณ) ูˆ (ุฌูŠ ุขุฑ ุขุฑ ุชูˆู„ูƒูŠู†) ุŸ ุฅู† ูƒุงู† ุงู„ุฃู…ุฑ ูƒุฐู„ูƒุŒ ูู…ุง ู‡ูˆ ุงู„ุณุจุจุŸ</code> | <code>ู‡ู„ ุตุญูŠุญ ุฃู† (ุณูŠ ุฅุณ ู„ูˆูŠุณ) ูˆ (ุชูˆู„ูƒูŠู†) ุชุดุงุฌุฑุงุŸ</code> | <code>ู…ุง ู‡ูŠ ุฃูุถู„ ุงู„ูƒุชุจ ู„ู„ุฏุฑุงุณุฉ ููŠ ุงู„ุฌุงู…ุนุฉุŸ</code> |
167
+ | <code>ู…ุง ู‡ูŠ ุงุนุฑุงุถ ูู‚ุฑ ุงู„ุฏู…ุŸ</code> | <code>ู…ุง ู‡ูŠ ุงุนุฑุงุถ ุงู„ุงู†ูŠู…ูŠุงุŸ</code> | <code>ูƒูŠู ุงุญุถุฑ ูƒูŠูƒุฉ ุงู„ุนุณู„ุŸ</code> |
168
+ | <code>ู…ู† ุณุชุตูˆุช ู„ู‡ุŒ ุฏูˆู†ุงู„ุฏ ุชุฑุงู…ุจ ุฃู… ู‡ูŠู„ุงุฑูŠ ูƒู„ูŠู†ุชูˆู†ุŸ</code> | <code>ู‡ู„ ุชุคูŠุฏูˆู† ุฏูˆู†ุงู„ุฏ ุชุฑุงู…ุจ ุฃู… ู‡ูŠู„ุงุฑูŠ ูƒู„ูŠู†ุชูˆู†ุŸ ู„ู…ุงุฐุงุŸ</code> | <code>ูƒูŠู ุฃุชุบู„ุจ ุนู„ู‰ ุฅุฏู…ุงู† ุงู„ู…ูˆุงุฏ ุงู„ุฅุจุงุญูŠุฉุŸ</code> |
169
  * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
170
  ```json
171
  {
 
193
  #### Unnamed Dataset
194
 
195
 
196
+ * Size: 25,000 evaluation samples
197
  * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
198
  * Approximate statistics based on the first 1000 samples:
199
+ | | anchor | positive | negative |
200
+ |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
201
+ | type | string | string | string |
202
+ | details | <ul><li>min: 4 tokens</li><li>mean: 12.6 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.82 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 13.78 tokens</li><li>max: 128 tokens</li></ul> |
203
  * Samples:
204
+ | anchor | positive | negative |
205
+ |:-----------------------------------------------------------|:-------------------------------------------------------------|:--------------------------------------------|
206
+ | <code>ู†ุนู… , ู†ุนู… , ุฃูˆ ุฑุฃูŠุช " ุชุดูŠู…ุง ุจุงุฑุง ุฏูŠุณูˆ "</code> | <code>ู†ุนู…ุŒ ุฃูˆ "ุชุดูŠู…ุง ุจุงุฑุง ุฏูŠุณูˆ" ูƒุงู†ุช ุชู„ูƒ ุงู„ุชูŠ ุดุงู‡ุฏุชู‡ุง</code> | <code>ุฃู†ุง ู„ู… ุฃุฑู‰ "ุชุดูŠู…ุง ุจุงุฑุง ุฏูŠุณูˆ".</code> |
207
+ | <code>ุฑุฌู„ ูˆุงู…ุฑุฃุฉ ูŠุฌู„ุณุงู† ุนู„ู‰ ุงู„ุดุงุทุฆ ุจูŠู†ู…ุง ุชุบุฑุจ ุงู„ุดู…ุณ</code> | <code>ู‡ู†ุงูƒ ุฑุฌู„ ูˆุงู…ุฑุฃุฉ ูŠุฌู„ุณุงู† ุนู„ู‰ ุงู„ุดุงุทุฆ</code> | <code>ุฅู†ู‡ู… ูŠุดุงู‡ุฏูˆู† ุดุฑูˆู‚ ุงู„ุดู…ุณ</code> |
208
+ | <code>ูƒูŠู ุฃุณูŠุทุฑ ุนู„ู‰ ุบุถุจูŠุŸ</code> | <code>ู…ุง ู‡ูŠ ุฃูุถู„ ุทุฑูŠู‚ุฉ ู„ู„ุณูŠุทุฑุฉ ุนู„ู‰ ุงู„ุบุถุจุŸ</code> | <code>ูƒูŠู ุฃุนุฑู ุฅู† ูƒุงู†ุช ุฒูˆุฌุชูŠ ุชุฎูˆู†ู†ูŠุŸ</code> |
209
  * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
210
  ```json
211
  {
 
235
  - `per_device_train_batch_size`: 16
236
  - `per_device_eval_batch_size`: 16
237
  - `learning_rate`: 2e-05
238
+ - `num_train_epochs`: 5
239
  - `warmup_ratio`: 0.1
240
  - `fp16`: True
241
  - `batch_sampler`: no_duplicates
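
Read together with the `MatryoshkaLoss` / `MultipleNegativesRankingLoss` tags, these non-default values correspond roughly to the sentence-transformers v3 training setup sketched below. This is a hedged reconstruction, not the author's script: the Matryoshka dimensions are an assumption (the parameter JSON above is truncated in this diff), `output_dir` is a placeholder, and `train_ds` / `eval_ds` are toy stand-ins for the unnamed 75,000 / 25,000-row triplet datasets with `anchor`, `positive`, and `negative` columns.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("aubmindlab/bert-base-arabertv02")

# Toy stand-ins for the unnamed anchor/positive/negative datasets above.
toy = {
    "anchor": ["what fat is in avocado"],
    "positive": ["avocados are mostly monounsaturated fat"],
    "negative": ["high triglycerides raise heart-disease risk"],
}
train_ds = Dataset.from_dict(toy)
eval_ds = Dataset.from_dict(toy)

# MultipleNegativesRankingLoss over the triplets, wrapped in MatryoshkaLoss.
# The dims below are typical choices, not read from this card.
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # placeholder, not in the card
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # "no_duplicates" above
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    loss=loss,
)
trainer.train()
```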
 
259
  - `adam_beta2`: 0.999
260
  - `adam_epsilon`: 1e-08
261
  - `max_grad_norm`: 1.0
262
+ - `num_train_epochs`: 5
263
  - `max_steps`: -1
264
  - `lr_scheduler_type`: linear
265
  - `lr_scheduler_kwargs`: {}
 
356
  </details>
357
 
358
  ### Training Logs
359
+ | Epoch | Step | Training Loss | Validation Loss |
360
+ |:------:|:-----:|:-------------:|:---------------:|
361
+ | 0.2133 | 500 | 1.4163 | 0.3134 |
362
+ | 0.4266 | 1000 | 0.3306 | 0.1912 |
363
+ | 0.6399 | 1500 | 0.2263 | 0.1527 |
364
+ | 0.8532 | 2000 | 0.1818 | 0.1297 |
365
+ | 1.0666 | 2500 | 0.1658 | 0.1167 |
366
+ | 1.2799 | 3000 | 0.1139 | 0.1040 |
367
+ | 1.4932 | 3500 | 0.0808 | 0.1018 |
368
+ | 1.7065 | 4000 | 0.0692 | 0.0959 |
369
+ | 1.9198 | 4500 | 0.058 | 0.0958 |
370
+ | 2.1331 | 5000 | 0.0653 | 0.0882 |
371
+ | 2.3464 | 5500 | 0.0503 | 0.0912 |
372
+ | 2.5597 | 6000 | 0.0338 | 0.0970 |
373
+ | 2.7730 | 6500 | 0.0363 | 0.0906 |
374
+ | 2.9863 | 7000 | 0.0375 | 0.0856 |
375
+ | 3.1997 | 7500 | 0.0401 | 0.0879 |
376
+ | 3.4130 | 8000 | 0.031 | 0.0848 |
377
+ | 3.6263 | 8500 | 0.0255 | 0.0938 |
378
+ | 3.8396 | 9000 | 0.0239 | 0.0858 |
379
+ | 4.0529 | 9500 | 0.0305 | 0.0840 |
380
+ | 4.2662 | 10000 | 0.0281 | 0.0833 |
381
+ | 4.4795 | 10500 | 0.0174 | 0.0840 |
382
+ | 4.6928 | 11000 | 0.0216 | 0.0882 |
383
+ | 4.9061 | 11500 | 0.022 | 0.0866 |
384
 
385
 
386
  ### Framework Versions
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a2f484329790d7d4196c0abbdef27adc40316af55d3aecc2b9a249dece8ef6b9
3
  size 540795752
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ee308a99b75411cbc36588efb0b0a39c698668b9d5a9cdf2afd8fcd82bdb2f44
3
  size 540795752
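
Only the Git LFS pointer changed in this file: the re-uploaded weights carry a new content hash while the size stays at 540795752 bytes. As an aside, a locally downloaded copy of `model.safetensors` could be checked against the new oid with a short sketch like the one below (the local path is an assumption, not part of the commit).

```python
import hashlib

# sha256 oid from the updated Git LFS pointer above.
EXPECTED = "ee308a99b75411cbc36588efb0b0a39c698668b9d5a9cdf2afd8fcd82bdb2f44"

sha = hashlib.sha256()
with open("model.safetensors", "rb") as f:       # local path is an assumption
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

print(sha.hexdigest() == EXPECTED)               # True if the download matches
```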