
SentenceTransformer based on ahdsoft/persian-sentence-transformer-news-wiki-pairs-v3

This is a sentence-transformers model finetuned from ahdsoft/persian-sentence-transformer-news-wiki-pairs-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base Model: ahdsoft/persian-sentence-transformer-news-wiki-pairs-v3
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~560M parameters (F32)

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
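
The Pooling module is configured for mean pooling (pooling_mode_mean_tokens: True) over the token embeddings produced by the XLM-RoBERTa backbone. As an illustration only (not the library's exact code), masked mean pooling amounts to:

import torch

def masked_mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    # Average only over real (non-padding) tokens.
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)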

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aidal/persian-sentence-transformer-product-classification")
# Run inference
sentences = [
    'بازی آموزشی مدل جورچین ایران کد K-5',  # "Iran" educational jigsaw game, code K-5
    'اسباب بازی، کودک و نوزاد',  # toys, kids & baby
    'زیبایی و سلامت',  # beauty & health
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
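
Because the training pairs map product titles to category names (see the samples below), a natural application is zero-shot product categorization by nearest category embedding. A minimal sketch; the label set and product title here are taken from the samples in this card, and a real deployment would use the full category list:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aidal/persian-sentence-transformer-product-classification")

# Candidate category labels (illustrative subset).
categories = [
    'اسباب بازی، کودک و نوزاد',  # toys, kids & baby
    'خانه و آشپزخانه',  # home & kitchen
    'زیبایی و سلامت',  # beauty & health
    'لوازم خانگی برقی',  # electric home appliances
]
product = 'چای ساز کاراجا مدل Cay Sever'  # Karaca tea maker, Cay Sever model

category_embeddings = model.encode(categories)
product_embedding = model.encode([product])
scores = model.similarity(product_embedding, category_embeddings)  # shape: (1, 4)
print(categories[scores.argmax().item()])  # expected: لوازم خانگی برقی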

Evaluation

Metrics

Semantic Similarity

All correlation metrics for this run are reported as nan. Pearson and Spearman correlations are undefined when the gold scores passed to the evaluator have zero variance, which is the likely cause here (the evaluation data consists of product/category pairs rather than graded similarity scores), so these values carry no quality signal.

Metric              Value
pearson_cosine      nan
spearman_cosine     nan
pearson_manhattan   nan
spearman_manhattan  nan
pearson_euclidean   nan
spearman_euclidean  nan
pearson_dot         nan
spearman_dot        nan
pearson_max         nan
spearman_max        nan
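
The metric names and the embedding-similarity-eval column in the training logs suggest these figures come from Sentence Transformers' EmbeddingSimilarityEvaluator. A hedged sketch of such an evaluation; the sentence pairs and gold scores below are placeholders:

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Placeholder pairs with graded gold similarity scores in [0, 1];
# constant gold scores would make every correlation metric nan.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=['چای ساز کاراجا مدل Cay Sever', 'زیبایی و سلامت'],
    sentences2=['لوازم خانگی برقی', 'اسباب بازی، کودک و نوزاد'],
    scores=[0.9, 0.1],
    name="embedding-similarity-eval",
)
print(evaluator(model))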

Training Details

Training Dataset

Unnamed Dataset

  • Size: 804,708 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    Statistic    anchor    positive
    type         string    string
    min tokens   6         3
    mean tokens  16.15     6.42
    max tokens   41        11

  • Samples (anchor → positive):
    مربا زرشک مارجان - 270 گرم (Marjan barberry jam, 270 g) → محصولات بومی و محلی (local and regional products)
    دفتر یادداشت بادکنک آبی طرح انیمه مدل Attack on titan مجموعه 2 عددی (Badkonak Abi notebook, Attack on Titan anime design, set of 2) → کتاب، لوازم تحریر و هنر (books, stationery and art)
    چای ساز کاراجا مدل Cay Sever (Karaca tea maker, Cay Sever model) → لوازم خانگی برقی (electric home appliances)
  • Loss: MultipleNegativesRankingLoss with these parameters (a sketch of this objective follows the list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
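
For reference, MultipleNegativesRankingLoss treats every other positive in a batch as a negative for a given anchor: it computes scaled cosine similarities between all anchor/positive pairs and applies cross-entropy against the diagonal. A minimal PyTorch sketch of the objective (illustrative, not the library's exact implementation):

import torch
import torch.nn.functional as F

def mnr_loss(anchor_emb: torch.Tensor, positive_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity between every anchor and every positive in the batch,
    # scaled by 20.0 as configured above (similarity_fct = "cos_sim").
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    scores = scale * a @ p.T  # shape: (batch_size, batch_size)
    # The true pair for anchor i is positive i, so the target is the diagonal;
    # all other positives in the batch act as in-batch negatives.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)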

Evaluation Dataset

Unnamed Dataset

  • Size: 89,413 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    Statistic    anchor    positive
    type         string    string
    min tokens   6         3
    mean tokens  16.02     6.38
    max tokens   44        11

  • Samples (anchor → positive):
    لامپ ال ای دی 6 وات لداستار مدل شعله ای پایه E27 بسته 3 عددی (Ledstar 6 W flame-style LED lamp, E27 base, pack of 3) → خانه و آشپزخانه (home and kitchen)
    زیرانداز تعویض نوزاد مدل هپی ویکند (Happy Weekend baby changing mat) → اسباب بازی، کودک و نوزاد (toys, kids and baby)
    تابلو نوری کاکتی مدل عاشقانه طرح اسم شهسوار کد TA14352 (Kakti light sign, romantic design with the name "Shahsavar", code TA14352) → خانه و آشپزخانه (home and kitchen)
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • log_level: debug
  • fp16: True
  • load_best_model_at_end: True
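
For reference, a run with these non-default hyperparameters could be launched roughly as follows with the Sentence Transformers v3 trainer API; the dataset contents and output path are placeholders:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("ahdsoft/persian-sentence-transformer-news-wiki-pairs-v3")

# Placeholder datasets with the anchor/positive columns described above.
train_dataset = Dataset.from_dict({"anchor": ["..."], "positive": ["..."]})
eval_dataset = Dataset.from_dict({"anchor": ["..."], "positive": ["..."]})

loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
    log_level="debug",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()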

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: debug
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss   embedding-similarity-eval_spearman_max
0.0040 100 2.7527 - -
0.0080 200 2.0773 - -
0.0119 300 1.764 - -
0.0159 400 1.5861 - -
0.0199 500 1.5138 - -
0.0239 600 1.4307 - -
0.0278 700 1.3923 - -
0.0318 800 1.3251 - -
0.0358 900 1.3023 - -
0.0398 1000 1.2929 - -
0.0437 1100 1.2764 - -
0.0477 1200 1.2728 - -
0.0517 1300 1.2262 - -
0.0557 1400 1.2456 - -
0.0596 1500 1.2052 - -
0.0636 1600 1.1912 - -
0.0676 1700 1.2077 - -
0.0716 1800 1.2196 - -
0.0756 1900 1.1603 - -
0.0795 2000 1.1706 - -
0.0835 2100 1.2001 - -
0.0875 2200 1.1822 - -
0.0915 2300 1.1703 - -
0.0954 2400 1.204 - -
0.0994 2500 1.1863 1.1333 nan
0.1034 2600 1.1567 - -
0.1074 2700 1.1876 - -
0.1113 2800 1.1553 - -
0.1153 2900 1.1332 - -
0.1193 3000 1.1426 - -
0.1233 3100 1.1476 - -
0.1272 3200 1.1482 - -
0.1312 3300 1.1343 - -
0.1352 3400 1.1572 - -
0.1392 3500 1.1018 - -
0.1432 3600 1.1175 - -
0.1471 3700 1.1024 - -
0.1511 3800 1.1308 - -
0.1551 3900 1.1386 - -
0.1591 4000 1.1103 - -
0.1630 4100 1.1472 - -
0.1670 4200 1.1079 - -
0.1710 4300 1.1199 - -
0.1750 4400 1.1306 - -
0.1789 4500 1.0975 - -
0.1829 4600 1.1285 - -
0.1869 4700 1.121 - -
0.1909 4800 1.1099 - -
0.1948 4900 1.0913 - -
0.1988 5000 1.0631 1.0980 nan
0.2028 5100 1.1336 - -
0.2068 5200 1.1055 - -
0.2108 5300 1.0987 - -
0.2147 5400 1.1078 - -
0.2187 5500 1.0749 - -
0.2227 5600 1.1016 - -
0.2267 5700 1.0768 - -
0.2306 5800 1.0954 - -
0.2346 5900 1.0975 - -
0.2386 6000 1.0638 - -
0.2426 6100 1.0751 - -
0.2465 6200 1.0675 - -
0.2505 6300 1.0513 - -
0.2545 6400 1.0808 - -
0.2585 6500 1.0863 - -
0.2624 6600 1.0681 - -
0.2664 6700 1.0813 - -
0.2704 6800 1.077 - -
0.2744 6900 1.0811 - -
0.2784 7000 1.0543 - -
0.2823 7100 1.0677 - -
0.2863 7200 1.0691 - -
0.2903 7300 1.0597 - -
0.2943 7400 1.0538 - -
0.2982 7500 1.0853 1.0658 nan
0.3022 7600 1.0831 - -
0.3062 7700 1.0565 - -
0.3102 7800 1.0667 - -
0.3141 7900 1.0839 - -
0.3181 8000 1.0742 - -
0.3221 8100 1.0543 - -
0.3261 8200 1.0539 - -
0.3300 8300 1.07 - -
0.3340 8400 1.0556 - -
0.3380 8500 1.0715 - -
0.3420 8600 1.0468 - -
0.3460 8700 1.0477 - -
0.3499 8800 1.0401 - -
0.3539 8900 1.1047 - -
0.3579 9000 1.0345 - -
0.3619 9100 1.0677 - -
0.3658 9200 1.0705 - -
0.3698 9300 1.0624 - -
0.3738 9400 1.0528 - -
0.3778 9500 1.0455 - -
0.3817 9600 1.0555 - -
0.3857 9700 1.0338 - -
0.3897 9800 1.0624 - -
0.3937 9900 1.0645 - -
0.3976 10000 1.0622 1.0430 nan
0.4016 10100 1.0523 - -
0.4056 10200 1.0697 - -
0.4096 10300 1.0733 - -
0.4136 10400 1.0415 - -
0.4175 10500 1.0644 - -
0.4215 10600 1.0404 - -
0.4255 10700 1.026 - -
0.4295 10800 1.0408 - -
0.4334 10900 1.0602 - -
0.4374 11000 1.0538 - -
0.4414 11100 1.0396 - -
0.4454 11200 1.0852 - -
0.4493 11300 1.0412 - -
0.4533 11400 1.0249 - -
0.4573 11500 1.024 - -
0.4613 11600 1.0494 - -
0.4652 11700 1.0461 - -
0.4692 11800 1.027 - -
0.4732 11900 1.0802 - -
0.4772 12000 1.0402 - -
0.4812 12100 1.026 - -
0.4851 12200 1.0565 - -
0.4891 12300 1.0416 - -
0.4931 12400 1.0452 - -
0.4971 12500 1.0425 1.0376 nan
0.5010 12600 1.0319 - -
0.5050 12700 1.0422 - -
0.5090 12800 1.0261 - -
0.5130 12900 1.0498 - -
0.5169 13000 1.0189 - -
0.5209 13100 1.0309 - -
0.5249 13200 1.0509 - -
0.5289 13300 1.0524 - -
0.5328 13400 1.0516 - -
0.5368 13500 1.0104 - -
0.5408 13600 1.0394 - -
0.5448 13700 1.0473 - -
0.5488 13800 1.0151 - -
0.5527 13900 1.0379 - -
0.5567 14000 1.0556 - -
0.5607 14100 1.0465 - -
0.5647 14200 1.046 - -
0.5686 14300 1.0211 - -
0.5726 14400 1.0234 - -
0.5766 14500 1.0215 - -
0.5806 14600 1.0445 - -
0.5845 14700 1.0229 - -
0.5885 14800 1.0383 - -
0.5925 14900 1.0491 - -
0.5965 15000 1.0425 1.0303 nan
0.6004 15100 1.052 - -
0.6044 15200 1.0281 - -
0.6084 15300 1.0288 - -
0.6124 15400 1.0096 - -
0.6164 15500 1.0447 - -
0.6203 15600 1.038 - -
0.6243 15700 1.0061 - -
0.6283 15800 1.0255 - -
0.6323 15900 1.0246 - -
0.6362 16000 1.0255 - -
0.6402 16100 1.0271 - -
0.6442 16200 1.0163 - -
0.6482 16300 1.0381 - -
0.6521 16400 1.0333 - -
0.6561 16500 1.0161 - -
0.6601 16600 1.03 - -
0.6641 16700 1.0299 - -
0.6680 16800 1.0191 - -
0.6720 16900 1.0268 - -
0.6760 17000 1.0177 - -
0.6800 17100 1.0157 - -
0.6840 17200 1.0382 - -
0.6879 17300 1.0306 - -
0.6919 17400 1.0231 - -
0.6959 17500 1.0456 1.0231 nan
0.6999 17600 0.9993 - -
0.7038 17700 1.0212 - -
0.7078 17800 1.0114 - -
0.7118 17900 1.0169 - -
0.7158 18000 1.0115 - -
0.7197 18100 1.019 - -
0.7237 18200 1.016 - -
0.7277 18300 1.0252 - -
0.7317 18400 1.0374 - -
0.7356 18500 1.0147 - -
0.7396 18600 1.0302 - -
0.7436 18700 1.0203 - -
0.7476 18800 1.0395 - -
0.7516 18900 1.0486 - -
0.7555 19000 1.0321 - -
0.7595 19100 1.0463 - -
0.7635 19200 1.0124 - -
0.7675 19300 1.0026 - -
0.7714 19400 1.0474 - -
0.7754 19500 1.0314 - -
0.7794 19600 1.0183 - -
0.7834 19700 1.0067 - -
0.7873 19800 1.0179 - -
0.7913 19900 1.0388 - -
0.7953 20000 1.0063 1.0157 nan
0.7993 20100 1.0175 - -
0.8032 20200 1.0349 - -
0.8072 20300 1.0125 - -
0.8112 20400 0.9982 - -
0.8152 20500 1.0428 - -
0.8192 20600 1.0526 - -
0.8231 20700 1.0424 - -
0.8271 20800 1.008 - -
0.8311 20900 1.0186 - -
0.8351 21000 1.0256 - -
0.8390 21100 1.0125 - -
0.8430 21200 1.0286 - -
0.8470 21300 1.0358 - -
0.8510 21400 1.0189 - -
0.8549 21500 0.9861 - -
0.8589 21600 0.9934 - -
0.8629 21700 1.0211 - -
0.8669 21800 1.0221 - -
0.8708 21900 1.0302 - -
0.8748 22000 1.0145 - -
0.8788 22100 1.0027 - -
0.8828 22200 1.0084 - -
0.8868 22300 1.0334 - -
0.8907 22400 1.0025 - -
0.8947 22500 1.0175 1.0102 nan
0.8987 22600 1.0 - -
0.9027 22700 1.0268 - -
0.9066 22800 0.9795 - -
0.9106 22900 1.0071 - -
0.9146 23000 1.0141 - -
0.9186 23100 1.006 - -
0.9225 23200 1.0327 - -
0.9265 23300 1.0016 - -
0.9305 23400 1.0313 - -
0.9345 23500 1.021 - -
0.9384 23600 1.0217 - -
0.9424 23700 1.0191 - -
0.9464 23800 1.0238 - -
0.9504 23900 1.0469 - -
0.9544 24000 1.0338 - -
0.9583 24100 1.0043 - -
0.9623 24200 1.0054 - -
0.9663 24300 1.0264 - -
0.9703 24400 1.024 - -
0.9742 24500 1.0172 - -
0.9782 24600 1.0127 - -
0.9822 24700 1.013 - -
0.9862 24800 1.0135 - -
0.9901 24900 1.0145 - -
0.9941 25000 1.0184 1.0082 nan
0.9981 25100 1.0305 - -
  • The saved checkpoint (per load_best_model_at_end) corresponds to the row with the lowest validation loss: step 25000, validation loss 1.0082.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.43.3
  • PyTorch: 2.2.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
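
To reproduce this environment, the pinned versions above can be installed directly (a CUDA build of PyTorch such as 2.2.2+cu121 may require the matching PyTorch index URL):

pip install sentence-transformers==3.0.1 transformers==4.43.3 accelerate==0.32.1 datasets==2.20.0 tokenizers==0.19.1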

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}