SentenceTransformer based on sentence-transformers/quora-distilbert-multilingual

This is a sentence-transformers model finetuned from sentence-transformers/quora-distilbert-multilingual. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description
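
  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/quora-distilbert-multilingual
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~135M parameters (float32, Safetensors)
  • Training Data: 32,000 product-title pairs (largely Persian), with 8,000 pairs held out for evaluation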

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
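
The Pooling module builds the sentence embedding by mean-pooling the token embeddings over the attention mask. The snippet below is a rough sketch of that computation with plain transformers; it assumes the DistilBERT weights and tokenizer sit at the repository root, as is standard for Sentence Transformers checkpoints, and its output should closely match model.encode.

import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding via the attention mask
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("melino2000/product-torob-matching")
model = AutoModel.from_pretrained("melino2000/product-torob-matching")

encoded = tokenizer(
    ['رایزر گرافیک مدل 009s plus هشت خازنه'],  # GPU riser, model 009S Plus, eight capacitors
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
print(embeddings.shape)
# torch.Size([1, 768])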

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("melino2000/product-torob-matching")
# Run inference
sentences = [
    'رایزر گرافیک مدل 009s plus هشت خازنه',  # GPU riser, model 009S Plus, eight capacitors
    'رایزر گرافیک تبدیل PCI EXPRESS X1 به X16 مدل 009S',  # GPU riser adapter, PCI Express x1 to x16, model 009S
    'شامپو کودک حاوی عصاره اسطوخودوس فیروز200 میل',  # Firooz baby shampoo with lavender extract, 200 ml
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
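
The embeddings also support semantic search, for example matching an incoming title against a product catalog. A minimal sketch with a hypothetical two-item catalog:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("melino2000/product-torob-matching")

# Hypothetical mini-catalog of product titles
catalog = [
    'رایزر گرافیک تبدیل PCI EXPRESS X1 به X16 مدل 009S',
    'شامپو کودک حاوی عصاره اسطوخودوس فیروز200 میل',
]
query = 'رایزر گرافیک مدل 009s plus هشت خازنه'

catalog_embeddings = model.encode(catalog, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank catalog entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, catalog_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}", catalog[hit["corpus_id"]])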

Evaluation

Metrics

Binary Classification

Metric                      Value
cosine_accuracy             0.9908
cosine_accuracy_threshold   0.7373
cosine_f1                   0.9908
cosine_f1_threshold         0.7295
cosine_precision            0.99
cosine_recall               0.9915
cosine_ap                   0.9989
cosine_mcc                  0.9815
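
These figures treat product matching as binary classification over title pairs: a pair is predicted to be the same product when the cosine similarity of its embeddings clears a tuned threshold. Below is a minimal sketch applying the reported cosine_accuracy_threshold; the hard-coded 0.7373 is copied from the table above and is specific to this evaluation set.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("melino2000/product-torob-matching")

# Decision threshold taken from the evaluation table (cosine_accuracy_threshold)
THRESHOLD = 0.7373

def same_product(title_a: str, title_b: str) -> bool:
    # Embed both titles and compare them with cosine similarity
    embeddings = model.encode([title_a, title_b])
    score = model.similarity(embeddings[:1], embeddings[1:]).item()
    return score >= THRESHOLD

print(same_product(
    'پرینتر چندکاره لیزری HP LaserJet Pro M130a',  # HP LaserJet Pro M130a multifunction laser printer
    'پرینتر لیزری سه کاره اچ پی HP M130a',  # HP M130a three-in-one laser printer
))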

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 6 tokens, mean: 19.0 tokens, max: 68 tokens
    • sentence2: string; min: 5 tokens, mean: 19.06 tokens, max: 56 tokens
    • label: int; 0: ~49.40%, 1: ~50.60%
  • Samples:
    • sentence1: پرینتر چندکاره لیزری HP LaserJet Pro M130a (HP LaserJet Pro M130a multifunction laser printer)
      sentence2: پرینتر لیزری سه کاره اچ پی HP M130a (HP M130a three-in-one laser printer)
      label: 1
    • sentence1: قرص روکشدار مولتی دیلی دکتر گیل 60 عددی داروسازی رازان فارمدیان (Dr. Gil Multi Daily coated tablets, 60 count)
      sentence2: قرص مولتی دیلی دکتر گیل (Dr. Gil Multi Daily tablets)
      label: 1
    • sentence1: خمیردندان کلگیت 3 کاره Triple Action 100 میل (Colgate Triple Action toothpaste, 100 ml)
      sentence2: خمیر دندان کولگیت مدل 3 کاره حجم 100 میل (Colgate triple-action toothpaste, 100 ml)
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

Unnamed Dataset

  • Size: 8,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 5 tokens, mean: 19.24 tokens, max: 69 tokens
    • sentence2: string; min: 2 tokens, mean: 18.76 tokens, max: 56 tokens
    • label: int; 0: ~50.00%, 1: ~50.00%
  • Samples:
    • sentence1: مایکرو فر 36 لیتری ناسا الکتریک مدل NS-2024 (Nasa Electric 36-liter microwave oven, model NS-2024)
      sentence2: سرویس کاور روتختی تک نفره ایکیا مدل Ikea BRUNKRISSLA 404.907.23 (Ikea BRUNKRISSLA 404.907.23 single bedspread cover set)
      label: 0
    • sentence1: کنسول بازی نینتندو سوییچ سفید - Nintendo Switch OLED Model white (Nintendo Switch OLED game console, white)
      sentence2: NINTENDO SWITCH OLED (Neon Red & Neon Blue)
      label: 1
    • sentence1: خمیر دندان کرست مدل Complete 7 (Crest Complete 7 toothpaste)
      sentence2: قلمو سرگرد 2122 پارس آرت (32400_107700 تومان) (Pars Art round paintbrush 2122, priced 32,400 to 107,700 toman)
      label: 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
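
As a rough sketch, a training setup consistent with the dataset columns, loss, and hyperparameters listed above could look like the following, using the Sentence Transformers v3 trainer API. The in-memory example pair and the output directory are placeholders, not the original training script or data.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/quora-distilbert-multilingual")

# Placeholder data; the real datasets hold 32,000 training and 8,000 evaluation pairs
# with columns sentence1, sentence2, label (1 = same product, 0 = different products)
train_dataset = Dataset.from_dict({
    "sentence1": ["پرینتر چندکاره لیزری HP LaserJet Pro M130a"],
    "sentence2": ["پرینتر لیزری سه کاره اچ پی HP M130a"],
    "label": [1],
})
eval_dataset = train_dataset

# OnlineContrastiveLoss computes the contrastive loss only on the hard positive
# and hard negative pairs within each batch
loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="product-torob-matching",  # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()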

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss   Validation Loss   product-matching-binary_cosine_ap
-1     -1    -               -                 0.9436
0.1    100   0.637           -                 -
0.2    200   0.1303          -                 -
0.25   250   -               0.0785            0.9961
0.3    300   0.1378          -                 -
0.4    400   0.1191          -                 -
0.5    500   0.0949          0.0723            0.9963
0.6    600   0.1016          -                 -
0.7    700   0.0694          -                 -
0.75   750   -               0.0464            0.9974
0.8    800   0.0619          -                 -
0.9    900   0.0543          -                 -
1.0    1000  0.0658          0.0394            0.9981
1.1    1100  0.0326          -                 -
1.2    1200  0.0176          -                 -
1.25   1250  -               0.0387            0.9980
1.3    1300  0.0237          -                 -
1.4    1400  0.0219          -                 -
1.5    1500  0.0115          0.0259            0.9983
1.6    1600  0.0218          -                 -
1.7    1700  0.0235          -                 -
1.75   1750  -               0.0230            0.9988
1.8    1800  0.0319          -                 -
1.9    1900  0.0127          -                 -
2.0    2000  0.015           0.0285            0.9987
2.099  2100  0.0121          -                 -
2.199  2200  0.0091          -                 -
2.249  2250  -               0.0217            0.9986
2.299  2300  0.0107          -                 -
2.399  2400  0.009           -                 -
2.499  2500  0.0043          0.0224            0.9989
2.599  2600  0.0028          -                 -
2.699  2700  0.0026          -                 -
2.749  2750  -               0.0248            0.9989
2.799  2800  0.0024          -                 -
2.899  2900  0.0067          -                 -
2.999  3000  0.0088          0.0225            0.9989
-1     -1    -               -                 0.9989

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}