SentenceTransformer based on distilbert/distilbert-base-multilingual-cased

This is a sentence-transformers model fine-tuned from distilbert/distilbert-base-multilingual-cased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
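
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions; the real model works on 768-dimensional tensors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example: three token positions, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

Padding positions are masked out so that batching sentences of different lengths does not change a sentence's embedding.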

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pritamdeka/distilbert-base-multilingual-cased-indicxnli-random-negatives-v1")
# Run inference
sentences = [
    'মই ভালদৰে জানিব নোৱাৰোঁ আপোনালোকৰ সৈতে কথা বতৰা আৰু এক ভাল সন্ধ্যা আছিল',
    'মই নিশ্চিত নহয় কিন্তু মই অলপ ভাল, আজি ৰাতি আপোনালোকৰ সৈতে কথা পাতিবলৈ পাই ভাল লাগিল।',
    'Shannon এ বাৰ্তা উপেক্ষা কৰিছে।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
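
To make the similarity step concrete, here is a small sketch of a pairwise cosine-similarity matrix and a ranking built on top of it. It assumes the model uses cosine similarity, which is the Sentence Transformers default unless a model is configured otherwise. The helper names and the toy 2-dimensional vectors are hypothetical and stand in for real 768-dimensional embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """All-pairs cosine similarity, analogous in shape to the [3, 3] result above."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def rank_by_similarity(query_index: int, vectors: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the other vectors, most similar to the query first."""
    scores = [
        (i, cosine_similarity(vectors[query_index], v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return [i for i, _ in sorted(scores, key=lambda pair: pair[1], reverse=True)]


# Toy stand-ins for sentence embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
print(round(matrix[0][1], 4))                 # 0.7071
print(rank_by_similarity(0, toy_embeddings))  # [1, 2]
```

With real embeddings, the same ranking step is the core of semantic search: encode the query and the candidates, then sort candidates by similarity to the query.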

Evaluation

Metrics

Semantic Similarity (dataset: pritamdeka/stsb-assamese-translated-dev)

Metric Value
pearson_cosine 0.717
spearman_cosine 0.7221
pearson_manhattan 0.738
spearman_manhattan 0.7452
pearson_euclidean 0.7387
spearman_euclidean 0.7459
pearson_dot 0.6481
spearman_dot 0.6478
pearson_max 0.7387
spearman_max 0.7459

Semantic Similarity (dataset: pritamdeka/stsb-assamese-translated-test)

Metric Value
pearson_cosine 0.6568
spearman_cosine 0.6622
pearson_manhattan 0.6675
spearman_manhattan 0.6722
pearson_euclidean 0.6682
spearman_euclidean 0.6727
pearson_dot 0.5692
spearman_dot 0.5709
pearson_max 0.6682
spearman_max 0.6727
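
The pearson_* and spearman_* metrics above are correlations between the model's similarity scores and human gold scores. As a rough sketch of how the two differ, the plain-Python code below computes both on made-up toy scores. These numbers are not from the evaluation datasets, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up gold scores and predicted similarities, in the same order.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.1, 0.2, 0.4, 0.8, 1.6]

print(pearson(gold, predicted))   # below 1.0: the relationship is not linear
print(spearman(gold, predicted))  # 1.0 up to rounding: the ordering agrees exactly
```

Spearman only depends on the ordering of the scores, which is why it is commonly treated as the headline metric for semantic textual similarity.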

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
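
To illustrate how learning_rate, warmup_ratio, and lr_scheduler_type: linear interact, here is a small sketch of a linear warmup followed by linear decay. It assumes the usual Transformers behaviour (warmup steps derived from the ratio by rounding up, then decay to zero at the last step) and takes the total of 10227 steps from the training log. The function names are hypothetical.

```python
import math

LEARNING_RATE = 5e-05  # learning_rate from the list above
WARMUP_RATIO = 0.1     # warmup_ratio from the list above
TOTAL_STEPS = 10227    # final step reported in the training log (1 epoch)


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps implied by a warmup ratio (assumed to round up)."""
    return math.ceil(total_steps * ratio)


def linear_schedule_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Learning rate at a given step: linear ramp up, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print(warmup)  # 1023

for step in (0, 500, warmup, 5000, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, LEARNING_RATE, warmup, TOTAL_STEPS))
```

Under these assumptions the peak rate of 5e-05 is reached at step 1023, roughly between the second and third logged evaluation.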

Training Logs

| Epoch | Step | Training Loss | Validation Loss | pritamdeka/stsb-assamese-translated-dev_spearman_cosine | pritamdeka/stsb-assamese-translated-test_spearman_cosine |
|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.5489 | - |
| 0.0489 | 500 | 1.9387 | 1.7308 | 0.6808 | - |
| 0.0978 | 1000 | 1.0503 | 1.7373 | 0.6689 | - |
| 0.1467 | 1500 | 0.92 | 1.5838 | 0.6761 | - |
| 0.1956 | 2000 | 0.8754 | 1.4807 | 0.6518 | - |
| 0.2445 | 2500 | 0.7988 | 1.3797 | 0.6853 | - |
| 0.2933 | 3000 | 0.7606 | 1.3713 | 0.7108 | - |
| 0.3422 | 3500 | 0.7228 | 1.2510 | 0.6677 | - |
| 0.3911 | 4000 | 0.688 | 1.2374 | 0.6734 | - |
| 0.4400 | 4500 | 0.6992 | 1.2173 | 0.6891 | - |
| 0.4889 | 5000 | 0.6108 | 1.1638 | 0.7017 | - |
| 0.5378 | 5500 | 0.612 | 1.0815 | 0.7102 | - |
| 0.5867 | 6000 | 0.6259 | 1.0664 | 0.7202 | - |
| 0.6356 | 6500 | 0.5863 | 1.0464 | 0.7047 | - |
| 0.6845 | 7000 | 0.5941 | 1.0111 | 0.7101 | - |
| 0.7334 | 7500 | 0.5436 | 1.0023 | 0.7171 | - |
| 0.7822 | 8000 | 0.555 | 0.9633 | 0.7202 | - |
| 0.8311 | 8500 | 0.5466 | 0.9651 | 0.7279 | - |
| 0.8800 | 9000 | 0.5326 | 0.9611 | 0.7262 | - |
| 0.9289 | 9500 | 0.5055 | 0.9313 | 0.7276 | - |
| 0.9778 | 10000 | 0.4828 | 0.9172 | 0.7221 | - |
| 1.0 | 10227 | - | - | - | 0.6622 |
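  • The saved checkpoint appears to be the row at step 10000 (bold in the original table): it has the lowest evaluation loss, and its dev spearman_cosine of 0.7221 matches the reported dev metric.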
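
This card cites MultipleNegativesRankingLoss, so the training loss logged above is presumably that objective. The sketch below shows the in-batch-negatives idea in plain Python: each anchor should score its own positive higher than every other positive in the batch. It assumes cosine similarity and a scale of 20.0 (the library defaults), covers only the (anchor, positive) case, and uses made-up 2-dimensional vectors. It is not the library's implementation.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,  # assumed default scale
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Every positive in the batch is a candidate; all but the i-th act as negatives.
        logits = [scale * cosine(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Made-up embeddings: each anchor is close to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]
shuffled = [[0.1, 1.0], [1.0, 0.1]]

print(multiple_negatives_ranking_loss(anchors, aligned))   # close to 0
print(multiple_negatives_ranking_loss(anchors, shuffled))  # much larger
```

Because other examples in the batch serve as negatives, a duplicated text within a batch would be treated as a false negative, which is the likely motivation for the no_duplicates batch sampler listed in the hyperparameters.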

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 135M params (Safetensors; tensor type F32)

Evaluation results

  • Pearson Cosine on pritamdeka/stsb assamese translated dev (self-reported): 0.717
  • Spearman Cosine on pritamdeka/stsb assamese translated dev (self-reported): 0.722
  • Pearson Manhattan on pritamdeka/stsb assamese translated dev (self-reported): 0.738
  • Spearman Manhattan on pritamdeka/stsb assamese translated dev (self-reported): 0.745
  • Pearson Euclidean on pritamdeka/stsb assamese translated dev (self-reported): 0.739
  • Spearman Euclidean on pritamdeka/stsb assamese translated dev (self-reported): 0.746
  • Pearson Dot on pritamdeka/stsb assamese translated dev (self-reported): 0.648
  • Spearman Dot on pritamdeka/stsb assamese translated dev (self-reported): 0.648
  • Pearson Max on pritamdeka/stsb assamese translated dev (self-reported): 0.739
  • Spearman Max on pritamdeka/stsb assamese translated dev (self-reported): 0.746