tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:456
  - loss:SoftmaxLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
widget:
  - source_sentence: not especially natural
    sentences:
      - bright
      - bright
      - bright
  - source_sentence: くつろいだ感じじゃない
    sentences:
      - bright
      - bright
      - cozy
  - source_sentence: not especially bright
    sentences:
      - bright
      - cozy
      - natural
  - source_sentence: 明るくしないで
    sentences:
      - cozy
      - cozy
      - bright
  - source_sentence: This room feels too cozy I need something more energetic
    sentences:
      - cozy
      - bright
      - bright
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the πŸ€— Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'This room feels too cozy I need something more energetic',
    'bright',
    'cozy',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
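
As a quick usage example, the sketch below ranks the three candidate descriptors from the widget against a single query by cosine similarity. It is a minimal sketch, assuming the placeholder model id from the snippet above is replaced with the actual repository id; the query and label strings are taken from this card's examples.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

query = "This room feels too cozy I need something more energetic"
labels = ["bright", "cozy", "natural"]

# Encode the query and the candidate labels separately
query_embedding = model.encode([query])
label_embeddings = model.encode(labels)

# model.similarity returns a (1, 3) tensor of cosine similarities by default
scores = model.similarity(query_embedding, label_embeddings)[0]

# Sort labels from most to least similar to the query
for label, score in sorted(zip(labels, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{label}: {score:.4f}")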

Training Details

Training Dataset

Unnamed Dataset

  • Size: 456 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 456 samples:
    • premise: string; min: 4 tokens, mean: 7.82 tokens, max: 20 tokens
    • hypothesis: string; min: 3 tokens, mean: 3.31 tokens, max: 4 tokens
    • label: int; 0: ~17.98%, 1: ~82.02%
  • Samples:
    • premise: "not romantic lighting", hypothesis: "bright", label: 1
    • premise: "These lights are way too bright please turn them down", hypothesis: "cozy", label: 1
    • premise: "not quite cozy", hypothesis: "bright", label: 1
  • Loss: SoftmaxLoss

Evaluation Dataset

Unnamed Dataset

  • Size: 115 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 115 samples:
    • premise: string; min: 3 tokens, mean: 7.24 tokens, max: 17 tokens
    • hypothesis: string; min: 3 tokens, mean: 3.38 tokens, max: 4 tokens
    • label: int; 0: ~20.87%, 1: ~79.13%
  • Samples:
    • premise: "not warm", hypothesis: "cozy", label: 0
    • premise: "In the evening I want lighting that's not bright but cozy", hypothesis: "cozy", label: 1
    • premise: "明るい光は苦手です", hypothesis: "cozy", label: 1
  • Loss: SoftmaxLoss
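
To illustrate how a premise / hypothesis / label dataset of this shape feeds into SoftmaxLoss, here is a minimal sketch. The column names, the two-class labels, and the example rows come from this card; the use of datasets.Dataset.from_dict and the variable names are illustrative assumptions, not the original training script.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# A tiny stand-in for the 456-sample training split described above
train_dataset = Dataset.from_dict({
    "premise": [
        "not romantic lighting",
        "These lights are way too bright please turn them down",
        "not quite cozy",
    ],
    "hypothesis": ["bright", "cozy", "bright"],
    "label": [1, 1, 1],
})

# SoftmaxLoss embeds both sentences, concatenates the embeddings (and their difference),
# and trains a softmax classifier over num_labels classes on top
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=2,
)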

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 7
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • ddp_find_unused_parameters: False
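
Continuing the sketch from the dataset section above, the non-default values listed here map directly onto SentenceTransformerTrainingArguments. output_dir and save_strategy are assumptions (they are not recorded on this card) but are needed for load_best_model_at_end to take effect; the one-row eval_dataset stands in for the 115-sample evaluation split.

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

# One-row stand-in for the evaluation split described above
eval_dataset = Dataset.from_dict({
    "premise": ["not warm"],
    "hypothesis": ["cozy"],
    "label": [0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",             # assumption: not recorded on this card
    eval_strategy="epoch",
    save_strategy="epoch",            # assumption: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=7,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    ddp_find_unused_parameters=False,
)

trainer = SentenceTransformerTrainer(
    model=model,                 # base model from the previous sketch
    args=args,
    train_dataset=train_dataset, # from the previous sketch
    eval_dataset=eval_dataset,
    loss=loss,                   # SoftmaxLoss from the previous sketch
)
trainer.train()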

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.1724 5 0.7808 -
0.3448 10 0.7224 -
0.5172 15 0.5833 -
0.6897 20 0.4336 -
0.8621 25 0.426 -
1.0 29 - 0.4209
1.0345 30 0.407 -
1.2069 35 0.4633 -
1.3793 40 0.2629 -
1.5517 45 0.4468 -
1.7241 50 0.3665 -
1.8966 55 0.2735 -
2.0 58 - 0.3269
2.0690 60 0.2472 -
2.2414 65 0.2586 -
2.4138 70 0.2281 -
2.5862 75 0.3056 -
2.7586 80 0.2166 -
2.9310 85 0.2243 -
3.0 87 - 0.2471
3.1034 90 0.2233 -
3.2759 95 0.1625 -
3.4483 100 0.1718 -
3.6207 105 0.1728 -
3.7931 110 0.1949 -
3.9655 115 0.0891 -
4.0 116 - 0.1997
4.1379 120 0.1895 -
4.3103 125 0.1021 -
4.4828 130 0.1232 -
4.6552 135 0.0891 -
4.8276 140 0.109 -
5.0 145 0.0879 0.1679
5.1724 150 0.0814 -
5.3448 155 0.1015 -
5.5172 160 0.0822 -
5.6897 165 0.1054 -
5.8621 170 0.1093 -
6.0 174 - 0.1479
6.0345 175 0.0911 -
6.2069 180 0.0804 -
6.3793 185 0.1063 -
6.5517 190 0.0821 -
6.7241 195 0.0988 -
6.8966 200 0.0691 -
7.0 203 - 0.1430

Framework Versions

  • Python: 3.10.16
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0
  • PyTorch: 2.4.0
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}