metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:8000
  - loss:AMSoftmaxLossNLI
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: I avoid any financial dealings with them
    sentences:
      - I observe my reaction to shifting deadlines.
      - I avoid volunteering for projects they're involved in
      - I concentrate solely on task execution.
  - source_sentence: I stay aware of my feelings towards team diversity.
    sentences:
      - I expect them to criticize my performance unfairly
      - I return focus when distracted by phone notifications.
      - I remain aware of my body's signals during long work hours.
  - source_sentence: I avoid volunteering for projects they're involved in
    sentences:
      - I place primary attention on resolving immediate conflicts.
      - I avoid collaborating with them on projects due to past experiences
      - I feel irritation whenever they share their opinions
  - source_sentence: I notice my mood changes during stressful work situations.
    sentences:
      - I am aware of my energy fluctuations throughout the workday.
      - I notice how I feel after completing significant tasks.
      - I accept varied perspectives from my team graciously.
  - source_sentence: I re-align my focus after a mental lapse.
    sentences:
      - I acknowledge my internal responses to tight deadlines.
      - I accept and appreciate differences in colleagues' working styles.
      - I accept and learn from performance reviews.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
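
The Pooling and Normalize modules listed above can be illustrated with a short NumPy sketch. It assumes token embeddings and an attention mask are already available (random values stand in for the BertModel output here), and the helper name `mean_pool_and_normalize` is hypothetical, not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token vectors over non-padding positions, then L2-normalize.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   0/1 array of shape (batch, seq_len)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum of real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts                            # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length vectors

# Toy input: random values stand in for transformer outputs (illustration only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)  # (2, 384)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.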

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zihoo/all-MiniLM-L6-v2-WMNLI-margin")
# Run inference
sentences = [
    'I re-align my focus after a mental lapse.',
    'I acknowledge my internal responses to tight deadlines.',
    'I accept and learn from performance reviews.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
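
One common use of these similarity scores is ranking candidate sentences against a query sentence. The following is a minimal sketch of that step using plain NumPy; the helper `rank_candidates` is hypothetical, and the toy 3-dimensional vectors stand in for the 384-dimensional embeddings the model would produce.

```python
import numpy as np

def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices ordered from most to least similar, plus scores."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                 # cosine similarity of each candidate to the query
    order = np.argsort(-scores)    # descending by score
    return order, scores[order]

# Toy vectors for illustration only; real inputs would come from model.encode(...).
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [2.0, 0.1, 0.0],    # nearly parallel to the query
    [-1.0, 0.0, 0.0],   # opposite direction
])
order, scores = rank_candidates(query, candidates)
print(order.tolist())  # [1, 0, 2]
```

With the snippet above, `embeddings[0]` could serve as the query and `embeddings[1:]` as the candidates.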

Training Details

Training Dataset

Unnamed Dataset

  • Size: 8,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    column | type | details
    sentence1 | string | min: 8 tokens, mean: 11.65 tokens, max: 17 tokens
    sentence2 | string | min: 8 tokens, mean: 11.77 tokens, max: 17 tokens
    label | int | 0: ~25.80%, 1: ~36.80%, 2: ~37.40%
  • Samples:
    sentence1 | sentence2 | label
    I focus on one work task at a time. | I keep my attention on the task despite office chatter. | 0
    I worry they might spread false rumors about me | I return focus to my work when my mind drifts. | 2
    I stay aware of my posture when working at a desk. | I pay attention to non-verbal cues from others. | 0
  • Loss: main.AMSoftmaxLossNLI
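
The loss `main.AMSoftmaxLossNLI` is defined in the author's training script, which this card does not show. Its name suggests an additive-margin softmax over the three labels. The sketch below shows the general additive-margin softmax formulation only, not the author's implementation: the function name, the `scale` and `margin` values, and the class-weight matrix are all assumptions made for illustration.

```python
import numpy as np

def am_softmax_loss(features, class_weights, labels, scale=30.0, margin=0.35):
    """Generic additive-margin softmax loss (illustrative, hypothetical settings).

    features:      (batch, dim) input vectors
    class_weights: (num_classes, dim) one weight vector per class
    labels:        (batch,) integer class ids
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = f @ w.T                                   # cosine to every class
    one_hot = np.eye(w.shape[0])[labels]            # (batch, num_classes)
    logits = scale * (cos - margin * one_hot)       # subtract margin on the true class
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy inputs: random features and weights for 3 classes (illustration only).
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(4, 8))
toy_weights = rng.normal(size=(3, 8))
toy_labels = np.array([0, 2, 1, 0])
toy_loss = am_softmax_loss(toy_features, toy_weights, toy_labels)
print(float(toy_loss) > 0.0)  # True
```

The margin lowers the true-class logit before the softmax, so a sample must exceed the other classes by a gap in cosine space to reach a low loss.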

Evaluation Dataset

Unnamed Dataset

  • Size: 2,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    column | type | details
    sentence1 | string | min: 8 tokens, mean: 11.68 tokens, max: 17 tokens
    sentence2 | string | min: 8 tokens, mean: 11.79 tokens, max: 17 tokens
    label | int | 0: ~24.40%, 1: ~36.30%, 2: ~39.30%
  • Samples:
    sentence1 | sentence2 | label
    I stay conscious of my emotional responses to work challenges. | I pay close attention to verbal instructions. | 1
    I accept varied perspectives from my team graciously. | I accept team dynamics as they naturally evolve. | 0
    I accept technology upgrades with an open heart. | I am mindful of my facial expressions during discussions. | 1
  • Loss: main.AMSoftmaxLossNLI

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 3e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.01
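
These settings determine the step counts and the learning-rate curve. The sketch below works through the arithmetic, assuming a single device and no gradient accumulation, and assuming the linear scheduler with warmup listed among the full hyperparameters. The function `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and decay shape, not a library call.

```python
import math

train_samples = 8000      # training set size from this card
batch_size = 32           # per_device_train_batch_size
epochs = 10               # num_train_epochs
base_lr = 3e-5            # learning_rate
warmup_ratio = 0.01       # warmup_ratio

steps_per_epoch = math.ceil(train_samples / batch_size)   # 250
total_steps = steps_per_epoch * epochs                    # 2500
warmup_steps = math.ceil(total_steps * warmup_ratio)      # 25

def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)

print(steps_per_epoch, total_steps, warmup_steps)  # 250 2500 25
```

The 2,500 total steps agree with the final row of the training logs, where step 2500 corresponds to epoch 10.0.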

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.01
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | Validation Loss
0.4 | 100 | 1.5746 | 0.9785
0.8 | 200 | 0.9714 | 0.9328
1.2 | 300 | 0.9572 | 0.9329
1.6 | 400 | 0.9315 | 0.9430
2.0 | 500 | 0.9393 | 0.9355
2.4 | 600 | 0.9332 | 0.9217
2.8 | 700 | 0.9291 | 0.9185
3.2 | 800 | 0.9394 | 0.9456
3.6 | 900 | 0.9256 | 0.9254
4.0 | 1000 | 0.9301 | 0.9147
4.4 | 1100 | 0.9055 | 0.9189
4.8 | 1200 | 0.9382 | 0.9231
5.2 | 1300 | 0.9198 | 0.9188
5.6 | 1400 | 0.915 | 0.9280
6.0 | 1500 | 0.9239 | 0.9323
6.4 | 1600 | 0.9078 | 0.9276
6.8 | 1700 | 0.918 | 0.9223
7.2 | 1800 | 0.9133 | 0.9225
7.6 | 1900 | 0.9097 | 0.9268
8.0 | 2000 | 0.9202 | 0.9271
8.4 | 2100 | 0.9204 | 0.9315
8.8 | 2200 | 0.9019 | 0.9252
9.2 | 2300 | 0.9074 | 0.9236
9.6 | 2400 | 0.9094 | 0.9234
10.0 | 2500 | 0.9026 | 0.9235
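
Since `load_best_model_at_end` is False in the hyperparameters, the last step is not necessarily the one with the lowest validation loss. A small sketch for locating that step from the logged values follows; the dictionary simply copies the Step and Validation Loss columns from the logs, and the variable names are illustrative.

```python
# Validation loss per logged step, copied from the training logs in this card.
val_loss_by_step = {
    100: 0.9785, 200: 0.9328, 300: 0.9329, 400: 0.9430, 500: 0.9355,
    600: 0.9217, 700: 0.9185, 800: 0.9456, 900: 0.9254, 1000: 0.9147,
    1100: 0.9189, 1200: 0.9231, 1300: 0.9188, 1400: 0.9280, 1500: 0.9323,
    1600: 0.9276, 1700: 0.9223, 1800: 0.9225, 1900: 0.9268, 2000: 0.9271,
    2100: 0.9315, 2200: 0.9252, 2300: 0.9236, 2400: 0.9234, 2500: 0.9235,
}

# Step with the lowest validation loss, and the gap to the final logged step.
best_step = min(val_loss_by_step, key=val_loss_by_step.get)
final_step = max(val_loss_by_step)
gap = val_loss_by_step[final_step] - val_loss_by_step[best_step]

print(best_step, val_loss_by_step[best_step])  # 1000 0.9147
```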

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}