SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
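
Because the pooling module is configured with pooling_mode_cls_token: True, the sentence embedding is the final hidden state of the [CLS] token from the underlying BertModel. The following is a minimal sketch of that behaviour using plain transformers, assuming the repository ships standard tokenizer and model files:

import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: the Hub repo exposes standard tokenizer/model files.
repo = "Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_17"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

batch = tokenizer(
    ["科目:タイル。名称:階段蹴上タイル。"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    output = model(**batch)

# CLS pooling: the sentence vector is the hidden state of the first ([CLS]) token
embedding = output.last_hidden_state[:, 0]  # shape: [1, 768]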

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_17")
# Run inference
sentences = [
    '科目:タイル。名称:階段蹴上タイル。',
    '科目:ユニット及びその他。名称:配膳室配膳棚(#段)。',
    '科目:ユニット及びその他。名称:#F患者図書室雑誌棚。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
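
The same API also covers simple semantic-search style ranking: encode a query and a set of candidates, then rank the candidates with the model's similarity function. A minimal sketch follows; the query and candidate strings are illustrative and not taken from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_17")

query = "科目:タイル。名称:階段蹴上タイル。"
corpus = [
    "科目:ユニット及びその他。名称:配膳室配膳棚(#段)。",
    "科目:ユニット及びその他。名称:#F患者図書室雑誌棚。",
]

query_emb = model.encode([query])
corpus_emb = model.encode(corpus)

# similarity() applies the model's configured similarity function (cosine)
scores = model.similarity(query_emb, corpus_emb)  # shape: [1, 2]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())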

Training Details

Training Dataset

Unnamed Dataset

  • Size: 14,153 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence (type: string): min 11 tokens, mean 17.23 tokens, max 29 tokens
    • label (type: int): class frequencies listed below
    • 0: ~0.30%
    • 1: ~0.30%
    • 2: ~0.30%
    • 3: ~0.30%
    • 4: ~0.30%
    • 5: ~0.30%
    • 6: ~0.30%
    • 7: ~0.30%
    • 8: ~0.30%
    • 9: ~0.30%
    • 10: ~0.30%
    • 11: ~0.40%
    • 12: ~0.30%
    • 13: ~0.30%
    • 14: ~0.30%
    • 15: ~0.30%
    • 16: ~0.30%
    • 17: ~0.30%
    • 18: ~0.50%
    • 19: ~0.30%
    • 20: ~0.30%
    • 21: ~0.30%
    • 22: ~0.30%
    • 23: ~0.30%
    • 24: ~0.30%
    • 25: ~0.30%
    • 26: ~0.30%
    • 27: ~0.30%
    • 28: ~0.30%
    • 29: ~0.30%
    • 30: ~0.30%
    • 31: ~0.30%
    • 32: ~0.30%
    • 33: ~0.30%
    • 34: ~0.30%
    • 35: ~0.30%
    • 36: ~0.30%
    • 37: ~0.30%
    • 38: ~0.30%
    • 39: ~0.30%
    • 40: ~0.40%
    • 41: ~0.30%
    • 42: ~0.30%
    • 43: ~0.30%
    • 44: ~0.60%
    • 45: ~0.70%
    • 46: ~0.30%
    • 47: ~0.30%
    • 48: ~0.30%
    • 49: ~0.30%
    • 50: ~0.30%
    • 51: ~0.30%
    • 52: ~0.30%
    • 53: ~0.30%
    • 54: ~0.30%
    • 55: ~0.30%
    • 56: ~0.30%
    • 57: ~0.80%
    • 58: ~0.30%
    • 59: ~0.30%
    • 60: ~0.60%
    • 61: ~0.30%
    • 62: ~0.30%
    • 63: ~0.30%
    • 64: ~0.50%
    • 65: ~0.30%
    • 66: ~0.30%
    • 67: ~0.30%
    • 68: ~0.30%
    • 69: ~0.50%
    • 70: ~0.60%
    • 71: ~0.30%
    • 72: ~0.30%
    • 73: ~0.30%
    • 74: ~0.30%
    • 75: ~0.30%
    • 76: ~0.30%
    • 77: ~0.30%
    • 78: ~0.30%
    • 79: ~0.30%
    • 80: ~0.30%
    • 81: ~0.30%
    • 82: ~0.30%
    • 83: ~0.30%
    • 84: ~0.80%
    • 85: ~0.60%
    • 86: ~0.50%
    • 87: ~0.30%
    • 88: ~0.30%
    • 89: ~16.30%
    • 90: ~0.30%
    • 91: ~0.30%
    • 92: ~0.30%
    • 93: ~0.30%
    • 94: ~0.30%
    • 95: ~0.30%
    • 96: ~0.30%
    • 97: ~0.30%
    • 98: ~0.50%
    • 99: ~0.30%
    • 100: ~0.30%
    • 101: ~0.30%
    • 102: ~0.30%
    • 103: ~0.30%
    • 104: ~0.30%
    • 105: ~0.30%
    • 106: ~0.30%
    • 107: ~0.70%
    • 108: ~0.30%
    • 109: ~3.20%
    • 110: ~0.30%
    • 111: ~0.40%
    • 112: ~2.30%
    • 113: ~0.30%
    • 114: ~0.30%
    • 115: ~0.50%
    • 116: ~0.50%
    • 117: ~0.50%
    • 118: ~0.40%
    • 119: ~0.30%
    • 120: ~0.30%
    • 121: ~0.30%
    • 122: ~0.80%
    • 123: ~0.30%
    • 124: ~0.30%
    • 125: ~0.30%
    • 126: ~0.30%
    • 127: ~0.30%
    • 128: ~0.30%
    • 129: ~0.30%
    • 130: ~0.30%
    • 131: ~0.50%
    • 132: ~0.30%
    • 133: ~0.40%
    • 134: ~0.30%
    • 135: ~0.30%
    • 136: ~0.30%
    • 137: ~0.30%
    • 138: ~0.30%
    • 139: ~0.30%
    • 140: ~0.30%
    • 141: ~0.30%
    • 142: ~0.30%
    • 143: ~0.30%
    • 144: ~0.40%
    • 145: ~0.30%
    • 146: ~0.30%
    • 147: ~0.30%
    • 148: ~0.30%
    • 149: ~0.30%
    • 150: ~0.30%
    • 151: ~0.70%
    • 152: ~0.30%
    • 153: ~0.30%
    • 154: ~0.30%
    • 155: ~1.30%
    • 156: ~0.30%
    • 157: ~0.30%
    • 158: ~0.30%
    • 159: ~0.30%
    • 160: ~0.30%
    • 161: ~1.50%
    • 162: ~0.30%
    • 163: ~0.30%
    • 164: ~0.30%
    • 165: ~0.30%
    • 166: ~0.30%
    • 167: ~0.30%
    • 168: ~0.30%
    • 169: ~1.50%
    • 170: ~0.30%
    • 171: ~0.30%
    • 172: ~7.20%
    • 173: ~0.30%
    • 174: ~1.00%
    • 175: ~0.30%
    • 176: ~0.30%
    • 177: ~0.30%
    • 178: ~1.80%
    • 179: ~0.30%
    • 180: ~0.50%
    • 181: ~0.70%
    • 182: ~0.30%
    • 183: ~0.30%
    • 184: ~0.30%
    • 185: ~0.30%
    • 186: ~0.30%
    • 187: ~0.30%
    • 188: ~0.30%
    • 189: ~0.50%
    • 190: ~2.50%
  • Samples (sentence → label):
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 → 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 → 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 → 0
  • Loss: sentence_transformer_lib.custom_batch_all_trip_loss.CustomBatchAllTripletLoss
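
The CustomBatchAllTripletLoss module referenced above is not included in this card; it cites the batch-all triplet formulation of Hermans et al. (2017), which sentence-transformers also ships as losses.BatchAllTripletLoss. As an illustration only (not the repository's actual implementation), a self-contained PyTorch sketch of that formulation:

import torch
import torch.nn.functional as F

def batch_all_triplet_loss(embeddings: torch.Tensor,
                           labels: torch.Tensor,
                           margin: float = 5.0) -> torch.Tensor:
    # Pairwise Euclidean distances between all embeddings in the batch
    dist = torch.cdist(embeddings, embeddings, p=2)            # [B, B]

    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)    # [B, B]
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same_label & not_self                           # valid (anchor, positive)
    neg_mask = ~same_label                                     # valid (anchor, negative)

    # loss[a, p, n] = d(a, p) - d(a, n) + margin for every index combination
    loss = dist.unsqueeze(2) - dist.unsqueeze(1) + margin      # [B, B, B]
    valid = pos_mask.unsqueeze(2) & neg_mask.unsqueeze(1)      # valid triplets only
    loss = F.relu(loss) * valid

    # "Batch all": average over the triplets that still violate the margin
    num_active = (loss > 0).sum().clamp(min=1)
    return loss.sum() / num_active

# Toy usage with random embeddings and two labels
emb = torch.randn(8, 768)
lab = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(batch_all_triplet_loss(emb, lab))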

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 250
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: group_by_label
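
For reference, a minimal sketch of how these values map onto the sentence-transformers v3 training API; the output directory, base model, dataset, and loss object are placeholders, since the original training script is not part of this card:

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",                         # placeholder
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=250,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,  # the group_by_label sampler above
)

# trainer = SentenceTransformerTrainer(
#     model=SentenceTransformer("..."),          # base model (not stated in this card)
#     args=args,
#     train_dataset=train_dataset,               # the 14,153-sample labeled dataset
#     loss=loss,                                  # CustomBatchAllTripletLoss instance
# )
# trainer.train()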

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 250
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
4.5714 100 0.0629
9.3929 200 0.0706
14.2143 300 0.0728
19.0357 400 0.0716
23.6071 500 0.0683
28.4286 600 0.0669
33.25 700 0.0686
38.0714 800 0.0656
42.6429 900 0.0592
47.4643 1000 0.0659
52.2857 1100 0.0621
57.1071 1200 0.064
61.6786 1300 0.0604
66.5 1400 0.0603
71.3214 1500 0.0608
76.1429 1600 0.0581
80.7143 1700 0.0522
85.5357 1800 0.055
90.3571 1900 0.0544
95.1786 2000 0.0602
99.75 2100 0.056
104.5714 2200 0.0519
109.3929 2300 0.0521
114.2143 2400 0.0506
119.0357 2500 0.0538
123.6071 2600 0.0527
128.4286 2700 0.0514
133.25 2800 0.0513
138.0714 2900 0.0447
142.6429 3000 0.0528
147.4643 3100 0.0486
152.2857 3200 0.0446
157.1071 3300 0.0451
161.6786 3400 0.0451
166.5 3500 0.0459
171.3214 3600 0.0485
176.1429 3700 0.0469
180.7143 3800 0.0446
185.5357 3900 0.0443
190.3571 4000 0.0439
195.1786 4100 0.0382
199.75 4200 0.0401
204.5714 4300 0.0441
209.3929 4400 0.0397
214.2143 4500 0.037
219.0357 4600 0.04
223.6071 4700 0.0386
228.4286 4800 0.0396
233.25 4900 0.0387
238.0714 5000 0.0408
242.6429 5100 0.0396
247.4643 5200 0.0363

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CustomBatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}