SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
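
The similarity function is plain cosine similarity over the output embeddings. For reference, this is a minimal NumPy sketch of what that comparison computes (illustrative only; in practice call model.similarity as shown under Usage):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity: dot product divided by the product of the norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))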

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
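
The Pooling module is configured with pooling_mode_cls_token: True, so the sentence embedding is the hidden state of the first ([CLS]) token rather than a mean over all tokens. A minimal PyTorch sketch of that pooling step, assuming token_embeddings has shape (batch_size, seq_len, 768):

import torch

def cls_pooling(token_embeddings: torch.Tensor) -> torch.Tensor:
    # Keep only the first ([CLS]) token's hidden state for each sequence.
    return token_embeddings[:, 0]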

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8")
# Run inference
sentences = [
    '科目:コンクリート。名称:基礎部コンクリート打設手間。',  # Category: concrete. Item: foundation concrete placement labor.
    '科目:コンクリート。名称:底盤コンクリート打設手間。',  # Category: concrete. Item: base-slab concrete placement labor.
    '科目:タイル。名称:外壁ガラスモザイクタイル張り。',  # Category: tile. Item: exterior-wall glass mosaic tiling.
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
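
Building on the snippet above, a common retrieval pattern is to score a query against the encoded sentences and take the best match. A short sketch, reusing model, sentences, and embeddings from the example (the query is a sample drawn from the training data shown below):

query = "科目:コンクリート。名称:免震基礎天端グラウト注入。"
query_embedding = model.encode([query])
scores = model.similarity(query_embedding, embeddings)[0]
best = int(scores.argmax())
print(sentences[best], float(scores[best]))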

Training Details

Training Dataset

Unnamed Dataset

  • Size: 327,543 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 11, mean 13.78, max 19 tokens
    • sentence2: string; min 11, mean 14.8, max 23 tokens
    • label: int; 0: ~74.10%, 1: ~2.60%, 2: ~23.30%
  • Samples (sentence1 | sentence2 | label):
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部(外周基礎梁)コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部コンクリート打設手間。 | 0
    (In each sample, sentence1 translates to "Category: concrete. Item: grout injection at the top of the seismic-isolation foundation", and sentence2 is a related seismic-isolation concrete placement labor item.)
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss
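
CategoricalContrastiveLoss is a custom class whose definition is not included in this card. Purely for orientation, the sketch below shows one plausible way to use a three-way label in a contrastive objective, mapping each label to a target cosine similarity. The label semantics (0 = dissimilar, 1 = partially similar, 2 = similar) and the target values are assumptions, not the actual implementation:

import torch
import torch.nn.functional as F

def categorical_contrastive_loss(emb1: torch.Tensor, emb2: torch.Tensor,
                                 labels: torch.Tensor) -> torch.Tensor:
    # Assumed label semantics: 0 = dissimilar, 1 = partially similar, 2 = similar.
    cos = F.cosine_similarity(emb1, emb2)  # shape: (batch,)
    # Hypothetical target similarity per label class.
    targets = torch.tensor([0.0, 0.5, 1.0], device=cos.device)[labels]
    return F.mse_loss(cos, targets)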

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 4
  • warmup_ratio: 0.2
  • fp16: True
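
These values map directly onto SentenceTransformerTrainingArguments if you want to run a comparable job. A minimal sketch under stated assumptions: the card does not name the base checkpoint, so the published model is loaded instead; the dataset is a one-row illustration built from a sample pair in the Training Dataset section above; and ContrastiveLoss is a stand-in for the custom loss actually used:

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments, losses)

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8")
train_dataset = Dataset.from_dict({  # one illustrative pair from this card's samples
    "sentence1": ["科目:コンクリート。名称:免震基礎天端グラウト注入。"],
    "sentence2": ["科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。"],
    "label": [0],
})
loss = losses.ContrastiveLoss(model)  # stand-in; not the custom CategoricalContrastiveLoss
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=4,
    warmup_ratio=0.2,
    fp16=True,
)
trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()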

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0391 50 0.0432
0.0781 100 0.0449
0.1172 150 0.0429
0.1562 200 0.0397
0.1953 250 0.0395
0.2344 300 0.0312
0.2734 350 0.0347
0.3125 400 0.0303
0.3516 450 0.0298
0.3906 500 0.0321
0.4297 550 0.0266
0.4688 600 0.0254
0.5078 650 0.0267
0.5469 700 0.0244
0.5859 750 0.0238
0.625 800 0.0229
0.6641 850 0.023
0.7031 900 0.0189
0.7422 950 0.0207
0.7812 1000 0.0201
0.8203 1050 0.0188
0.8594 1100 0.0153
0.8984 1150 0.0168
0.9375 1200 0.014
0.9766 1250 0.0155
1.0156 1300 0.0141
1.0547 1350 0.0139
1.0938 1400 0.0121
1.1328 1450 0.0121
1.1719 1500 0.0109
1.2109 1550 0.0116
1.25 1600 0.0119
1.2891 1650 0.0102
1.3281 1700 0.0095
1.3672 1750 0.0089
1.4062 1800 0.0109
1.4453 1850 0.0094
1.4844 1900 0.0094
1.5234 1950 0.0089
1.5625 2000 0.0088
1.6016 2050 0.0081
1.6406 2100 0.0082
1.6797 2150 0.0072
1.7188 2200 0.0075
1.7578 2250 0.0078
1.7969 2300 0.0081
1.8359 2350 0.0079
1.875 2400 0.008
1.9141 2450 0.0079
1.9531 2500 0.0071
1.9922 2550 0.0089
2.0312 2600 0.0063
2.0703 2650 0.0055
2.1094 2700 0.0053
2.1484 2750 0.0053
2.1875 2800 0.0054
2.2266 2850 0.0046
2.2656 2900 0.005
2.3047 2950 0.0053
2.3438 3000 0.0047
2.3828 3050 0.0052
2.4219 3100 0.0049
2.4609 3150 0.0055
2.5 3200 0.0047
2.5391 3250 0.0048
2.5781 3300 0.0046
2.6172 3350 0.0049
2.6562 3400 0.0049
2.6953 3450 0.0051
2.7344 3500 0.0045
2.7734 3550 0.0044
2.8125 3600 0.0049
2.8516 3650 0.0048
2.8906 3700 0.0047
2.9297 3750 0.0044
2.9688 3800 0.0041
3.0078 3850 0.0039
3.0469 3900 0.0038
3.0859 3950 0.0033
3.125 4000 0.0037
3.1641 4050 0.0036
3.2031 4100 0.004
3.2422 4150 0.0036
3.2812 4200 0.0038
3.3203 4250 0.004
3.3594 4300 0.004
3.3984 4350 0.0039
3.4375 4400 0.0031
3.4766 4450 0.0031
3.5156 4500 0.0038
3.5547 4550 0.0031
3.5938 4600 0.0029
3.6328 4650 0.0031
3.6719 4700 0.003
3.7109 4750 0.0036
3.75 4800 0.0035
3.7891 4850 0.0029
3.8281 4900 0.0033
3.8672 4950 0.0031
3.9062 5000 0.0036
3.9453 5050 0.0037
3.9844 5100 0.0031

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1
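
To approximate this environment, the versions above can be pinned at install time (a sketch; exact wheel availability depends on your platform, and the cu124 build of PyTorch comes from the matching PyTorch index rather than PyPI):

pip install sentence-transformers==4.1.0 transformers==4.52.4 accelerate==1.7.0 datasets==2.14.4 tokenizers==0.21.1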

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}