SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
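
The same two-module stack (a BERT encoder followed by CLS-token pooling) can also be assembled by hand with the sentence_transformers.models API. The following is a minimal sketch, assuming the Hub model ID from the usage section below; loading the model ID directly is equivalent.

from sentence_transformers import SentenceTransformer, models

# Transformer module: BERT encoder, 512-token maximum sequence length
word_embedding = models.Transformer(
    "Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_7_5",
    max_seq_length=512,
)
# Pooling module: use the [CLS] token embedding (pooling_mode_cls_token=True above)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",
)
model = SentenceTransformer(modules=[word_embedding, pooling])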

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_7_5")
# Run inference
sentences = [
    '科目:コンクリート。名称:基礎部マスコンクリート。',
    '科目:コンクリート。名称:オイルタンク基礎コンクリート。摘要:FC24 S18粗骨材20 高性能AE減水剤。備考:代価表    0108。',
    '科目:コンクリート。名称:普通コンクリート。摘要:FC=24 S15粗骨材基礎部。備考:代価表    0054。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
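
The embeddings also support semantic search directly. The sketch below ranks a small corpus against a query with sentence_transformers.util.semantic_search; the query and corpus strings are illustrative and reuse the sentences above.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_7_5")

query = "科目:コンクリート。名称:基礎部マスコンクリート。"
corpus = [
    "科目:コンクリート。名称:オイルタンク基礎コンクリート。摘要:FC24 S18粗骨材20 高性能AE減水剤。備考:代価表    0108。",
    "科目:コンクリート。名称:普通コンクリート。摘要:FC=24 S15粗骨材基礎部。備考:代価表    0054。",
]

query_embedding = model.encode(query, convert_to_tensor=True)
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# One result list per query; each hit is a dict with 'corpus_id' and 'score'
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])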

Training Details

Training Dataset

Unnamed Dataset

  • Size: 197,418 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 11 tokens, mean: 13.71 tokens, max: 19 tokens
    • sentence2 (string): min: 11 tokens, mean: 31.5 tokens, max: 72 tokens
    • label (int): 0: ~61.50%, 1: ~5.60%, 2: ~32.90%
  • Samples:
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:ポンプ圧送。
      label: 1
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:コンクリートポンプ圧送。摘要:100m3/回以上基本料金別途加算。備考:B0-434226 No.1 市場捨てコン。
      label: 0
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:コンクリート打設手間。摘要:躯体 ポンプ打設100m3/回以上 S15~S18標準階高 圧送費、基本料別途。備考:B0-434215 No.1 市場地上部コン(1F)。
      label: 0
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss
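
As a rough sketch of how this data layout and loss plug into the standard sentence-transformers training loop: the example row is taken from the samples above, the base checkpoint is a placeholder (the card does not name it), and the CategoricalContrastiveLoss import and constructor are assumptions based on the module path listed here.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
# Custom loss named in this card; assumed importable from the author's package
from sentence_transformer_lib.categorical_constrastive_loss import CategoricalContrastiveLoss

# Columns match the statistics above: sentence1 (string), sentence2 (string), label (int: 0, 1, or 2)
train_dataset = Dataset.from_dict({
    "sentence1": ["科目:コンクリート。名称:コンクリートポンプ圧送。"],
    "sentence2": ["科目:コンクリート。名称:ポンプ圧送。"],
    "label": [1],
})

model = SentenceTransformer("base-model-checkpoint")  # placeholder; the base checkpoint is not named in this card
loss = CategoricalContrastiveLoss(model)  # constructor assumed to follow the usual loss(model) convention

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()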

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 20
  • warmup_ratio: 0.2
  • fp16: True
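
Expressed as code, these overrides map onto SentenceTransformerTrainingArguments. A sketch (output_dir is a placeholder), which can be passed as args to the trainer sketched in the dataset section above:

from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder, not specified in this card
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=20,
    warmup_ratio=0.2,
    fp16=True,
)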

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0648 50 0.2993
0.1295 100 0.1925
0.1943 150 0.1197
0.2591 200 0.1054
0.3238 250 0.0849
0.3886 300 0.0854
0.4534 350 0.0716
0.5181 400 0.0659
0.5829 450 0.0641
0.6477 500 0.0641
0.7124 550 0.0619
0.7772 600 0.0589
0.8420 650 0.0564
0.9067 700 0.0506
0.9715 750 0.0513
1.0363 800 0.0473
1.1010 850 0.0451
1.1658 900 0.044
1.2306 950 0.0418
1.2953 1000 0.042
1.3601 1050 0.0337
1.4249 1100 0.0337
1.4896 1150 0.0354
1.5544 1200 0.0353
1.6192 1250 0.0353
1.6839 1300 0.0323
1.7487 1350 0.0297
1.8135 1400 0.0331
1.8782 1450 0.0303
1.9430 1500 0.0286
2.0078 1550 0.0265
2.0725 1600 0.0257
2.1373 1650 0.0195
2.2021 1700 0.0225
2.2668 1750 0.0206
2.3316 1800 0.0231
2.3964 1850 0.0225
2.4611 1900 0.0203
2.5259 1950 0.0207
2.5907 2000 0.02
2.6554 2050 0.0181
2.7202 2100 0.0202
2.7850 2150 0.0187
2.8497 2200 0.0192
2.9145 2250 0.0168
2.9793 2300 0.0162
3.0440 2350 0.0159
3.1088 2400 0.0145
3.1736 2450 0.0134
3.2383 2500 0.0138
3.3031 2550 0.0125
3.3679 2600 0.0132
3.4326 2650 0.0122
3.4974 2700 0.0133
3.5622 2750 0.0127
3.6269 2800 0.0125
3.6917 2850 0.0107
3.7565 2900 0.0114
3.8212 2950 0.0104
3.8860 3000 0.0107
3.9508 3050 0.0112
4.0155 3100 0.0084
4.0803 3150 0.0086
4.1451 3200 0.0077
4.2098 3250 0.0098
4.2746 3300 0.0068
4.3394 3350 0.0082
4.4041 3400 0.0064
4.4689 3450 0.0083
4.5337 3500 0.0065
4.5984 3550 0.0067
4.6632 3600 0.0074
4.7280 3650 0.0078
4.7927 3700 0.0072
4.8575 3750 0.0077
4.9223 3800 0.007
4.9870 3850 0.0067
5.0518 3900 0.0057
5.1166 3950 0.0054
5.1813 4000 0.0046

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1
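
To approximate this environment, the listed versions can be pinned at install time, for example:

pip install sentence-transformers==4.1.0 transformers==4.51.3 accelerate==1.6.0 datasets==2.14.4 tokenizers==0.21.1

PyTorch 2.6.0+cu124 additionally needs the matching CUDA wheel index.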

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}