SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
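
Cosine similarity is simply the dot product of two embedding vectors divided by the product of their norms. A minimal NumPy sketch of that computation, for illustration only (in practice model.similarity handles this for you):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the two vectors divided by the
    # product of their L2 norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))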

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
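
The module stack above is a BERT encoder followed by CLS-token pooling into 768 dimensions. A minimal sketch of assembling the same stack by hand with the standard Sentence Transformers building blocks (models.Transformer and models.Pooling); this is illustrative only, since loading the model by name in the Usage section below already does this for you:

from sentence_transformers import SentenceTransformer, models

# Transformer backbone (BertModel), truncating inputs at 512 tokens.
word_embedding_model = models.Transformer(
    "Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_7_10", max_seq_length=512
)
# CLS-token pooling over the 768-dimensional word embeddings.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), pooling_mode="cls"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])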

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v_1_0_7_10")
# Run inference
sentences = [
    '科目:コンクリート。名称:浮き床コンクリート。',
    '科目:コンクリート。名称:オイルタンク基礎コンクリート。摘要:FC24 S18粗骨材20 高性能AE減水剤。備考:代価表    0108。',
    '科目:コンクリート。名称:普通コンクリート。摘要:FC=24 S15粗骨材基礎部。備考:代価表    0054。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
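
The same embeddings can also back a small semantic search. A minimal sketch using the library's util.semantic_search helper; the query string here is illustrative, not from the original card:

from sentence_transformers import util

# Encode a query and retrieve the closest entries from the corpus above.
query_embedding = model.encode("科目:コンクリート。名称:基礎コンクリート。")
hits = util.semantic_search(query_embedding, embeddings, top_k=2)
print(hits[0])
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}]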

Training Details

Training Dataset

Unnamed Dataset

  • Size: 210,384 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 11 tokens, mean: 13.73 tokens, max: 19 tokens
    • sentence2: string; min: 11 tokens, mean: 35.89 tokens, max: 72 tokens
    • label: int; 0: ~71.40%, 1: ~2.90%, 2: ~25.70%
  • Samples:
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:ポンプ圧送。
      label: 1
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:コンクリートポンプ圧送。摘要:100m3/回以上基本料金別途加算。備考:B0-434226 No.1 市場免震層下部コン。
      label: 2
    • sentence1: 科目:コンクリート。名称:コンクリートポンプ圧送。
      sentence2: 科目:コンクリート。名称:コンクリートポンプ圧送。摘要:100m3/回以上基本料金別途加算。備考:B0-434226 No.1 市場湧水マット保護コン。
      label: 2
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss
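
Each training sample is therefore a (sentence1, sentence2, label) triple with an integer label of 0, 1, or 2. A minimal sketch of building such a dataset with 🤗 Datasets, using the first sample row above:

from datasets import Dataset

# Column names match those listed above: sentence1, sentence2, label.
train_dataset = Dataset.from_dict({
    "sentence1": ["科目:コンクリート。名称:コンクリートポンプ圧送。"],
    "sentence2": ["科目:コンクリート。名称:ポンプ圧送。"],
    "label": [1],
})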

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 10
  • warmup_ratio: 0.2
  • fp16: True
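
These non-default values map directly onto SentenceTransformerTrainingArguments. A minimal sketch, with an illustrative output directory:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # illustrative path, not from the original run
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=10,
    warmup_ratio=0.2,
    fp16=True,
)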

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0608 50 0.3009
0.1217 100 0.1359
0.1825 150 0.095
0.2433 200 0.0808
0.3041 250 0.0724
0.3650 300 0.0757
0.4258 350 0.0608
0.4866 400 0.0607
0.5474 450 0.0549
0.6083 500 0.051
0.6691 550 0.0517
0.7299 600 0.0432
0.7908 650 0.0436
0.8516 700 0.0418
0.9124 750 0.04
0.9732 800 0.0391
1.0341 850 0.038
1.0949 900 0.0352
1.1557 950 0.0329
1.2165 1000 0.029
1.2774 1050 0.0283
1.3382 1100 0.03
1.3990 1150 0.029
1.4599 1200 0.0274
1.5207 1250 0.0261
1.5815 1300 0.0248
1.6423 1350 0.0267
1.7032 1400 0.0234
1.7640 1450 0.0218
1.8248 1500 0.0217
1.8856 1550 0.0195
1.9465 1600 0.022
2.0073 1650 0.0195
2.0681 1700 0.0165
2.1290 1750 0.0155
2.1898 1800 0.0156
2.2506 1850 0.0148
2.3114 1900 0.0135
2.3723 1950 0.0122
2.4331 2000 0.0145
2.4939 2050 0.0138
2.5547 2100 0.0133
2.6156 2150 0.0137
2.6764 2200 0.0118
2.7372 2250 0.0132
2.7981 2300 0.0132
2.8589 2350 0.0129
2.9197 2400 0.0109
2.9805 2450 0.0115
3.0414 2500 0.0083
3.1022 2550 0.0082
3.1630 2600 0.0096
3.2238 2650 0.0081
3.2847 2700 0.0081
3.3455 2750 0.0083
3.4063 2800 0.01
3.4672 2850 0.0077
3.5280 2900 0.0081
3.5888 2950 0.0088
3.6496 3000 0.0088
3.7105 3050 0.0079
3.7713 3100 0.0075
3.8321 3150 0.0079
3.8929 3200 0.0066
3.9538 3250 0.0081
4.0146 3300 0.0062
4.0754 3350 0.0058
4.1363 3400 0.0055
4.1971 3450 0.0061
4.2579 3500 0.006
4.3187 3550 0.0057
4.3796 3600 0.0057
4.4404 3650 0.0061
4.5012 3700 0.0056
4.5620 3750 0.005
4.6229 3800 0.005
4.6837 3850 0.0054
4.7445 3900 0.0045
4.8054 3950 0.0062
4.8662 4000 0.0052

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}