SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 111M parameters (F32 safetensors)
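
Cosine similarity scores embeddings by the angle between them, ignoring magnitude. As a quick reference, this minimal NumPy sketch (not part of the library) shows the computation behind the similarity function listed above:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(a, b) = a·b / (|a| * |b|); 1.0 for identical directions, 0.0 for orthogonal
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))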

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
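
Because the pooling layer keeps only the CLS token (pooling_mode_cls_token: True), the same embedding can be reproduced with plain 🤗 Transformers. A minimal sketch, assuming the checkpoint loads with AutoModel:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8_3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["科目:コンクリート。名称:EXP_J充填コンクリート。"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    out = encoder(**batch)

# CLS pooling: keep only the first token's hidden state,
# matching pooling_mode_cls_token=True in the architecture above.
embedding = out.last_hidden_state[:, 0]
print(embedding.shape)  # torch.Size([1, 768])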

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8_3")
# Run inference
sentences = [
    '科目:コンクリート。名称:EXP_J充填コンクリート。',  # "Category: concrete. Item: EXP_J infill concrete."
    '科目:コンクリート。名称:コンクリートポンプ圧送基本料金。',  # "Category: concrete. Item: base fee for concrete pumping."
    '科目:コンクリート。名称:EXP_J充填コンクリート。',  # identical to the first entry
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
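
The embeddings also support semantic search: encode a query and a corpus, then rank corpus entries by similarity. A short sketch continuing from the snippet above (the corpus strings are just the example sentences; any text works):

# Toy corpus; `model` is the SentenceTransformer loaded above.
corpus = [
    "科目:コンクリート。名称:コンクリートポンプ圧送基本料金。",
    "科目:コンクリート。名称:免震基礎天端グラウト注入。",
]
query = "科目:コンクリート。名称:EXP_J充填コンクリート。"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity of the query against every corpus entry, shape [1, 2]
scores = model.similarity(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(corpus[best], float(scores[best]))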

Training Details

Training Dataset

Unnamed Dataset

  • Size: 355,097 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 11 tokens, mean: 13.78 tokens, max: 19 tokens
    • sentence2: string; min: 11 tokens, mean: 14.8 tokens, max: 23 tokens
    • label: int; 0: ~74.00%, 1: ~2.60%, 2: ~23.40%
  • Samples:
    • sentence1: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | sentence2: 科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。 | label: 0
    • sentence1: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | sentence2: 科目:コンクリート。名称:免震下部コンクリート打設手間。 | label: 0
    • sentence1: 科目:コンクリート。名称:免震基礎天端グラウト注入。 | sentence2: 科目:コンクリート。名称:免震下部(外周基礎梁)コンクリート打設手間。 | label: 0
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss (a custom loss; a hypothetical sketch follows below)
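
The CategoricalContrastiveLoss class is not published with this card, so its exact form is unknown. Purely as a hypothetical sketch, a contrastive loss over the three-way labels (assuming 2 = match, 1 = partial match, 0 = non-match, which the card does not confirm) could look like:

import torch
import torch.nn.functional as F

def categorical_contrastive_sketch(emb1, emb2, labels, margin=0.5):
    # HYPOTHETICAL: the real CategoricalContrastiveLoss is custom and unpublished.
    # Assumes label 2 = match, 1 = partial match, 0 = non-match.
    cos = F.cosine_similarity(emb1, emb2)              # in [-1, 1]
    dist = 1.0 - cos                                   # 0.0 for identical pairs
    pull = dist.pow(2)                                 # attract matching pairs
    pull = torch.where(labels == 1, 0.5 * pull, pull)  # soften partial matches
    push = F.relu(margin - dist).pow(2)                # repel non-matching pairs
    return torch.where(labels == 0, push, pull).mean()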

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 4
  • warmup_ratio: 0.2
  • fp16: True
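
The non-default values above map directly onto the Sentence Transformers training API. A hedged sketch (the base checkpoint is not stated on this card, and ContrastiveLoss is only a stand-in for the custom CategoricalContrastiveLoss):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import ContrastiveLoss  # stand-in for the custom loss

# Tiny stand-in dataset with the card's column layout (sentence1, sentence2, label).
train_dataset = Dataset.from_dict({
    "sentence1": ["科目:コンクリート。名称:免震基礎天端グラウト注入。"],
    "sentence2": ["科目:コンクリート。名称:免震下部コンクリート打設手間。"],
    "label": [0],
})

model = SentenceTransformer("path/to/base-model")  # base checkpoint not stated on this card

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=4,
    warmup_ratio=0.2,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=ContrastiveLoss(model),
)
trainer.train()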

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0360 50 0.0445
0.0720 100 0.0441
0.1081 150 0.0409
0.1441 200 0.0425
0.1801 250 0.0374
0.2161 300 0.0356
0.2522 350 0.0345
0.2882 400 0.0338
0.3242 450 0.0312
0.3602 500 0.0274
0.3963 550 0.0281
0.4323 600 0.0298
0.4683 650 0.028
0.5043 700 0.0282
0.5403 750 0.0273
0.5764 800 0.0244
0.6124 850 0.0238
0.6484 900 0.021
0.6844 950 0.0206
0.7205 1000 0.0234
0.7565 1050 0.019
0.7925 1100 0.0181
0.8285 1150 0.0183
0.8646 1200 0.0187
0.9006 1250 0.0149
0.9366 1300 0.017
0.9726 1350 0.0158
1.0086 1400 0.0133
1.0447 1450 0.0124
1.0807 1500 0.0143
1.1167 1550 0.0131
1.1527 1600 0.0119
1.1888 1650 0.0112
1.2248 1700 0.0117
1.2608 1750 0.0107
1.2968 1800 0.0099
1.3329 1850 0.0112
1.3689 1900 0.01
1.4049 1950 0.0105
1.4409 2000 0.0092
1.4769 2050 0.0095
1.5130 2100 0.0104
1.5490 2150 0.0087
1.5850 2200 0.0092
1.6210 2250 0.0088
1.6571 2300 0.0088
1.6931 2350 0.0098
1.7291 2400 0.0086
1.7651 2450 0.0091
1.8012 2500 0.0072
1.8372 2550 0.0069
1.8732 2600 0.0076
1.9092 2650 0.0069
1.9452 2700 0.0077
1.9813 2750 0.0076
2.0173 2800 0.0065
2.0533 2850 0.0067
2.0893 2900 0.0059
2.1254 2950 0.0061
2.1614 3000 0.0055
2.1974 3050 0.0055
2.2334 3100 0.0057
2.2695 3150 0.0058
2.3055 3200 0.0069
2.3415 3250 0.0058
2.3775 3300 0.0054
2.4135 3350 0.0058
2.4496 3400 0.0047
2.4856 3450 0.0045
2.5216 3500 0.0054
2.5576 3550 0.0041
2.5937 3600 0.0048
2.6297 3650 0.0038
2.6657 3700 0.0048
2.7017 3750 0.0047
2.7378 3800 0.005
2.7738 3850 0.0046
2.8098 3900 0.0045
2.8458 3950 0.0042
2.8818 4000 0.0049
2.9179 4050 0.0043
2.9539 4100 0.0042
2.9899 4150 0.0039
3.0259 4200 0.004
3.0620 4250 0.0032
3.0980 4300 0.0038
3.1340 4350 0.0034
3.1700 4400 0.0033
3.2061 4450 0.0036
3.2421 4500 0.0029
3.2781 4550 0.0032
3.3141 4600 0.0036
3.3501 4650 0.0046
3.3862 4700 0.0037
3.4222 4750 0.0035
3.4582 4800 0.0034
3.4942 4850 0.0038
3.5303 4900 0.0034
3.5663 4950 0.0035
3.6023 5000 0.0037
3.6383 5050 0.0031
3.6744 5100 0.0042
3.7104 5150 0.0034
3.7464 5200 0.0035
3.7824 5250 0.0032
3.8184 5300 0.0032
3.8545 5350 0.0035
3.8905 5400 0.003
3.9265 5450 0.0033
3.9625 5500 0.0037
3.9986 5550 0.0028

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1
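
To reproduce this environment, pinning the listed versions should suffice (PyTorch 2.6.0+cu124 is installed separately to match your CUDA setup):

pip install "sentence-transformers==4.1.0" "transformers==4.52.4" "accelerate==1.7.0" "datasets==2.14.4" "tokenizers==0.21.1"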

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}