SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 111M parameters (F32 safetensors)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
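
Because the pooling layer uses the CLS token (pooling_mode_cls_token: True), the model can also be run with the transformers library directly. The snippet below is a minimal sketch of equivalent manual inference, assuming the repository's weights load via AutoModel; it is an illustration, not an officially supported path.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8_4")
model = AutoModel.from_pretrained("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8_4")

sentences = ["科目:タイル。名称:タイル出隅コーナー。"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoded)

# CLS pooling: keep only the hidden state of the first ([CLS]) token
embeddings = outputs.last_hidden_state[:, 0]
print(embeddings.shape)  # torch.Size([1, 768])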

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_8_4")
# Run inference
sentences = [
    '科目:タイル。名称:タイル出隅コーナー。',
    '科目:タイル。名称:段床タイル。',
    '科目:タイル。名称:地流し床タイル。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
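
Since the configured similarity function is cosine similarity, model.similarity is equivalent, up to floating-point precision, to L2-normalizing the embeddings and taking dot products. A quick manual check, reusing embeddings from the snippet above:

import numpy as np

# Cosine similarity by hand; should match model.similarity
# up to floating-point precision.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(normed @ normed.T)  # (3, 3) matrix with ones on the diagonal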

Training Details

Training Dataset

Unnamed Dataset

  • Size: 354,235 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 11, mean 13.78, max 19 tokens
    • sentence2: string; min 11, mean 14.8, max 23 tokens
    • label: int; 0: ~74.10%, 1: ~2.60%, 2: ~23.30%
  • Samples (sentence1 | sentence2 | label):
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部(外周基礎梁)コンクリート打設手間。 | 0
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss
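
For reference, the following sketch shows the training-data layout; the columns and sample values are copied from the table above (the dataset itself is unnamed and not published).

from datasets import Dataset

# Illustrative reconstruction of the training-data layout:
# sentence pairs with an integer label in {0, 1, 2}.
train_dataset = Dataset.from_dict({
    "sentence1": ["科目:コンクリート。名称:免震基礎天端グラウト注入。"],
    "sentence2": ["科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。"],
    "label": [0],
})
print(train_dataset)  # features: ['sentence1', 'sentence2', 'label']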

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 4
  • warmup_ratio: 0.2
  • fp16: True
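
In code, these non-default values correspond roughly to the following SentenceTransformerTrainingArguments (a sketch: output_dir is a placeholder, and all other arguments keep the defaults listed under "All Hyperparameters"):

from sentence_transformers import SentenceTransformerTrainingArguments

# Non-default hyperparameters from the list above.
# "output" is a placeholder, not the original training directory.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=4,
    warmup_ratio=0.2,
    fp16=True,
)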

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0361 50 0.0492
0.0723 100 0.0438
0.1084 150 0.0385
0.1445 200 0.0376
0.1806 250 0.0388
0.2168 300 0.0391
0.2529 350 0.0337
0.2890 400 0.0354
0.3251 450 0.0322
0.3613 500 0.0345
0.3974 550 0.0278
0.4335 600 0.0282
0.4697 650 0.0261
0.5058 700 0.0284
0.5419 750 0.0264
0.5780 800 0.0251
0.6142 850 0.02
0.6503 900 0.0253
0.6864 950 0.0197
0.7225 1000 0.0221
0.7587 1050 0.0204
0.7948 1100 0.0188
0.8309 1150 0.0198
0.8671 1200 0.0203
0.9032 1250 0.0177
0.9393 1300 0.0162
0.9754 1350 0.0148
1.0116 1400 0.014
1.0477 1450 0.011
1.0838 1500 0.0123
1.1199 1550 0.0126
1.1561 1600 0.01
1.1922 1650 0.0124
1.2283 1700 0.0107
1.2645 1750 0.0107
1.3006 1800 0.0118
1.3367 1850 0.0103
1.3728 1900 0.0102
1.4090 1950 0.0104
1.4451 2000 0.01
1.4812 2050 0.0101
1.5173 2100 0.0098
1.5535 2150 0.0097
1.5896 2200 0.0093
1.6257 2250 0.0088
1.6618 2300 0.0095
1.6980 2350 0.0103
1.7341 2400 0.0077
1.7702 2450 0.0085
1.8064 2500 0.0082
1.8425 2550 0.0074
1.8786 2600 0.0081
1.9147 2650 0.0067
1.9509 2700 0.0082
1.9870 2750 0.0076
2.0231 2800 0.0067
2.0592 2850 0.0056
2.0954 2900 0.0065
2.1315 2950 0.0057
2.1676 3000 0.0059
2.2038 3050 0.0047
2.2399 3100 0.0051
2.2760 3150 0.0049
2.3121 3200 0.0051
2.3483 3250 0.0049
2.3844 3300 0.0045
2.4205 3350 0.0047
2.4566 3400 0.0052
2.4928 3450 0.004
2.5289 3500 0.0057
2.5650 3550 0.0046
2.6012 3600 0.0052
2.6373 3650 0.0049
2.6734 3700 0.0046
2.7095 3750 0.0056
2.7457 3800 0.0054
2.7818 3850 0.0037
2.8179 3900 0.0044
2.8540 3950 0.0037
2.8902 4000 0.0049
2.9263 4050 0.0044
2.9624 4100 0.0046
2.9986 4150 0.0041
3.0347 4200 0.0044
3.0708 4250 0.0035
3.1069 4300 0.0029
3.1431 4350 0.0035
3.1792 4400 0.0031
3.2153 4450 0.0038
3.2514 4500 0.0039
3.2876 4550 0.0034
3.3237 4600 0.0043
3.3598 4650 0.0042
3.3960 4700 0.004
3.4321 4750 0.0028
3.4682 4800 0.0035
3.5043 4850 0.0033
3.5405 4900 0.0039
3.5766 4950 0.0045
3.6127 5000 0.0032
3.6488 5050 0.0036
3.6850 5100 0.0032
3.7211 5150 0.0031
3.7572 5200 0.0043
3.7934 5250 0.0032
3.8295 5300 0.0034
3.8656 5350 0.0029
3.9017 5400 0.0037
3.9379 5450 0.0028
3.9740 5500 0.0028

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1
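
For strict reproducibility, the packages above can be pinned when installing (a sketch; recent compatible versions also work for plain inference, and the original PyTorch build was the CUDA 12.4 variant):

pip install "sentence-transformers==4.1.0" "transformers==4.52.4" "torch==2.6.0" "accelerate==1.8.1" "datasets==2.14.4" "tokenizers==0.21.1"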

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}