SentenceTransformer

This is a sentence-transformers model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
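
These properties can be checked directly on the loaded model; a minimal sketch, assuming the model id from the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_9_1")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # "cosine"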


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
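
The stack above is a BERT encoder followed by CLS-token pooling, so the sentence embedding is the final hidden state of the [CLS] token. A minimal sketch of the equivalent computation with the plain transformers API; loading this repo directly with AutoModel/AutoTokenizer is an assumption here:

import torch
from transformers import AutoModel, AutoTokenizer

repo = "Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_9_1"
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)  # BertModel weights

inputs = tokenizer(["科目:タイル。名称:床タイル。"], padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# pooling_mode_cls_token=True: take the hidden state at position 0 ([CLS])
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)  # torch.Size([1, 768])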

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_9_1")
# Run inference
sentences = [
    '科目:タイル。名称:床タイル。',
    '科目:タイル。名称:屋外階段踊場タイル張り。',
    '科目:タイル。名称:段鼻タイル。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
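
The other listed use cases, such as semantic search, follow the same pattern. A minimal sketch using sentence_transformers.util.semantic_search, reusing the example sentences above as a toy corpus (the query choice is purely illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v1_0_9_1")

corpus = [
    '科目:タイル。名称:床タイル。',
    '科目:タイル。名称:屋外階段踊場タイル張り。',
    '科目:タイル。名称:段鼻タイル。',
]
query = '科目:タイル。名称:床タイル。'

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Return the top-2 corpus entries closest to the query (cosine similarity by default)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])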

Training Details

Training Dataset

Unnamed Dataset

  • Size: 366,717 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 11, mean 13.8, max 19 tokens
    • sentence2: string; min 11, mean 14.78, max 23 tokens
    • label: int; 0: ~66.70%, 1: ~3.50%, 2: ~29.80%
  • Samples (sentence1 | sentence2 | label):
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部(外周基礎梁)コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震下部コンクリート打設手間。 | 0
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 | 科目:コンクリート。名称:免震BPL下部充填コンクリート打設手間。 | 0
  • Loss: sentence_transformer_lib.categorical_constrastive_loss.CategoricalContrastiveLoss
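
A minimal sketch of the column layout described above, built as an in-memory datasets.Dataset from the sample rows; the loss comes from a custom sentence_transformer_lib package that is not reproduced here:

from datasets import Dataset

train_dataset = Dataset.from_dict({
    "sentence1": [
        "科目:コンクリート。名称:免震基礎天端グラウト注入。",
        "科目:コンクリート。名称:免震基礎天端グラウト注入。",
    ],
    "sentence2": [
        "科目:コンクリート。名称:免震下部(外周基礎梁)コンクリート打設手間。",
        "科目:コンクリート。名称:免震下部コンクリート打設手間。",
    ],
    "label": [0, 0],  # integer labels taking values 0, 1, or 2
})
print(train_dataset)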

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • warmup_ratio: 0.2
  • fp16: True
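
A minimal sketch of passing these non-default values through SentenceTransformerTrainingArguments (the output directory name is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",              # placeholder path
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.2,
    fp16=True,
    num_train_epochs=3,                # from the full list below
)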

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0349 50 0.0328
0.0698 100 0.036
0.1047 150 0.0357
0.1396 200 0.0324
0.1745 250 0.0335
0.2094 300 0.0354
0.2442 350 0.0322
0.2791 400 0.0321
0.3140 450 0.0273
0.3489 500 0.025
0.3838 550 0.0245
0.4187 600 0.0242
0.4536 650 0.0224
0.4885 700 0.0239
0.5234 750 0.0228
0.5583 800 0.0243
0.5932 850 0.0208
0.6281 900 0.022
0.6629 950 0.0196
0.6978 1000 0.0224
0.7327 1050 0.0177
0.7676 1100 0.0189
0.8025 1150 0.0158
0.8374 1200 0.017
0.8723 1250 0.0146
0.9072 1300 0.0144
0.9421 1350 0.0158
0.9770 1400 0.0144
1.0119 1450 0.0146
1.0468 1500 0.0115
1.0816 1550 0.0105
1.1165 1600 0.0108
1.1514 1650 0.0113
1.1863 1700 0.0109
1.2212 1750 0.0084
1.2561 1800 0.0099
1.2910 1850 0.0104
1.3259 1900 0.0112
1.3608 1950 0.0084
1.3957 2000 0.0083
1.4306 2050 0.0094
1.4655 2100 0.0093
1.5003 2150 0.007
1.5352 2200 0.0082
1.5701 2250 0.0098
1.6050 2300 0.0082
1.6399 2350 0.0074
1.6748 2400 0.0081
1.7097 2450 0.0076
1.7446 2500 0.0076
1.7795 2550 0.0093
1.8144 2600 0.0079
1.8493 2650 0.0075
1.8842 2700 0.0075
1.9191 2750 0.0068
1.9539 2800 0.0065
1.9888 2850 0.0071
2.0237 2900 0.006
2.0586 2950 0.0053
2.0935 3000 0.0048
2.1284 3050 0.0056
2.1633 3100 0.0063
2.1982 3150 0.005
2.2331 3200 0.0052
2.2680 3250 0.0047
2.3029 3300 0.0052
2.3378 3350 0.0063
2.3726 3400 0.0052
2.4075 3450 0.0048
2.4424 3500 0.0052
2.4773 3550 0.0057
2.5122 3600 0.0047
2.5471 3650 0.0048
2.5820 3700 0.0058
2.6169 3750 0.0055
2.6518 3800 0.005
2.6867 3850 0.0057
2.7216 3900 0.0044
2.7565 3950 0.0052
2.7913 4000 0.0049
2.8262 4050 0.0046
2.8611 4100 0.0053
2.8960 4150 0.0051
2.9309 4200 0.0048
2.9658 4250 0.0043

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 2.14.4
  • Tokenizers: 0.21.2
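
To approximate this environment, the listed versions can be pinned at install time; a minimal sketch (the CUDA-specific PyTorch build 2.6.0+cu124 may require the PyTorch wheel index):

pip install sentence-transformers==4.1.0 transformers==4.53.0 torch==2.6.0 accelerate==1.8.1 datasets==2.14.4 tokenizers==0.21.2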

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}