SentenceTransformer based on BAAI/bge-small-zh-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-small-zh-v1.5 on the train dataset. It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-zh-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train
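
The similarity function listed above can be illustrated with a few lines of plain Python. This is a minimal sketch on toy 2-dimensional vectors rather than real 512-dimensional embeddings; the helper name `cosine_similarity` is chosen here for illustration and is not part of the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Vectors pointing the same way score close to 1, regardless of length.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors score 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, two texts of very different length can still be rated as highly similar.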

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
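
To make modules (1) and (2) concrete, here is a minimal sketch of CLS pooling followed by L2 normalization in plain Python. The token vectors are made up and use 4 dimensions instead of 512; the function names `cls_pooling` and `normalize` are illustrative stand-ins, not the library's classes.

```python
import math

def cls_pooling(token_embeddings):
    """Module (1) with pooling_mode_cls_token=True: keep only the first token's vector."""
    return list(token_embeddings[0])

def normalize(vector):
    """Module (2): rescale the vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy stand-in for the Transformer output of one sentence: 3 tokens x 4 dims.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = normalize(cls_pooling(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Since every output has unit length, the dot product of two sentence embeddings equals their cosine similarity.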

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ntucool/mlogging")
# Run inference
sentences = [
    'qa_217',
    '油壓箱table spin clamp油管壓接不良有漏油現象',
    '故障狀況 油壓箱table spin clamp油管壓接不良有漏油現象 處理狀況 備油管為客戶更換',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
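
One way to use the 3 x 3 similarity matrix is to look up, for each sentence, its closest neighbour among the others. The sketch below assumes the matrix has been converted to nested lists (for example with `similarities.tolist()`); the helper `best_matches` is hypothetical and the scores are made up for illustration, not produced by this model.

```python
def best_matches(similarity_matrix):
    """For each row i, return (j, score) of the most similar other item j != i."""
    matches = []
    for i, row in enumerate(similarity_matrix):
        # Skip the diagonal: every item is trivially most similar to itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        score, j = max(candidates)
        matches.append((j, score))
    return matches

# Made-up scores for three sentences; NOT output of this model.
toy_similarities = [
    [1.00, 0.10, 0.20],
    [0.10, 1.00, 0.85],
    [0.20, 0.85, 1.00],
]
print(best_matches(toy_similarities))  # [(2, 0.2), (2, 0.85), (1, 0.85)]
```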

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 164 training samples
  • Columns: question, chunk, and label
  • Approximate statistics based on the first 164 samples:
            question        chunk           label
    type    string          string          float
    min     6 tokens        21 tokens       1.0
    mean    23.19 tokens    79.21 tokens    1.0
    max     86 tokens       176 tokens      1.0
  • Samples:
    • question: 1中噴箱體壓力表異常
      chunk: 故障狀況 1中噴箱體壓力表異常 處理狀況 1依照廠商檢查方案過濾灌乾淨未阻塞濾心乾淨壓力表洩氣未改善 2更換壓力表安裝測試中噴壓力已改善客戶確認OK
      label: 1.0
    • question: 1用戶反應機台有漏水現象
      chunk: 故障狀況 1用戶反應機台有漏水現象 處理狀況 1查修後危機台左後立柱位置漏出拆開Y後伸縮護罩鈑金重新填上矽利康測試確認已無漏水
      label: 1.0
    • question: 風槍的管路破裂會漏風
      chunk: 故障狀況 風槍的管路破裂會漏風 處理狀況 備風槍管為客戶更換
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
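
As a rough illustration of this loss: CosineSimilarityLoss compares the cosine similarity of each (question, chunk) embedding pair with its label, and with `MSELoss` as `loss_fct` the difference is squared and averaged. The sketch below uses toy 2-dimensional vectors and hypothetical helper names; it is not the library's implementation. Note that, according to the statistics above, every label in this dataset is 1.0, so the objective only pulls paired texts together.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the target label."""
    errors = [
        (cosine_similarity(u, v) - label) ** 2
        for (u, v), label in zip(pairs, labels)
    ]
    return sum(errors) / len(errors)

# Toy embeddings: the first pair is identical, the second is orthogonal.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]
labels = [1.0, 1.0]  # all labels are 1.0, as in this dataset
loss = cosine_similarity_loss(pairs, labels)
print(loss)  # 0.5
```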
    

Evaluation Dataset

train

  • Dataset: train
  • Size: 40 evaluation samples
  • Columns: question, chunk, and label
  • Approximate statistics based on the first 40 samples:
            question        chunk           label
    type    string          string          float
    min     7 tokens        23 tokens       1.0
    mean    22.3 tokens     69.75 tokens    1.0
    max     90 tokens       144 tokens      1.0
  • Samples:
    • question: 冷氣機結冰
      chunk: 故障狀況 冷氣機結冰 處理狀況 經威士頓評估後 同意保固提供一片冷氣控制板給客戶更換
      label: 1.0
    • question: 1客戶要求刀臂sensor異常時需動作停止避免刀臂一直揮造成人員受傷
      chunk: 故障狀況 1客戶要求刀臂sensor異常時需動作停止避免刀臂一直揮造成人員受傷 處理狀況 1修改PLC並測試所有sensor異常時需刀臂停止測試給用戶確認ok
      label: 1.0
    • question: 更換鏈條以及鏈條軸承
      chunk: 故障狀況 更換鏈條以及鏈條軸承 處理狀況 備料為客戶更換
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • max_steps: 500
  • warmup_ratio: 0.1
  • fp16: True
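
For reuse, the non-default values above can be collected into a plain dictionary. This is only a sketch: the variable name `NON_DEFAULT_ARGS` and the `output_dir` value in the commented usage are assumptions, and the original training script is not shown in this card.

```python
# Non-default hyperparameters copied from the list above.
NON_DEFAULT_ARGS = {
    "eval_strategy": "steps",
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 1,
    "max_steps": 500,
    "warmup_ratio": 0.1,
    "fp16": True,
}

# Hypothetical usage, assuming sentence-transformers is installed and a
# CUDA device is available for fp16 (output_dir is a made-up path):
# from sentence_transformers import SentenceTransformerTrainingArguments
# args = SentenceTransformerTrainingArguments(output_dir="output", **NON_DEFAULT_ARGS)

for name, value in NON_DEFAULT_ARGS.items():
    print(f"{name}: {value}")
```

In the Transformers trainer, a positive `max_steps` overrides `num_train_epochs`, so training here runs for 500 steps rather than a single epoch.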

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: 500
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
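
The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, `warmup_ratio: 0.1` and `max_steps: 500` implies a learning rate that ramps up over the first 50 steps and then decays linearly to zero. The function below is a simplified sketch of that schedule under those values; `linear_schedule_lr` is a hypothetical helper, not the scheduler object used by the trainer.

```python
import math

def linear_schedule_lr(step, base_lr=5e-05, max_steps=500, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)  # 50 steps here
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)

for step in (0, 25, 50, 275, 500):
    print(step, linear_schedule_lr(step))
```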

Training Logs

Epoch      Step    Training Loss    train loss
9.0909     100     2.3557           2.8228
18.1818    200     0.3241           2.9318
27.2727    300     0.0786           3.0996
36.3636    400     0.0408           3.1550
45.4545    500     0.0328           3.1758
9.0909     100     0.2424           0.0369
18.1818    200     0.0199           0.0374
27.2727    300     0.0231           0.0395
36.3636    400     0.0178           0.0387
45.4545    500     0.0157           0.0385
9.0909     100     0.0172           0.0000
18.1818    200     0.002            0.0000
27.2727    300     0.0016           0.0000
36.3636    400     0.0014           0.0000
45.4545    500     0.0013           0.0000
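
The fractional epoch values in the log follow from the dataset and batch sizes reported above. A minimal sketch of the arithmetic, assuming a single device (with `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` as listed), where `epoch_at_step` is a hypothetical helper:

```python
import math

def epoch_at_step(step, num_samples=164, batch_size=16):
    """Epoch counter after a given number of optimizer steps."""
    # 164 samples in batches of 16 gives 11 steps per epoch (the last batch is partial).
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return step / steps_per_epoch

for step in (100, 200, 300, 400, 500):
    print(step, round(epoch_at_step(step), 4))
```

With 11 steps per epoch, step 100 corresponds to epoch 100 / 11 ≈ 9.0909 and step 500 to about 45.4545, so `max_steps: 500` amounts to roughly 45 passes over the 164 training samples even though `num_train_epochs` is 1.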

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 24M params (Safetensors, tensor type F32)