SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
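
The three modules run in sequence: the RoBERTa transformer produces token embeddings (up to 512 tokens), mean pooling averages them over non-padding positions, and the result is L2-normalized. As a minimal sketch, the equivalent computation with plain transformers looks as follows (this assumes the checkpoint loads directly with AutoModel/AutoTokenizer, which is typically the case for Sentence Transformers repositories):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "hanwenzhu/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne5-mar17"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["HasDerivAt.clog", "Nat.cast_zero"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling over non-padding tokens (pooling_mode_mean_tokens)
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# L2 normalization (the Normalize module)
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])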

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne5-mar17")
# Run inference
sentences = [
    'Mathlib.Analysis.SpecialFunctions.Complex.LogDeriv#35',
    'HasDerivAt.clog',
    'Nat.cast_zero',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
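
Since the training pairs map Mathlib proof-state identifiers (state_name) to premise names (premise_name), a natural use of these scores is ranking candidate premises for a given state. A small sketch, using a hypothetical candidate list:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne5-mar17")

# Hypothetical query (a proof-state identifier) and candidate premise names
state = "Mathlib.Analysis.SpecialFunctions.Complex.LogDeriv#35"
premises = ["HasDerivAt.clog", "Nat.cast_zero", "IsField.mul_comm"]

state_embedding = model.encode([state])
premise_embeddings = model.encode(premises)

# Rank candidates by cosine similarity to the state
scores = model.similarity(state_embedding, premise_embeddings)[0]
for name, score in sorted(zip(premises, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {name}")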

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,817,740 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string, min 11 tokens, mean 16.44 tokens, max 24 tokens
    • premise_name: string, min 3 tokens, mean 10.9 tokens, max 50 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.Field.IsField#12 → Classical.choose_spec
    • Mathlib.Algebra.Field.IsField#12 → IsField.mul_comm
    • Mathlib.Algebra.Field.IsField#12 → eq_of_heq
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,959 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string, min 10 tokens, mean 17.08 tokens, max 24 tokens
    • premise_name: string, min 5 tokens, mean 11.05 tokens, max 31 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.Algebra.Hom#80 → AlgHom.commutes
    • Mathlib.Algebra.Algebra.NonUnitalSubalgebra#237 → NonUnitalAlgHom.instNonUnitalAlgSemiHomClass
    • Mathlib.Algebra.Algebra.NonUnitalSubalgebra#237 → NonUnitalAlgebra.mem_top
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 5.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • resume_from_checkpoint: /data/user_data/thomaszh/models/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne5/checkpoint-104604
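
loss.MaskedCachedMultipleNegativesRankingLoss is a custom loss and is not shipped with the library, so the following is only a rough sketch of a comparable setup: it substitutes the built-in CachedMultipleNegativesRankingLoss with the same scale and cos_sim, uses a few hypothetical in-memory pairs in place of the real 5.8M-sample dataset, and plugs in the non-default hyperparameters listed above.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

# Hypothetical (state_name, premise_name) pairs; the real run used 5,817,740 of them
train_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Algebra.Field.IsField#12", "Mathlib.Algebra.Field.IsField#12"],
    "premise_name": ["Classical.choose_spec", "IsField.mul_comm"],
})
eval_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Algebra.Algebra.Hom#80"],
    "premise_name": ["AlgHom.commutes"],
})

# Stand-in for the custom MaskedCachedMultipleNegativesRankingLoss (same scale and similarity)
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)

args = SentenceTransformerTrainingArguments(
    output_dir="all-distilroberta-v1-premise-selection",  # hypothetical output directory
    num_train_epochs=5,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,  # assumes bf16-capable hardware, as in the original run
    eval_strategy="steps",
    dataloader_num_workers=4,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()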

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: /data/user_data/thomaszh/models/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne5/checkpoint-104604
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss
4.6031 104610 0.4939 -
4.6035 104620 0.4904 -
4.6040 104630 0.481 -
4.6044 104640 0.486 -
4.6049 104650 0.4596 -
4.6053 104660 0.4864 -
4.6057 104670 0.4577 -
4.6062 104680 0.4646 -
4.6066 104690 0.4478 -
4.6071 104700 0.4844 -
4.6075 104710 0.4836 -
4.6079 104720 0.4445 -
4.6084 104730 0.4883 -
4.6088 104740 0.5054 -
4.6093 104750 0.4992 -
4.6097 104760 0.4759 -
4.6101 104770 0.483 -
4.6106 104780 0.4668 -
4.6110 104790 0.4839 -
4.6115 104800 0.4426 -
4.6119 104810 0.4851 -
4.6123 104820 0.4837 -
4.6128 104830 0.4728 -
4.6132 104840 0.4796 -
4.6137 104850 0.4824 -
4.6141 104860 0.4948 -
4.6145 104870 0.4902 -
4.6150 104880 0.4565 -
4.6154 104890 0.5068 -
4.6159 104900 0.4881 -
4.6163 104910 0.5064 -
4.6167 104920 0.4877 -
4.6172 104930 0.498 -
4.6176 104940 0.478 -
4.6181 104950 0.4972 -
4.6185 104960 0.4654 -
4.6189 104970 0.4544 -
4.6194 104980 0.477 -
4.6198 104990 0.4957 -
4.6203 105000 0.4695 -
4.6207 105010 0.4927 -
4.6211 105020 0.4805 -
4.6216 105030 0.4929 -
4.6220 105040 0.4711 -
4.6225 105050 0.4814 -
4.6229 105060 0.464 -
4.6233 105070 0.4752 -
4.6238 105080 0.4609 -
4.6242 105090 0.4754 -
4.6247 105100 0.48 -
4.6251 105110 0.4587 -
4.6255 105120 0.4709 -
4.6260 105130 0.4775 -
4.6264 105140 0.4856 -
4.6269 105150 0.5094 -
4.6273 105160 0.4857 -
4.6277 105170 0.4826 -
4.6282 105180 0.4755 -
4.6286 105190 0.478 -
4.6291 105200 0.4653 -
4.6295 105210 0.4846 -
4.6299 105220 0.495 -
4.6304 105230 0.4818 -
4.6308 105240 0.4774 -
4.6313 105250 0.4653 -
4.6317 105260 0.4831 -
4.6321 105270 0.4669 -
4.6326 105280 0.487 -
4.6330 105290 0.4782 -
4.6335 105300 0.4856 -
4.6339 105310 0.4788 -
4.6343 105320 0.4645 -
4.6348 105330 0.4584 -
4.6352 105340 0.4794 -
4.6357 105350 0.4689 -
4.6361 105360 0.4987 -
4.6365 105370 0.4593 -
4.6370 105380 0.4912 -
4.6374 105390 0.468 -
4.6379 105400 0.487 -
4.6383 105410 0.4889 -
4.6387 105420 0.4561 -
4.6392 105430 0.4759 -
4.6396 105440 0.4686 -
4.6401 105450 0.4885 -
4.6405 105460 0.4705 -
4.6409 105470 0.4763 -
4.6414 105480 0.4794 -
4.6418 105490 0.4922 -
4.6423 105500 0.4693 -
4.6427 105510 0.4923 -
4.6431 105520 0.4856 -
4.6436 105530 0.4796 -
4.6440 105540 0.4914 -
4.6445 105550 0.4501 -
4.6449 105560 0.4848 -
4.6453 105570 0.478 -
4.6458 105580 0.4637 -
4.6462 105590 0.4796 -
4.6467 105600 0.4826 -
4.6471 105610 0.4781 -
4.6475 105620 0.4882 -
4.6480 105630 0.4964 -
4.6484 105640 0.4779 -
4.6489 105650 0.4701 -
4.6493 105660 0.4673 -
4.6497 105670 0.5103 -
4.6502 105680 0.4795 -
4.6506 105690 0.489 -
4.6511 105700 0.4653 -
4.6515 105710 0.4607 -
4.6519 105720 0.468 -
4.6524 105730 0.4719 -
4.6528 105740 0.4784 -
4.6529 105741 - 1.2566
4.6533 105750 0.4967 -
4.6537 105760 0.4744 -
4.6541 105770 0.4645 -
4.6546 105780 0.4732 -
4.6550 105790 0.4869 -
4.6555 105800 0.463 -
4.6559 105810 0.5 -
4.6563 105820 0.4671 -
4.6568 105830 0.4734 -
4.6572 105840 0.4699 -
4.6577 105850 0.4864 -
4.6581 105860 0.5178 -
4.6585 105870 0.4782 -
4.6590 105880 0.4902 -
4.6594 105890 0.4823 -
4.6599 105900 0.4542 -
4.6603 105910 0.4609 -
4.6607 105920 0.4586 -
4.6612 105930 0.4864 -
4.6616 105940 0.479 -
4.6621 105950 0.4717 -
4.6625 105960 0.4938 -
4.6629 105970 0.4685 -
4.6634 105980 0.4705 -
4.6638 105990 0.4958 -
4.6643 106000 0.4722 -
4.6647 106010 0.4633 -
4.6651 106020 0.4877 -
4.6656 106030 0.4606 -
4.6660 106040 0.4797 -
4.6665 106050 0.4493 -
4.6669 106060 0.4745 -
4.6673 106070 0.4918 -
4.6678 106080 0.4966 -
4.6682 106090 0.4498 -
4.6687 106100 0.4965 -
4.6691 106110 0.4911 -
4.6695 106120 0.4907 -
4.6700 106130 0.4983 -
4.6704 106140 0.4665 -
4.6709 106150 0.4656 -
4.6713 106160 0.4967 -
4.6717 106170 0.4849 -
4.6722 106180 0.4895 -
4.6726 106190 0.5068 -
4.6731 106200 0.4711 -
4.6735 106210 0.4674 -
4.6739 106220 0.4659 -
4.6744 106230 0.4551 -
4.6748 106240 0.4449 -
4.6753 106250 0.4719 -
4.6757 106260 0.4872 -
4.6761 106270 0.4966 -
4.6766 106280 0.4792 -
4.6770 106290 0.4678 -
4.6775 106300 0.4731 -
4.6779 106310 0.4692 -
4.6783 106320 0.4766 -
4.6788 106330 0.4862 -
4.6792 106340 0.4784 -
4.6797 106350 0.4583 -
4.6801 106360 0.483 -
4.6805 106370 0.4846 -
4.6810 106380 0.4742 -
4.6814 106390 0.4573 -
4.6819 106400 0.4849 -
4.6823 106410 0.4731 -
4.6827 106420 0.4779 -
4.6832 106430 0.499 -
4.6836 106440 0.4798 -
4.6841 106450 0.4812 -
4.6845 106460 0.4946 -
4.6849 106470 0.4477 -
4.6854 106480 0.488 -
4.6858 106490 0.453 -
4.6863 106500 0.492 -
4.6867 106510 0.4665 -
4.6871 106520 0.478 -
4.6876 106530 0.4756 -
4.6880 106540 0.4766 -
4.6885 106550 0.4797 -
4.6889 106560 0.4539 -
4.6893 106570 0.4704 -
4.6898 106580 0.4763 -
4.6902 106590 0.4708 -
4.6907 106600 0.4594 -
4.6911 106610 0.477 -
4.6915 106620 0.471 -
4.6920 106630 0.4766 -
4.6924 106640 0.5066 -
4.6929 106650 0.5013 -
4.6933 106660 0.4733 -
4.6937 106670 0.4751 -
4.6942 106680 0.4794 -
4.6946 106690 0.4897 -
4.6951 106700 0.483 -
4.6955 106710 0.4732 -
4.6959 106720 0.4744 -
4.6964 106730 0.4627 -
4.6968 106740 0.4728 -
4.6973 106750 0.4698 -
4.6977 106760 0.4787 -
4.6981 106770 0.474 -
4.6986 106780 0.4667 -
4.6990 106790 0.4879 -
4.6995 106800 0.4994 -
4.6999 106810 0.4989 -
4.7003 106820 0.4592 -
4.7008 106830 0.4613 -
4.7012 106840 0.4904 -
4.7017 106850 0.4727 -
4.7021 106860 0.4681 -
4.7025 106870 0.4785 -
4.7029 106878 - 1.2603
4.7030 106880 0.4598 -
4.7034 106890 0.49 -
4.7039 106900 0.4809 -
4.7043 106910 0.5019 -
4.7047 106920 0.4417 -
4.7052 106930 0.4856 -
4.7056 106940 0.4656 -
4.7061 106950 0.5102 -
4.7065 106960 0.4836 -
4.7069 106970 0.4549 -
4.7074 106980 0.4767 -
4.7078 106990 0.4794 -
4.7083 107000 0.4979 -
4.7087 107010 0.4739 -
4.7091 107020 0.4941 -
4.7096 107030 0.4783 -
4.7100 107040 0.5039 -
4.7105 107050 0.4601 -
4.7109 107060 0.4761 -
4.7113 107070 0.4695 -
4.7118 107080 0.5134 -
4.7122 107090 0.4816 -
4.7127 107100 0.4791 -
4.7131 107110 0.4601 -
4.7135 107120 0.4884 -
4.7140 107130 0.4891 -
4.7144 107140 0.4559 -
4.7149 107150 0.4439 -
4.7153 107160 0.493 -
4.7157 107170 0.4851 -
4.7162 107180 0.4774 -
4.7166 107190 0.4638 -
4.7171 107200 0.4683 -
4.7175 107210 0.4733 -
4.7179 107220 0.4859 -
4.7184 107230 0.4867 -
4.7188 107240 0.4739 -
4.7193 107250 0.4948 -
4.7197 107260 0.4621 -
4.7201 107270 0.4627 -
4.7206 107280 0.498 -
4.7210 107290 0.4614 -
4.7215 107300 0.4561 -
4.7219 107310 0.4893 -
4.7223 107320 0.4621 -
4.7228 107330 0.4722 -
4.7232 107340 0.485 -
4.7237 107350 0.4628 -
4.7241 107360 0.4807 -
4.7245 107370 0.4798 -
4.7250 107380 0.4673 -
4.7254 107390 0.4703 -
4.7259 107400 0.4956 -
4.7263 107410 0.4715 -
4.7267 107420 0.4928 -
4.7272 107430 0.4854 -
4.7276 107440 0.4781 -
4.7281 107450 0.4906 -
4.7285 107460 0.491 -
4.7289 107470 0.4766 -
4.7294 107480 0.4745 -
4.7298 107490 0.4756 -
4.7303 107500 0.4839 -
4.7307 107510 0.4492 -
4.7311 107520 0.4579 -
4.7316 107530 0.4823 -
4.7320 107540 0.4514 -
4.7325 107550 0.4595 -
4.7329 107560 0.4898 -
4.7333 107570 0.4508 -
4.7338 107580 0.49 -
4.7342 107590 0.4475 -
4.7347 107600 0.4801 -
4.7351 107610 0.4665 -
4.7355 107620 0.4769 -
4.7360 107630 0.4827 -
4.7364 107640 0.4817 -
4.7369 107650 0.4608 -
4.7373 107660 0.4681 -
4.7377 107670 0.4681 -
4.7382 107680 0.5057 -
4.7386 107690 0.4849 -
4.7391 107700 0.4793 -
4.7395 107710 0.4935 -
4.7399 107720 0.4763 -
4.7404 107730 0.4774 -
4.7408 107740 0.4883 -
4.7413 107750 0.4613 -
4.7417 107760 0.4817 -
4.7421 107770 0.4721 -
4.7426 107780 0.4681 -
4.7430 107790 0.4818 -
4.7435 107800 0.4762 -
4.7439 107810 0.496 -
4.7443 107820 0.4865 -
4.7448 107830 0.4748 -
4.7452 107840 0.4525 -
4.7457 107850 0.4783 -
4.7461 107860 0.4754 -
4.7465 107870 0.4676 -
4.7470 107880 0.4811 -
4.7474 107890 0.4932 -
4.7479 107900 0.4764 -
4.7483 107910 0.4877 -
4.7487 107920 0.4709 -
4.7492 107930 0.4633 -
4.7496 107940 0.471 -
4.7501 107950 0.4692 -
4.7505 107960 0.4549 -
4.7509 107970 0.4778 -
4.7514 107980 0.4921 -
4.7518 107990 0.4801 -
4.7523 108000 0.4662 -
4.7527 108010 0.4852 -
4.7529 108015 - 1.2617
4.7531 108020 0.4915 -
4.7536 108030 0.472 -
4.7540 108040 0.4906 -
4.7545 108050 0.4817 -
4.7549 108060 0.4724 -
4.7553 108070 0.4696 -
4.7558 108080 0.4791 -
4.7562 108090 0.4819 -
4.7567 108100 0.4953 -
4.7571 108110 0.4665 -
4.7575 108120 0.4688 -
4.7580 108130 0.4791 -
4.7584 108140 0.4734 -
4.7589 108150 0.4828 -
4.7593 108160 0.4718 -
4.7597 108170 0.4813 -
4.7602 108180 0.4827 -
4.7606 108190 0.4993 -
4.7611 108200 0.4745 -
4.7615 108210 0.4777 -
4.7619 108220 0.4757 -
4.7624 108230 0.4799 -
4.7628 108240 0.4936 -
4.7633 108250 0.4893 -
4.7637 108260 0.464 -
4.7641 108270 0.4669 -
4.7646 108280 0.4921 -
4.7650 108290 0.4815 -
4.7655 108300 0.4836 -
4.7659 108310 0.4718 -
4.7663 108320 0.4574 -
4.7668 108330 0.4779 -
4.7672 108340 0.4849 -
4.7677 108350 0.4849 -
4.7681 108360 0.4601 -
4.7685 108370 0.4654 -
4.7690 108380 0.4704 -
4.7694 108390 0.4727 -
4.7699 108400 0.48 -
4.7703 108410 0.4726 -
4.7707 108420 0.4791 -
4.7712 108430 0.4519 -
4.7716 108440 0.4568 -
4.7721 108450 0.4833 -
4.7725 108460 0.476 -
4.7729 108470 0.4597 -
4.7734 108480 0.4745 -
4.7738 108490 0.4744 -
4.7743 108500 0.4601 -
4.7747 108510 0.4807 -
4.7751 108520 0.463 -
4.7756 108530 0.4761 -
4.7760 108540 0.4716 -
4.7765 108550 0.5068 -
4.7769 108560 0.4832 -
4.7773 108570 0.4641 -
4.7778 108580 0.466 -
4.7782 108590 0.4635 -
4.7787 108600 0.5043 -
4.7791 108610 0.4563 -
4.7795 108620 0.4998 -
4.7800 108630 0.5168 -
4.7804 108640 0.4806 -
4.7809 108650 0.4658 -
4.7813 108660 0.4594 -
4.7817 108670 0.4552 -
4.7822 108680 0.4604 -
4.7826 108690 0.4742 -
4.7831 108700 0.5057 -
4.7835 108710 0.4963 -
4.7839 108720 0.4626 -
4.7844 108730 0.4581 -
4.7848 108740 0.473 -
4.7853 108750 0.4914 -
4.7857 108760 0.4838 -
4.7861 108770 0.4643 -
4.7866 108780 0.5038 -
4.7870 108790 0.4858 -
4.7875 108800 0.4516 -
4.7879 108810 0.4685 -
4.7883 108820 0.4639 -
4.7888 108830 0.498 -
4.7892 108840 0.4752 -
4.7897 108850 0.475 -
4.7901 108860 0.4802 -
4.7905 108870 0.4624 -
4.7910 108880 0.4631 -
4.7914 108890 0.4598 -
4.7919 108900 0.4944 -
4.7923 108910 0.4857 -
4.7927 108920 0.4802 -
4.7932 108930 0.4788 -
4.7936 108940 0.473 -
4.7941 108950 0.4966 -
4.7945 108960 0.4845 -
4.7949 108970 0.4732 -
4.7954 108980 0.4749 -
4.7958 108990 0.4975 -
4.7963 109000 0.4812 -
4.7967 109010 0.4489 -
4.7971 109020 0.4791 -
4.7976 109030 0.4701 -
4.7980 109040 0.4691 -
4.7985 109050 0.4798 -
4.7989 109060 0.4769 -
4.7993 109070 0.4867 -
4.7998 109080 0.4873 -
4.8002 109090 0.4789 -
4.8007 109100 0.4458 -
4.8011 109110 0.4816 -
4.8015 109120 0.4718 -
4.8020 109130 0.4983 -
4.8024 109140 0.4901 -
4.8029 109150 0.4701 -
4.8030 109152 - 1.2595
4.8033 109160 0.4656 -
4.8037 109170 0.4845 -
4.8042 109180 0.4523 -
4.8046 109190 0.4638 -
4.8051 109200 0.4744 -
4.8055 109210 0.4916 -
4.8059 109220 0.4891 -
4.8064 109230 0.4787 -
4.8068 109240 0.4762 -
4.8073 109250 0.4643 -
4.8077 109260 0.4882 -
4.8081 109270 0.4844 -
4.8086 109280 0.4761 -
4.8090 109290 0.4708 -
4.8095 109300 0.4795 -
4.8099 109310 0.463 -
4.8103 109320 0.4636 -
4.8108 109330 0.4934 -
4.8112 109340 0.4787 -
4.8117 109350 0.4652 -
4.8121 109360 0.4929 -
4.8125 109370 0.4693 -
4.8130 109380 0.4949 -
4.8134 109390 0.461 -
4.8139 109400 0.4952 -
4.8143 109410 0.4669 -
4.8147 109420 0.4759 -
4.8152 109430 0.4672 -
4.8156 109440 0.4818 -
4.8161 109450 0.4953 -
4.8165 109460 0.4977 -
4.8169 109470 0.4703 -
4.8174 109480 0.5002 -
4.8178 109490 0.4674 -
4.8183 109500 0.4626 -
4.8187 109510 0.4886 -
4.8191 109520 0.4723 -
4.8196 109530 0.4569 -
4.8200 109540 0.4951 -
4.8205 109550 0.4666 -
4.8209 109560 0.5047 -
4.8213 109570 0.4802 -
4.8218 109580 0.4765 -
4.8222 109590 0.4736 -
4.8227 109600 0.4526 -
4.8231 109610 0.4594 -
4.8236 109620 0.4616 -
4.8240 109630 0.4674 -
4.8244 109640 0.4774 -
4.8249 109650 0.4834 -
4.8253 109660 0.4773 -
4.8258 109670 0.4797 -
4.8262 109680 0.4633 -
4.8266 109690 0.472 -
4.8271 109700 0.4755 -
4.8275 109710 0.4761 -
4.8280 109720 0.477 -
4.8284 109730 0.4787 -
4.8288 109740 0.4862 -
4.8293 109750 0.4916 -
4.8297 109760 0.4572 -
4.8302 109770 0.4859 -
4.8306 109780 0.4812 -
4.8310 109790 0.4703 -
4.8315 109800 0.4807 -
4.8319 109810 0.4731 -
4.8324 109820 0.4795 -
4.8328 109830 0.4696 -
4.8332 109840 0.4684 -
4.8337 109850 0.4581 -
4.8341 109860 0.4691 -
4.8346 109870 0.4829 -
4.8350 109880 0.4767 -
4.8354 109890 0.4666 -
4.8359 109900 0.4641 -
4.8363 109910 0.4903 -
4.8368 109920 0.4851 -
4.8372 109930 0.487 -
4.8376 109940 0.4702 -
4.8381 109950 0.4968 -
4.8385 109960 0.4829 -
4.8390 109970 0.4836 -
4.8394 109980 0.4687 -
4.8398 109990 0.4616 -
4.8403 110000 0.4854 -
4.8407 110010 0.4816 -
4.8412 110020 0.5018 -
4.8416 110030 0.4591 -
4.8420 110040 0.478 -
4.8425 110050 0.4653 -
4.8429 110060 0.4628 -
4.8434 110070 0.4778 -
4.8438 110080 0.4808 -
4.8442 110090 0.4861 -
4.8447 110100 0.4884 -
4.8451 110110 0.5016 -
4.8456 110120 0.4706 -
4.8460 110130 0.4716 -
4.8464 110140 0.4519 -
4.8469 110150 0.4949 -
4.8473 110160 0.4757 -
4.8478 110170 0.4853 -
4.8482 110180 0.4871 -
4.8486 110190 0.483 -
4.8491 110200 0.5004 -
4.8495 110210 0.4545 -
4.8500 110220 0.4985 -
4.8504 110230 0.4811 -
4.8508 110240 0.4669 -
4.8513 110250 0.4886 -
4.8517 110260 0.4671 -
4.8522 110270 0.4688 -
4.8526 110280 0.4595 -
4.8530 110289 - 1.2607
4.8530 110290 0.4727 -
4.8535 110300 0.4826 -
4.8539 110310 0.4985 -
4.8544 110320 0.468 -
4.8548 110330 0.4758 -
4.8552 110340 0.4481 -
4.8557 110350 0.5127 -
4.8561 110360 0.4721 -
4.8566 110370 0.4543 -
4.8570 110380 0.4938 -
4.8574 110390 0.4745 -
4.8579 110400 0.4813 -
4.8583 110410 0.4852 -
4.8588 110420 0.4821 -
4.8592 110430 0.4851 -
4.8596 110440 0.4755 -
4.8601 110450 0.4742 -
4.8605 110460 0.4787 -
4.8610 110470 0.4496 -
4.8614 110480 0.4763 -
4.8618 110490 0.4697 -
4.8623 110500 0.4676 -
4.8627 110510 0.4874 -
4.8632 110520 0.4859 -
4.8636 110530 0.4549 -
4.8640 110540 0.4642 -
4.8645 110550 0.466 -
4.8649 110560 0.4567 -
4.8654 110570 0.4777 -
4.8658 110580 0.4808 -
4.8662 110590 0.4755 -
4.8667 110600 0.4815 -
4.8671 110610 0.4656 -
4.8676 110620 0.4768 -
4.8680 110630 0.4512 -
4.8684 110640 0.4724 -
4.8689 110650 0.4534 -
4.8693 110660 0.4593 -
4.8698 110670 0.463 -
4.8702 110680 0.4827 -
4.8706 110690 0.4555 -
4.8711 110700 0.4857 -
4.8715 110710 0.4692 -
4.8720 110720 0.4678 -
4.8724 110730 0.4755 -
4.8728 110740 0.4581 -
4.8733 110750 0.4789 -
4.8737 110760 0.4793 -
4.8742 110770 0.4923 -
4.8746 110780 0.4734 -
4.8750 110790 0.4612 -
4.8755 110800 0.4912 -
4.8759 110810 0.4933 -
4.8764 110820 0.4737 -
4.8768 110830 0.467 -
4.8772 110840 0.4876 -
4.8777 110850 0.4837 -
4.8781 110860 0.473 -
4.8786 110870 0.4761 -
4.8790 110880 0.4913 -
4.8794 110890 0.4677 -
4.8799 110900 0.4844 -
4.8803 110910 0.4669 -
4.8808 110920 0.475 -
4.8812 110930 0.4778 -
4.8816 110940 0.4815 -
4.8821 110950 0.4918 -
4.8825 110960 0.4707 -
4.8830 110970 0.4741 -
4.8834 110980 0.5028 -
4.8838 110990 0.4735 -
4.8843 111000 0.4973 -
4.8847 111010 0.4673 -
4.8852 111020 0.4816 -
4.8856 111030 0.4584 -
4.8860 111040 0.453 -
4.8865 111050 0.4699 -
4.8869 111060 0.4641 -
4.8874 111070 0.4587 -
4.8878 111080 0.4828 -
4.8882 111090 0.4686 -
4.8887 111100 0.4742 -
4.8891 111110 0.4558 -
4.8896 111120 0.4988 -
4.8900 111130 0.4864 -
4.8904 111140 0.4722 -
4.8909 111150 0.4494 -
4.8913 111160 0.4726 -
4.8918 111170 0.4531 -
4.8922 111180 0.4882 -
4.8926 111190 0.4575 -
4.8931 111200 0.4703 -
4.8935 111210 0.4643 -
4.8940 111220 0.4827 -
4.8944 111230 0.4711 -
4.8948 111240 0.4589 -
4.8953 111250 0.485 -
4.8957 111260 0.4804 -
4.8962 111270 0.4439 -
4.8966 111280 0.4743 -
4.8970 111290 0.4799 -
4.8975 111300 0.4653 -
4.8979 111310 0.4941 -
4.8984 111320 0.4618 -
4.8988 111330 0.4753 -
4.8992 111340 0.484 -
4.8997 111350 0.4785 -
4.9001 111360 0.4871 -
4.9006 111370 0.4626 -
4.9010 111380 0.4943 -
4.9014 111390 0.4885 -
4.9019 111400 0.4798 -
4.9023 111410 0.4837 -
4.9028 111420 0.4733 -
4.9030 111426 - 1.2603
4.9032 111430 0.4807 -
4.9036 111440 0.4902 -
4.9041 111450 0.4677 -
4.9045 111460 0.4815 -
4.9050 111470 0.4674 -
4.9054 111480 0.4878 -
4.9058 111490 0.4574 -
4.9063 111500 0.4699 -
4.9067 111510 0.484 -
4.9072 111520 0.4876 -
4.9076 111530 0.4758 -
4.9080 111540 0.458 -
4.9085 111550 0.4681 -
4.9089 111560 0.4815 -
4.9094 111570 0.4676 -
4.9098 111580 0.4651 -
4.9102 111590 0.4532 -
4.9107 111600 0.48 -
4.9111 111610 0.4988 -
4.9116 111620 0.4623 -
4.9120 111630 0.4868 -
4.9124 111640 0.4718 -
4.9129 111650 0.4846 -
4.9133 111660 0.4547 -
4.9138 111670 0.491 -
4.9142 111680 0.4834 -
4.9146 111690 0.4864 -
4.9151 111700 0.4706 -
4.9155 111710 0.4732 -
4.9160 111720 0.4575 -
4.9164 111730 0.4761 -
4.9168 111740 0.4848 -
4.9173 111750 0.4748 -
4.9177 111760 0.4873 -
4.9182 111770 0.4561 -
4.9186 111780 0.4928 -
4.9190 111790 0.4813 -
4.9195 111800 0.4766 -
4.9199 111810 0.4764 -
4.9204 111820 0.4423 -
4.9208 111830 0.4877 -
4.9212 111840 0.4587 -
4.9217 111850 0.4941 -
4.9221 111860 0.4841 -
4.9226 111870 0.4725 -
4.9230 111880 0.501 -
4.9234 111890 0.4562 -
4.9239 111900 0.4752 -
4.9243 111910 0.4876 -
4.9248 111920 0.4877 -
4.9252 111930 0.4803 -
4.9256 111940 0.4617 -
4.9261 111950 0.4801 -
4.9265 111960 0.4807 -
4.9270 111970 0.4769 -
4.9274 111980 0.4793 -
4.9278 111990 0.4845 -
4.9283 112000 0.4903 -
4.9287 112010 0.4665 -
4.9292 112020 0.4654 -
4.9296 112030 0.4741 -
4.9300 112040 0.4635 -
4.9305 112050 0.4757 -
4.9309 112060 0.5063 -
4.9314 112070 0.4591 -
4.9318 112080 0.4725 -
4.9322 112090 0.4821 -
4.9327 112100 0.4732 -
4.9331 112110 0.4484 -
4.9336 112120 0.4517 -
4.9340 112130 0.4764 -
4.9344 112140 0.494 -
4.9349 112150 0.492 -
4.9353 112160 0.4605 -
4.9358 112170 0.4682 -
4.9362 112180 0.4846 -
4.9366 112190 0.4966 -
4.9371 112200 0.4566 -
4.9375 112210 0.4569 -
4.9380 112220 0.4731 -
4.9384 112230 0.4659 -
4.9388 112240 0.4594 -
4.9393 112250 0.4599 -
4.9397 112260 0.4643 -
4.9402 112270 0.482 -
4.9406 112280 0.4489 -
4.9410 112290 0.4976 -
4.9415 112300 0.458 -
4.9419 112310 0.473 -
4.9424 112320 0.4799 -
4.9428 112330 0.4821 -
4.9432 112340 0.4704 -
4.9437 112350 0.4603 -
4.9441 112360 0.4751 -
4.9446 112370 0.5101 -
4.9450 112380 0.4974 -
4.9454 112390 0.4672 -
4.9459 112400 0.4812 -
4.9463 112410 0.4882 -
4.9468 112420 0.4735 -
4.9472 112430 0.4812 -
4.9476 112440 0.458 -
4.9481 112450 0.4874 -
4.9485 112460 0.4535 -
4.9490 112470 0.4811 -
4.9494 112480 0.4795 -
4.9498 112490 0.4994 -
4.9503 112500 0.4498 -
4.9507 112510 0.4672 -
4.9512 112520 0.4861 -
4.9516 112530 0.464 -
4.9520 112540 0.4611 -
4.9525 112550 0.4804 -
4.9529 112560 0.4979 -
4.9530 112563 - 1.2611
4.9534 112570 0.4769 -
4.9538 112580 0.4854 -
4.9542 112590 0.4864 -
4.9547 112600 0.5016 -
4.9551 112610 0.4948 -
4.9556 112620 0.4697 -
4.9560 112630 0.4512 -
4.9564 112640 0.4635 -
4.9569 112650 0.4336 -
4.9573 112660 0.4716 -
4.9578 112670 0.4724 -
4.9582 112680 0.4628 -
4.9586 112690 0.4722 -
4.9591 112700 0.4689 -
4.9595 112710 0.4758 -
4.9600 112720 0.4934 -
4.9604 112730 0.4693 -
4.9608 112740 0.4702 -
4.9613 112750 0.4794 -
4.9617 112760 0.4855 -
4.9622 112770 0.4635 -
4.9626 112780 0.4706 -
4.9630 112790 0.4563 -
4.9635 112800 0.4573 -
4.9639 112810 0.4581 -
4.9644 112820 0.4784 -
4.9648 112830 0.4882 -
4.9652 112840 0.4754 -
4.9657 112850 0.4775 -
4.9661 112860 0.4808 -
4.9666 112870 0.4691 -
4.9670 112880 0.4911 -
4.9674 112890 0.4681 -
4.9679 112900 0.4825 -
4.9683 112910 0.4467 -
4.9688 112920 0.4733 -
4.9692 112930 0.4825 -
4.9696 112940 0.49 -
4.9701 112950 0.4584 -
4.9705 112960 0.4849 -
4.9710 112970 0.5077 -
4.9714 112980 0.462 -
4.9718 112990 0.4823 -
4.9723 113000 0.4838 -
4.9727 113010 0.4538 -
4.9732 113020 0.4812 -
4.9736 113030 0.4525 -
4.9740 113040 0.467 -
4.9745 113050 0.4642 -
4.9749 113060 0.4625 -
4.9754 113070 0.4775 -
4.9758 113080 0.4823 -
4.9762 113090 0.4663 -
4.9767 113100 0.4813 -
4.9771 113110 0.4687 -
4.9776 113120 0.5004 -
4.9780 113130 0.4938 -
4.9784 113140 0.4819 -
4.9789 113150 0.4665 -
4.9793 113160 0.4539 -
4.9798 113170 0.4368 -
4.9802 113180 0.4844 -
4.9806 113190 0.5041 -
4.9811 113200 0.4905 -
4.9815 113210 0.4775 -
4.9820 113220 0.4724 -
4.9824 113230 0.4744 -
4.9828 113240 0.4745 -
4.9833 113250 0.4641 -
4.9837 113260 0.4567 -
4.9842 113270 0.4705 -
4.9846 113280 0.4556 -
4.9850 113290 0.4655 -
4.9855 113300 0.4724 -
4.9859 113310 0.48 -
4.9864 113320 0.4555 -
4.9868 113330 0.4755 -
4.9872 113340 0.497 -
4.9877 113350 0.467 -
4.9881 113360 0.4767 -
4.9886 113370 0.4862 -
4.9890 113380 0.4905 -
4.9894 113390 0.4795 -
4.9899 113400 0.461 -
4.9903 113410 0.486 -
4.9908 113420 0.4861 -
4.9912 113430 0.4627 -
4.9916 113440 0.4692 -
4.9921 113450 0.4798 -
4.9925 113460 0.4725 -
4.9930 113470 0.4719 -
4.9934 113480 0.4837 -
4.9938 113490 0.4652 -
4.9943 113500 0.4634 -
4.9947 113510 0.4617 -
4.9952 113520 0.459 -
4.9956 113530 0.4685 -
4.9960 113540 0.4902 -
4.9965 113550 0.4713 -
4.9969 113560 0.4819 -
4.9974 113570 0.4578 -
4.9978 113580 0.4712 -
4.9982 113590 0.4552 -
4.9987 113600 0.4529 -
4.9991 113610 0.467 -
4.9996 113620 0.4618 -
5.0 113630 0.4417 -

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.5.1.post302
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0
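
To approximately reproduce this environment, the packages above can be pinned with pip (the PyTorch build string 2.5.1.post302 indicates a conda-forge build, so the plain 2.5.1 wheel below is only an approximation):

pip install "torch==2.5.1" sentence-transformers==3.1.1 transformers==4.45.1 accelerate==0.34.2 datasets==3.0.0 tokenizers==0.20.0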

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}