SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
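
The three modules correspond to a RoBERTa encoder producing per-token embeddings, mean pooling over non-padding tokens, and L2 normalization of the pooled vector (so dot product and cosine similarity coincide). A minimal sketch of the same computation with the plain transformers API, assuming the standard mean-pooling recipe; the SentenceTransformer loader shown under Usage remains the supported path:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo = "hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-mlbs-mar03"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

inputs = tokenizer(["Polynomial.hasseDeriv_coeff"], padding=True,
                   truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, 768)

# (1) Pooling: mean over non-padding tokens only
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
# (2) Normalize: unit-length vectors
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)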

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-mlbs-mar03")
# Run inference
sentences = [
    'Mathlib.Algebra.Polynomial.HasseDeriv#31',
    'Polynomial.hasseDeriv_coeff',
    'HomologicalComplex.isZero_X_of_isStrictlySupported',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
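
The training data pairs Mathlib proof-state identifiers with premise names, which suggests a premise-selection use: given a state, rank candidate premises by embedding similarity. Continuing from the snippet above, a hedged retrieval sketch:

import torch

state = "Mathlib.Algebra.Polynomial.HasseDeriv#31"
premises = [
    "Polynomial.hasseDeriv_coeff",
    "HomologicalComplex.isZero_X_of_isStrictlySupported",
]
state_emb = model.encode([state])
premise_embs = model.encode(premises)
scores = model.similarity(state_emb, premise_embs)[0]  # shape: [len(premises)]
for idx in torch.argsort(scores, descending=True):
    print(premises[idx], float(scores[idx]))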

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,854,451 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 10 tokens, mean: 17.28 tokens, max: 22 tokens
    • premise_name: string; min: 3 tokens, mean: 11.34 tokens, max: 38 tokens
  • Samples (state_name → premise_name):
    • Mathlib.RingTheory.Ideal.Norm.RelNorm#46 → RingHomCompTriple.ids
    • Mathlib.RingTheory.Ideal.Norm.RelNorm#46 → MonoidWithZeroHomClass.toMonoidHomClass
    • Mathlib.RingTheory.Ideal.Norm.RelNorm#46 → Ideal.subset_span
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss (core objective sketched below) with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
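
MaskedCachedMultipleNegativesRankingLoss appears to be a custom variant of MultipleNegativesRankingLoss that combines gradient caching (Gao et al., 2021, cited below) with masking of false in-batch negatives. A minimal sketch of the core in-batch-negatives objective it builds on, with the caching and masking omitted:

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(state_embs, premise_embs, scale=20.0):
    # Each (state, premise) pair in the batch is a positive; every other
    # premise in the batch serves as an in-batch negative.
    sims = F.cosine_similarity(
        state_embs.unsqueeze(1), premise_embs.unsqueeze(0), dim=-1
    )  # (batch, batch) cosine-similarity matrix
    logits = sims * scale  # "scale": 20.0 from the config above
    labels = torch.arange(len(state_embs))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)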
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,959 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 10 tokens, mean: 17.08 tokens, max: 24 tokens
    • premise_name: string; min: 5 tokens, mean: 11.05 tokens, max: 31 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.Algebra.Hom#80 → AlgHom.commutes
    • Mathlib.Algebra.Algebra.NonUnitalSubalgebra#237 → NonUnitalAlgHom.instNonUnitalAlgSemiHomClass
    • Mathlib.Algebra.Algebra.NonUnitalSubalgebra#237 → NonUnitalAlgebra.mem_top
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 1.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
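
Putting the pieces together, a hedged end-to-end sketch. The card's loss.MaskedCachedMultipleNegativesRankingLoss is a custom class not shipped with sentence-transformers, so the built-in cached variant stands in for it here (it lacks the masking of false negatives), and the two-row dataset is a toy stand-in for the 5,854,451-pair training set:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
train_dataset = Dataset.from_dict({
    "state_name": [
        "Mathlib.RingTheory.Ideal.Norm.RelNorm#46",
        "Mathlib.RingTheory.Ideal.Norm.RelNorm#46",
    ],
    "premise_name": [
        "RingHomCompTriple.ids",
        "Ideal.subset_span",
    ],
})
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,  # the SentenceTransformerTrainingArguments from the sketch above
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()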

Training Logs

(A dash indicates the metric was not logged at that step.)
Epoch  Step  Training Loss  Validation Loss
0.0017 10 7.5842 -
0.0035 20 6.4567 -
0.0052 30 5.9408 -
0.0070 40 5.7176 -
0.0087 50 5.5353 -
0.0101 58 - 2.2337
0.0105 60 5.4044 -
0.0122 70 5.3384 -
0.0140 80 5.2395 -
0.0157 90 5.1291 -
0.0175 100 5.1093 -
0.0192 110 5.0695 -
0.0203 116 - 1.9949
0.0210 120 4.9664 -
0.0227 130 4.973 -
0.0245 140 4.9065 -
0.0262 150 4.8961 -
0.0280 160 4.839 -
0.0297 170 4.8513 -
0.0304 174 - 1.9119
0.0315 180 4.7662 -
0.0332 190 4.7385 -
0.0350 200 4.7036 -
0.0367 210 4.7013 -
0.0385 220 4.6837 -
0.0402 230 4.6325 -
0.0406 232 - 1.7502
0.0420 240 4.5982 -
0.0437 250 4.5526 -
0.0455 260 4.5793 -
0.0472 270 4.545 -
0.0490 280 4.5173 -
0.0507 290 4.4595 1.6955
0.0525 300 4.4772 -
0.0542 310 4.4038 -
0.0560 320 4.4132 -
0.0577 330 4.4139 -
0.0595 340 4.3585 -
0.0609 348 - 1.6316
0.0612 350 4.3314 -
0.0630 360 4.3805 -
0.0647 370 4.2791 -
0.0665 380 4.2938 -
0.0682 390 4.2591 -
0.0700 400 4.262 -
0.0710 406 - 1.5723
0.0717 410 4.2108 -
0.0735 420 4.1723 -
0.0752 430 4.157 -
0.0769 440 4.1878 -
0.0787 450 4.1644 -
0.0804 460 4.1569 -
0.0811 464 - 1.5368
0.0822 470 4.139 -
0.0839 480 4.0872 -
0.0857 490 4.1169 -
0.0874 500 4.062 -
0.0892 510 4.1138 -
0.0909 520 4.1088 -
0.0913 522 - 1.5232
0.0927 530 4.0526 -
0.0944 540 4.0355 -
0.0962 550 3.9937 -
0.0979 560 3.9647 -
0.0997 570 3.9715 -
0.1014 580 3.9524 1.4901
0.1032 590 3.945 -
0.1049 600 3.9615 -
0.1067 610 3.9713 -
0.1084 620 3.9264 -
0.1102 630 3.9036 -
0.1116 638 - 1.4411
0.1119 640 3.8909 -
0.1137 650 3.901 -
0.1154 660 3.879 -
0.1172 670 3.8696 -
0.1189 680 3.8678 -
0.1207 690 3.8472 -
0.1217 696 - 1.4459
0.1224 700 3.8277 -
0.1242 710 3.8321 -
0.1259 720 3.812 -
0.1277 730 3.8386 -
0.1294 740 3.7583 -
0.1312 750 3.8007 -
0.1319 754 - 1.3644
0.1329 760 3.7337 -
0.1347 770 3.7554 -
0.1364 780 3.7518 -
0.1382 790 3.6993 -
0.1399 800 3.7477 -
0.1417 810 3.6979 -
0.1420 812 - 1.3702
0.1434 820 3.6651 -
0.1452 830 3.7292 -
0.1469 840 3.7005 -
0.1487 850 3.6856 -
0.1504 860 3.631 -
0.1522 870 3.6459 1.3568
0.1539 880 3.6089 -
0.1556 890 3.6134 -
0.1574 900 3.6058 -
0.1591 910 3.6193 -
0.1609 920 3.627 -
0.1623 928 - 1.3072
0.1626 930 3.6202 -
0.1644 940 3.5891 -
0.1661 950 3.6185 -
0.1679 960 3.5984 -
0.1696 970 3.6258 -
0.1714 980 3.5625 -
0.1724 986 - 1.2930
0.1731 990 3.5441 -
0.1749 1000 3.5571 -
0.1766 1010 3.5486 -
0.1784 1020 3.5382 -
0.1801 1030 3.4519 -
0.1819 1040 3.5072 -
0.1826 1044 - 1.2823
0.1836 1050 3.5042 -
0.1854 1060 3.5005 -
0.1871 1070 3.455 -
0.1889 1080 3.4727 -
0.1906 1090 3.4473 -
0.1924 1100 3.4296 -
0.1927 1102 - 1.2696
0.1941 1110 3.449 -
0.1959 1120 3.4202 -
0.1976 1130 3.4236 -
0.1994 1140 3.414 -
0.2011 1150 3.4264 -
0.2029 1160 3.4005 1.2602
0.2046 1170 3.3801 -
0.2064 1180 3.3543 -
0.2081 1190 3.3866 -
0.2099 1200 3.3831 -
0.2116 1210 3.3691 -
0.2130 1218 - 1.2130
0.2134 1220 3.3607 -
0.2151 1230 3.3659 -
0.2169 1240 3.3538 -
0.2186 1250 3.3336 -
0.2204 1260 3.3403 -
0.2221 1270 3.3062 -
0.2232 1276 - 1.2237
0.2239 1280 3.3251 -
0.2256 1290 3.3475 -
0.2274 1300 3.2729 -
0.2291 1310 3.2872 -
0.2308 1320 3.2778 -
0.2326 1330 3.3147 -
0.2333 1334 - 1.2061
0.2343 1340 3.2477 -
0.2361 1350 3.2871 -
0.2378 1360 3.2458 -
0.2396 1370 3.279 -
0.2413 1380 3.2546 -
0.2431 1390 3.2342 -
0.2434 1392 - 1.1854
0.2448 1400 3.2488 -
0.2466 1410 3.2489 -
0.2483 1420 3.2368 -
0.2501 1430 3.2517 -
0.2518 1440 3.2568 -
0.2536 1450 3.21 1.1616
0.2553 1460 3.1891 -
0.2571 1470 3.1739 -
0.2588 1480 3.2004 -
0.2606 1490 3.1988 -
0.2623 1500 3.1892 -
0.2637 1508 - 1.1306
0.2641 1510 3.1967 -
0.2658 1520 3.1331 -
0.2676 1530 3.155 -
0.2693 1540 3.1564 -
0.2711 1550 3.1912 -
0.2728 1560 3.1005 -
0.2739 1566 - 1.1026
0.2746 1570 3.1166 -
0.2763 1580 3.1453 -
0.2781 1590 3.116 -
0.2798 1600 3.1465 -
0.2816 1610 3.1325 -
0.2833 1620 3.1022 -
0.2840 1624 - 1.1400
0.2851 1630 3.0703 -
0.2868 1640 3.0999 -
0.2886 1650 3.0957 -
0.2903 1660 3.0886 -
0.2921 1670 3.0471 -
0.2938 1680 3.0845 -
0.2942 1682 - 1.1045
0.2956 1690 3.0513 -
0.2973 1700 3.0621 -
0.2991 1710 3.0473 -
0.3008 1720 3.0486 -
0.3026 1730 3.0189 -
0.3043 1740 3.0675 1.1004
0.3061 1750 3.0592 -
0.3078 1760 3.0663 -
0.3095 1770 3.0879 -
0.3113 1780 3.0167 -
0.3130 1790 3.0356 -
0.3144 1798 - 1.0554
0.3148 1800 3.0294 -
0.3165 1810 2.9956 -
0.3183 1820 2.985 -
0.3200 1830 2.9824 -
0.3218 1840 2.9939 -
0.3235 1850 2.9979 -
0.3246 1856 - 1.0561
0.3253 1860 2.9935 -
0.3270 1870 3.0613 -
0.3288 1880 2.9742 -
0.3305 1890 2.9858 -
0.3323 1900 2.9446 -
0.3340 1910 2.9571 -
0.3347 1914 - 1.0333
0.3358 1920 2.9839 -
0.3375 1930 2.9865 -
0.3393 1940 2.9398 -
0.3410 1950 2.9504 -
0.3428 1960 2.9371 -
0.3445 1970 2.9222 -
0.3449 1972 - 1.0322
0.3463 1980 2.8907 -
0.3480 1990 2.9412 -
0.3498 2000 2.944 -
0.3515 2010 2.9168 -
0.3533 2020 2.9076 -
0.3550 2030 2.8967 1.0103
0.3568 2040 2.8569 -
0.3585 2050 2.8602 -
0.3603 2060 2.8984 -
0.3620 2070 2.8782 -
0.3638 2080 2.8649 -
0.3652 2088 - 1.0136
0.3655 2090 2.8388 -
0.3673 2100 2.8845 -
0.3690 2110 2.8749 -
0.3708 2120 2.8439 -
0.3725 2130 2.8693 -
0.3743 2140 2.8342 -
0.3753 2146 - 0.9949
0.3760 2150 2.8696 -
0.3778 2160 2.872 -
0.3795 2170 2.828 -
0.3813 2180 2.8338 -
0.3830 2190 2.8716 -
0.3847 2200 2.8798 -
0.3854 2204 - 1.0067
0.3865 2210 2.834 -
0.3882 2220 2.7885 -
0.3900 2230 2.8152 -
0.3917 2240 2.8214 -
0.3935 2250 2.8306 -
0.3952 2260 2.8164 -
0.3956 2262 - 0.9845
0.3970 2270 2.8338 -
0.3987 2280 2.8223 -
0.4005 2290 2.8183 -
0.4022 2300 2.7903 -
0.4040 2310 2.7772 -
0.4057 2320 2.7952 0.9900
0.4075 2330 2.7733 -
0.4092 2340 2.8096 -
0.4110 2350 2.771 -
0.4127 2360 2.8178 -
0.4145 2370 2.7539 -
0.4159 2378 - 0.9749
0.4162 2380 2.7488 -
0.4180 2390 2.7592 -
0.4197 2400 2.7385 -
0.4215 2410 2.7564 -
0.4232 2420 2.7573 -
0.4250 2430 2.7686 -
0.4260 2436 - 0.9509
0.4267 2440 2.7147 -
0.4285 2450 2.7375 -
0.4302 2460 2.6995 -
0.4320 2470 2.6888 -
0.4337 2480 2.7171 -
0.4355 2490 2.712 -
0.4362 2494 - 0.9311
0.4372 2500 2.729 -
0.4390 2510 2.6974 -
0.4407 2520 2.7056 -
0.4425 2530 2.7123 -
0.4442 2540 2.701 -
0.4460 2550 2.7211 -
0.4463 2552 - 0.9259
0.4477 2560 2.6974 -
0.4495 2570 2.6823 -
0.4512 2580 2.6968 -
0.4530 2590 2.7126 -
0.4547 2600 2.693 -
0.4565 2610 2.7164 0.9161
0.4582 2620 2.6558 -
0.4600 2630 2.6972 -
0.4617 2640 2.7116 -
0.4634 2650 2.6398 -
0.4652 2660 2.6645 -
0.4666 2668 - 0.8982
0.4669 2670 2.6646 -
0.4687 2680 2.6828 -
0.4704 2690 2.6502 -
0.4722 2700 2.6605 -
0.4739 2710 2.6224 -
0.4757 2720 2.6753 -
0.4767 2726 - 0.8941
0.4774 2730 2.6478 -
0.4792 2740 2.6688 -
0.4809 2750 2.6674 -
0.4827 2760 2.6132 -
0.4844 2770 2.6286 -
0.4862 2780 2.634 -
0.4869 2784 - 0.8756
0.4879 2790 2.6359 -
0.4897 2800 2.6242 -
0.4914 2810 2.6443 -
0.4932 2820 2.59 -
0.4949 2830 2.6166 -
0.4967 2840 2.6249 -
0.4970 2842 - 0.8802
0.4984 2850 2.6257 -
0.5002 2860 2.6286 -
0.5019 2870 2.5671 -
0.5037 2880 2.5959 -
0.5054 2890 2.5962 -
0.5072 2900 2.5521 0.8673
0.5089 2910 2.5833 -
0.5107 2920 2.6015 -
0.5124 2930 2.6446 -
0.5142 2940 2.5655 -
0.5159 2950 2.5802 -
0.5173 2958 - 0.8614
0.5177 2960 2.6124 -
0.5194 2970 2.5372 -
0.5212 2980 2.5108 -
0.5229 2990 2.578 -
0.5247 3000 2.5629 -
0.5264 3010 2.5691 -
0.5275 3016 - 0.8418
0.5282 3020 2.5313 -
0.5299 3030 2.5791 -
0.5317 3040 2.5216 -
0.5334 3050 2.5263 -
0.5352 3060 2.5213 -
0.5369 3070 2.5485 -
0.5376 3074 - 0.8546
0.5386 3080 2.5435 -
0.5404 3090 2.5599 -
0.5421 3100 2.5045 -
0.5439 3110 2.5055 -
0.5456 3120 2.54 -
0.5474 3130 2.5134 -
0.5477 3132 - 0.8515
0.5491 3140 2.5053 -
0.5509 3150 2.4578 -
0.5526 3160 2.517 -
0.5544 3170 2.5061 -
0.5561 3180 2.5262 -
0.5579 3190 2.5787 0.8376
0.5596 3200 2.4855 -
0.5614 3210 2.5058 -
0.5631 3220 2.5279 -
0.5649 3230 2.498 -
0.5666 3240 2.5045 -
0.5680 3248 - 0.8407
0.5684 3250 2.5129 -
0.5701 3260 2.517 -
0.5719 3270 2.4647 -
0.5736 3280 2.4642 -
0.5754 3290 2.4936 -
0.5771 3300 2.4862 -
0.5782 3306 - 0.8310
0.5789 3310 2.4805 -
0.5806 3320 2.4986 -
0.5824 3330 2.481 -
0.5841 3340 2.4747 -
0.5859 3350 2.4939 -
0.5876 3360 2.4691 -
0.5883 3364 - 0.8397
0.5894 3370 2.4798 -
0.5911 3380 2.4439 -
0.5929 3390 2.4849 -
0.5946 3400 2.4653 -
0.5964 3410 2.4795 -
0.5981 3420 2.4681 -
0.5985 3422 - 0.8265
0.5999 3430 2.4671 -
0.6016 3440 2.4579 -
0.6034 3450 2.4319 -
0.6051 3460 2.4235 -
0.6069 3470 2.4447 -
0.6086 3480 2.456 0.8104
0.6104 3490 2.4107 -
0.6121 3500 2.49 -
0.6139 3510 2.4511 -
0.6156 3520 2.4446 -
0.6173 3530 2.4159 -
0.6187 3538 - 0.8086
0.6191 3540 2.4135 -
0.6208 3550 2.4147 -
0.6226 3560 2.4458 -
0.6243 3570 2.4207 -
0.6261 3580 2.4333 -
0.6278 3590 2.3931 -
0.6289 3596 - 0.8036
0.6296 3600 2.4695 -
0.6313 3610 2.4285 -
0.6331 3620 2.4066 -
0.6348 3630 2.414 -
0.6366 3640 2.4229 -
0.6383 3650 2.3916 -
0.6390 3654 - 0.7960
0.6401 3660 2.4376 -
0.6418 3670 2.4196 -
0.6436 3680 2.4132 -
0.6453 3690 2.4016 -
0.6471 3700 2.3749 -
0.6488 3710 2.3963 -
0.6492 3712 - 0.7895
0.6506 3720 2.4223 -
0.6523 3730 2.3787 -
0.6541 3740 2.368 -
0.6558 3750 2.3526 -
0.6576 3760 2.3883 -
0.6593 3770 2.4286 0.7897
0.6611 3780 2.366 -
0.6628 3790 2.3914 -
0.6646 3800 2.416 -
0.6663 3810 2.3731 -
0.6681 3820 2.4097 -
0.6695 3828 - 0.7782
0.6698 3830 2.374 -
0.6716 3840 2.3591 -
0.6733 3850 2.384 -
0.6751 3860 2.398 -
0.6768 3870 2.3712 -
0.6786 3880 2.3936 -
0.6796 3886 - 0.7725
0.6803 3890 2.3895 -
0.6821 3900 2.359 -
0.6838 3910 2.3901 -
0.6856 3920 2.4 -
0.6873 3930 2.3628 -
0.6891 3940 2.3732 -
0.6898 3944 - 0.7658
0.6908 3950 2.3929 -
0.6925 3960 2.3792 -
0.6943 3970 2.3496 -
0.6960 3980 2.3242 -
0.6978 3990 2.3471 -
0.6995 4000 2.3503 -
0.6999 4002 - 0.7617
0.7013 4010 2.3693 -
0.7030 4020 2.3608 -
0.7048 4030 2.3419 -
0.7065 4040 2.3577 -
0.7083 4050 2.3403 -
0.7100 4060 2.3491 0.7549
0.7118 4070 2.3175 -
0.7135 4080 2.3513 -
0.7153 4090 2.3767 -
0.7170 4100 2.371 -
0.7188 4110 2.3103 -
0.7202 4118 - 0.7585
0.7205 4120 2.3048 -
0.7223 4130 2.3406 -
0.7240 4140 2.3551 -
0.7258 4150 2.3309 -
0.7275 4160 2.3565 -
0.7293 4170 2.3111 -
0.7303 4176 - 0.7527
0.7310 4180 2.2925 -
0.7328 4190 2.281 -
0.7345 4200 2.3131 -
0.7363 4210 2.3568 -
0.7380 4220 2.3645 -
0.7398 4230 2.3283 -
0.7405 4234 - 0.7497
0.7415 4240 2.3098 -
0.7433 4250 2.3136 -
0.7450 4260 2.3141 -
0.7468 4270 2.2717 -
0.7485 4280 2.325 -
0.7503 4290 2.3358 -
0.7506 4292 - 0.7449
0.7520 4300 2.296 -
0.7538 4310 2.3211 -
0.7555 4320 2.3035 -
0.7573 4330 2.3114 -
0.7590 4340 2.3076 -
0.7608 4350 2.334 0.7416
0.7625 4360 2.2805 -
0.7643 4370 2.3302 -
0.7660 4380 2.2753 -
0.7678 4390 2.3084 -
0.7695 4400 2.308 -
0.7709 4408 - 0.7463
0.7712 4410 2.2909 -
0.7730 4420 2.2796 -
0.7747 4430 2.2868 -
0.7765 4440 2.3021 -
0.7782 4450 2.2977 -
0.7800 4460 2.2885 -
0.7810 4466 - 0.7391
0.7817 4470 2.2967 -
0.7835 4480 2.2774 -
0.7852 4490 2.3178 -
0.7870 4500 2.2785 -
0.7887 4510 2.2493 -
0.7905 4520 2.2866 -
0.7912 4524 - 0.7325
0.7922 4530 2.2632 -
0.7940 4540 2.289 -
0.7957 4550 2.2782 -
0.7975 4560 2.2607 -
0.7992 4570 2.2914 -
0.8010 4580 2.2593 -
0.8013 4582 - 0.7318
0.8027 4590 2.3077 -
0.8045 4600 2.2793 -
0.8062 4610 2.3051 -
0.8080 4620 2.2914 -
0.8097 4630 2.2646 -
0.8115 4640 2.2574 0.7308
0.8132 4650 2.2654 -
0.8150 4660 2.235 -
0.8167 4670 2.258 -
0.8185 4680 2.2935 -
0.8202 4690 2.281 -
0.8216 4698 - 0.7281
0.8220 4700 2.295 -
0.8237 4710 2.3095 -
0.8255 4720 2.2516 -
0.8272 4730 2.2292 -
0.8290 4740 2.2635 -
0.8307 4750 2.2522 -
0.8318 4756 - 0.7330
0.8325 4760 2.248 -
0.8342 4770 2.3082 -
0.8360 4780 2.2447 -
0.8377 4790 2.2596 -
0.8395 4800 2.2747 -
0.8412 4810 2.2343 -
0.8419 4814 - 0.7319
0.8430 4820 2.2521 -
0.8447 4830 2.2642 -
0.8464 4840 2.2492 -
0.8482 4850 2.2788 -
0.8499 4860 2.2925 -
0.8517 4870 2.2491 -
0.8520 4872 - 0.7304
0.8534 4880 2.2666 -
0.8552 4890 2.2261 -
0.8569 4900 2.2504 -
0.8587 4910 2.2567 -
0.8604 4920 2.2813 -
0.8622 4930 2.244 0.7277
0.8639 4940 2.2645 -
0.8657 4950 2.228 -
0.8674 4960 2.2322 -
0.8692 4970 2.2547 -
0.8709 4980 2.2722 -
0.8723 4988 - 0.7272
0.8727 4990 2.227 -
0.8744 5000 2.2407 -
0.8762 5010 2.2269 -
0.8779 5020 2.2428 -
0.8797 5030 2.2448 -
0.8814 5040 2.2562 -
0.8825 5046 - 0.7256
0.8832 5050 2.2364 -
0.8849 5060 2.2445 -
0.8867 5070 2.2409 -
0.8884 5080 2.2261 -
0.8902 5090 2.2613 -
0.8919 5100 2.2718 -
0.8926 5104 - 0.7233
0.8937 5110 2.2544 -
0.8954 5120 2.2276 -
0.8972 5130 2.2385 -
0.8989 5140 2.2401 -
0.9007 5150 2.2769 -
0.9024 5160 2.2399 -
0.9028 5162 - 0.7231
0.9042 5170 2.2205 -
0.9059 5180 2.2303 -
0.9077 5190 2.231 -
0.9094 5200 2.2356 -
0.9112 5210 2.2386 -
0.9129 5220 2.2233 0.7233
0.9147 5230 2.2509 -
0.9164 5240 2.2201 -
0.9182 5250 2.2189 -
0.9199 5260 2.1992 -
0.9217 5270 2.2362 -
0.9231 5278 - 0.7221
0.9234 5280 2.2293 -
0.9251 5290 2.2302 -
0.9269 5300 2.2216 -
0.9286 5310 2.2191 -
0.9304 5320 2.2504 -
0.9321 5330 2.2447 -
0.9332 5336 - 0.7221
0.9339 5340 2.2326 -
0.9356 5350 2.2315 -
0.9374 5360 2.244 -
0.9391 5370 2.2369 -
0.9409 5380 2.2312 -
0.9426 5390 2.2739 -
0.9433 5394 - 0.7206
0.9444 5400 2.2598 -
0.9461 5410 2.2319 -
0.9479 5420 2.2312 -
0.9496 5430 2.2592 -
0.9514 5440 2.2503 -
0.9531 5450 2.232 -
0.9535 5452 - 0.7208
0.9549 5460 2.2341 -
0.9566 5470 2.2564 -
0.9584 5480 2.2087 -
0.9601 5490 2.257 -
0.9619 5500 2.2524 -
0.9636 5510 2.253 0.7204
0.9654 5520 2.2424 -
0.9671 5530 2.2459 -
0.9689 5540 2.2387 -
0.9706 5550 2.2482 -
0.9724 5560 2.2156 -
0.9738 5568 - 0.7200
0.9741 5570 2.2343 -
0.9759 5580 2.2426 -
0.9776 5590 2.2154 -
0.9794 5600 2.2365 -
0.9811 5610 2.275 -
0.9829 5620 2.2689 -
0.9839 5626 - 0.7200
0.9846 5630 2.2356 -
0.9864 5640 2.2517 -
0.9881 5650 2.2436 -
0.9899 5660 2.2229 -
0.9916 5670 2.2617 -
0.9934 5680 2.2359 -
0.9941 5684 - 0.7201
0.9951 5690 2.2444 -
0.9969 5700 2.2505 -
0.9986 5710 2.2713 -

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0
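
To reproduce this environment, pinning the listed versions should come close (PyTorch with the matching CUDA build must be installed per its own instructions):

pip install "sentence-transformers==3.1.1" "transformers==4.45.1" "accelerate==0.34.2" "datasets==3.0.0" "tokenizers==0.20.0"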

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}