SentenceTransformer based on Shuu12121/CodeModernBERT-Owl-2.2-Pre

This is a sentence-transformers model finetuned from Shuu12121/CodeModernBERT-Owl-2.2-Pre. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search (including natural-language-to-code retrieval, which the training data below targets), paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Shuu12121/CodeModernBERT-Owl-2.2-Pre
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~152M parameters (F32)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
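
The Pooling module is configured for CLS pooling (pooling_mode_cls_token: True), so the sentence embedding is the hidden state of the first token rather than a mean over tokens. Below is a minimal sketch of the equivalent computation with the plain transformers API; it assumes the checkpoint's weights resolve through AutoModel, and the SentenceTransformer API shown in Usage is the supported path.

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Shuu12121/CodeSearch-ModernBERT-Owl-2.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["def add(a, b): return a + b"],
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # matches max_seq_length above
)
with torch.no_grad():
    token_states = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# CLS pooling: take the first token's hidden state as the sentence embedding.
embedding = token_states[:, 0]  # (batch, 768)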

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-2.2")
# Run inference
sentences = [
    '<pre>\nField extraction metadata on the property.\n</pre>\n\n<code>.google.cloud.documentai.v1beta3.FieldExtractionMetadata field_extraction_metadata = 9;\n</code>\n\n@return Whether the fieldExtractionMetadata field is set.',
    '@java.lang.Override\n  public boolean hasFieldExtractionMetadata() {\n    return ((bitField0_ & 0x00000001) != 0);\n  }',
    'pub fn poller(self) -> impl lro::Poller<(), crate::model::DeleteSitemapMetadata> {\n            type Operation =\n                lro::internal::Operation<wkt::Empty, crate::model::DeleteSitemapMetadata>;\n            let polling_error_policy = self.0.stub.get_polling_error_policy(&self.0.options);\n            let polling_backoff_policy = self.0.stub.get_polling_backoff_policy(&self.0.options);\n\n            let stub = self.0.stub.clone();\n            let mut options = self.0.options.clone();\n            options.set_retry_policy(gax::retry_policy::NeverRetry);\n            let query = move |name| {\n                let stub = stub.clone();\n                let options = options.clone();\n                async {\n                    let op = GetOperation::new(stub)\n                        .set_name(name)\n                        .with_options(options)\n                        .send()\n                        .await?;\n                    Ok(Operation::new(op))\n                }\n            };\n\n            let start = move || async {\n                let op = self.send().await?;\n                Ok(Operation::new(op))\n            };\n\n            lro::internal::new_unit_response_poller(\n                polling_error_policy,\n                polling_backoff_policy,\n                start,\n                query,\n            )\n        }',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
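
Given the docstring/code training pairs described below, a typical application is code search: embed a natural-language query and a pool of code snippets, then rank the snippets by cosine similarity. A small sketch follows; the query and snippets are shortened versions of the training samples listed later, and any real corpus works the same way.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-2.2")

query = "Indicates whether the match is case sensitive."
code_snippets = [
    "pub fn case_sensitive(mut self, input: bool) -> Self { self.case_sensitive = Some(input); self }",
    "pub fn nick_name(mut self, input: String) -> Self { self.nick_name = Some(input); self }",
]

query_emb = model.encode(query)          # (768,)
code_embs = model.encode(code_snippets)  # (2, 768)

# Rank candidate snippets by cosine similarity to the query.
scores = util.cos_sim(query_emb, code_embs)[0]
best = int(scores.argmax())
print(code_snippets[best], float(scores[best]))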

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,732,400 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

                 sentence_0            sentence_1             label
    type         string                string                 float
    details      min: 8 tokens         min: 5 tokens          min: 1.0
                 mean: 70.49 tokens    mean: 138.25 tokens    mean: 1.0
                 max: 1024 tokens      max: 1024 tokens       max: 1.0
  • Samples (each pair is a docstring and its matching code; all labels are 1.0):

    Sample 1
      sentence_0:
        Prints the specified pkg.

        If is_main is not set, nested package notation is used.
      sentence_1:
        pub fn print_package(
            &mut self,
            resolve: &Resolve,
            pkg: PackageId,
            is_main: bool,
        ) -> Result<()> {
            let pkg = &resolve.packages[pkg];
            self.print_package_outer(pkg)?;

            if is_main {
                self.output.semicolon();
                self.output.newline();
            } else {
                self.output.indent_start();
            }

            for (name, id) in pkg.interfaces.iter() {
                self.print_interface_outer(resolve, *id, name)?;
                self.output.indent_start();
                self.print_interface(resolve, *id)?;
                self.output.indent_end();
                if is_main {
                    self.output.newline();
                }
            }

            for (name, id) in pkg.worlds.iter() {
                self.print_docs(&resolve.worlds[*id].docs);
                self.print_stability(&resolve.worlds[*id].stability);
                self.output.keyword("world");
                self.output.str(" ");
                self.print_name_type(name, TypeKind:...
      label: 1.0

    Sample 2
      sentence_0:
        An alternative descriptive name for the user.
      sentence_1:
        pub fn nick_name(mut self, input: impl ::std::convert::Into<::std::string::String>) -> Self {
            self.nick_name = ::std::option::Option::Some(input.into());
            self
        }
      label: 1.0

    Sample 3
      sentence_0:
        Indicates whether the match is case sensitive.
      sentence_1:
        pub fn case_sensitive(mut self, input: bool) -> Self {
            self.case_sensitive = ::std::option::Option::Some(input);
            self
        }
      label: 1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
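
Concretely, MultipleNegativesRankingLoss uses the other pairs in the batch as negatives: for each docstring, its own code is the positive, every other code in the batch is a negative, and the loss is cross-entropy over scaled cosine similarities. A self-contained sketch of that computation (an illustration, not the library's implementation):

import torch
import torch.nn.functional as F

def mnr_loss(docstring_embs: torch.Tensor, code_embs: torch.Tensor,
             scale: float = 20.0) -> torch.Tensor:
    """Row i of each (batch, dim) tensor is a matching (docstring, code) pair."""
    # Scaled cosine similarity matrix: entry (i, j) compares docstring i with code j.
    sims = scale * F.cosine_similarity(
        docstring_embs.unsqueeze(1), code_embs.unsqueeze(0), dim=-1
    )
    # Target for row i is column i (its own code); all other columns act as
    # in-batch negatives, so the loss is plain cross-entropy over the rows.
    labels = torch.arange(docstring_embs.size(0))
    return F.cross_entropy(sims, labels)

# Example with random embeddings:
print(mnr_loss(torch.randn(4, 768), torch.randn(4, 768)))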
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 150
  • per_device_eval_batch_size: 150
  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
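
For reference, a minimal sketch of how these non-default values plug into the Sentence Transformers v3+ training API; the one-row dataset is an illustrative stand-in for the real 2,732,400-pair dataset, and the output_dir is made up.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("Shuu12121/CodeModernBERT-Owl-2.2-Pre")

# Illustrative one-row stand-in for the real docstring/code pair dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Adds two numbers."],
    "sentence_1": ["def add(a, b):\n    return a + b"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="owl-2.2-code-search",  # hypothetical path
    per_device_train_batch_size=150,
    per_device_eval_batch_size=150,
    num_train_epochs=5,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()

Note that MultipleNegativesRankingLoss improves with larger batches, since every other pair in the batch supplies negatives; that is presumably why the relatively large batch size of 150 was used.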

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 150
  • per_device_eval_batch_size: 150
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0274 500 0.7202
0.0549 1000 0.1625
0.0823 1500 0.149
0.1098 2000 0.1388
0.1372 2500 0.1292
0.1647 3000 0.126
0.1921 3500 0.1204
0.2196 4000 0.1161
0.2470 4500 0.1074
0.2745 5000 0.1063
0.3019 5500 0.1004
0.3294 6000 0.0972
0.3568 6500 0.0941
0.3843 7000 0.0944
0.4117 7500 0.0884
0.4392 8000 0.0895
0.4666 8500 0.0867
0.4941 9000 0.0847
0.5215 9500 0.0805
0.5490 10000 0.0822
0.5764 10500 0.0784
0.6039 11000 0.0741
0.6313 11500 0.0734
0.6588 12000 0.0719
0.6862 12500 0.0687
0.7137 13000 0.0656
0.7411 13500 0.0681
0.7686 14000 0.0655
0.7960 14500 0.067
0.8235 15000 0.0628
0.8509 15500 0.0619
0.8783 16000 0.0592
0.9058 16500 0.0604
0.9332 17000 0.0585
0.9607 17500 0.0545
0.9881 18000 0.0543
1.0156 18500 0.0381
1.0430 19000 0.0263
1.0705 19500 0.0247
1.0979 20000 0.026
1.1254 20500 0.0263
1.1528 21000 0.0267
1.1803 21500 0.0277
1.2077 22000 0.0269
1.2352 22500 0.027
1.2626 23000 0.0274
1.2901 23500 0.0275
1.3175 24000 0.0283
1.3450 24500 0.0271
1.3724 25000 0.0269
1.3999 25500 0.0272
1.4273 26000 0.0263
1.4548 26500 0.0266
1.4822 27000 0.0259
1.5097 27500 0.0272
1.5371 28000 0.0277
1.5646 28500 0.0273
1.5920 29000 0.0251
1.6195 29500 0.0256
1.6469 30000 0.0256
1.6744 30500 0.0248
1.7018 31000 0.0253
1.7292 31500 0.0244
1.7567 32000 0.0242
1.7841 32500 0.0219
1.8116 33000 0.0246
1.8390 33500 0.023
1.8665 34000 0.0239
1.8939 34500 0.0217
1.9214 35000 0.0217
1.9488 35500 0.0224
1.9763 36000 0.0223
2.0037 36500 0.0201
2.0312 37000 0.0102
2.0586 37500 0.0097
2.0861 38000 0.009
2.1135 38500 0.0092
2.1410 39000 0.0094
2.1684 39500 0.0096
2.1959 40000 0.0101
2.2233 40500 0.0101
2.2508 41000 0.0099
2.2782 41500 0.01
2.3057 42000 0.01
2.3331 42500 0.01
2.3606 43000 0.0099
2.3880 43500 0.0098
2.4155 44000 0.0099
2.4429 44500 0.0101
2.4704 45000 0.0098
2.4978 45500 0.01
2.5253 46000 0.0099
2.5527 46500 0.0096
2.5801 47000 0.0092
2.6076 47500 0.0091
2.6350 48000 0.009
2.6625 48500 0.0091
2.6899 49000 0.0092
2.7174 49500 0.0095
2.7448 50000 0.0089
2.7723 50500 0.0093
2.7997 51000 0.0097
2.8272 51500 0.0092
2.8546 52000 0.0093
2.8821 52500 0.0091
2.9095 53000 0.0091
2.9370 53500 0.0089
2.9644 54000 0.0084
2.9919 54500 0.0078
3.0193 55000 0.0063
3.0468 55500 0.0046
3.0742 56000 0.0047
3.1017 56500 0.0051
3.1291 57000 0.0049
3.1566 57500 0.0049
3.1840 58000 0.0051
3.2115 58500 0.0048
3.2389 59000 0.0053
3.2664 59500 0.0049
3.2938 60000 0.0049
3.3213 60500 0.005
3.3487 61000 0.0055
3.3762 61500 0.0052
3.4036 62000 0.005
3.4310 62500 0.0049
3.4585 63000 0.0051
3.4859 63500 0.005
3.5134 64000 0.005
3.5408 64500 0.005
3.5683 65000 0.0046
3.5957 65500 0.0049
3.6232 66000 0.0045
3.6506 66500 0.0044
3.6781 67000 0.0046
3.7055 67500 0.0049
3.7330 68000 0.0049
3.7604 68500 0.0042
3.7879 69000 0.0042
3.8153 69500 0.0046
3.8428 70000 0.0049
3.8702 70500 0.0042
3.8977 71000 0.0041
3.9251 71500 0.0043
3.9526 72000 0.0042
3.9800 72500 0.0041
4.0075 73000 0.004
4.0349 73500 0.0031
4.0624 74000 0.0031
4.0898 74500 0.003
4.1173 75000 0.003
4.1447 75500 0.0029
4.1722 76000 0.0031
4.1996 76500 0.0029
4.2271 77000 0.003
4.2545 77500 0.0029
4.2819 78000 0.0029
4.3094 78500 0.0027
4.3368 79000 0.0028
4.3643 79500 0.0028
4.3917 80000 0.003
4.4192 80500 0.0027
4.4466 81000 0.0027
4.4741 81500 0.003
4.5015 82000 0.0028
4.5290 82500 0.0029
4.5564 83000 0.0027
4.5839 83500 0.0027
4.6113 84000 0.0029
4.6388 84500 0.0026
4.6662 85000 0.0027
4.6937 85500 0.0027
4.7211 86000 0.0025
4.7486 86500 0.0029
4.7760 87000 0.0027
4.8035 87500 0.0026
4.8309 88000 0.0028
4.8584 88500 0.0025
4.8858 89000 0.0024
4.9133 89500 0.0027
4.9407 90000 0.0026
4.9682 90500 0.0026
4.9956 91000 0.0028

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}