SentenceTransformer based on DeepChem/ChemBERTa-77M-MLM

This is a sentence-transformers model finetuned from DeepChem/ChemBERTa-77M-MLM on triplets of SMILES strings derived from the ClinTox dataset. It maps SMILES strings to a 384-dimensional dense vector space and can be used for molecular similarity search, clustering, and other embedding-based tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: DeepChem/ChemBERTa-77M-MLM
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
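
For reference, the same two-module stack (a RoBERTa encoder followed by mean pooling) can be assembled by hand with the sentence_transformers.models API. This is a minimal sketch, not a required step; loading the published checkpoint by its repo id, as shown in the Usage section below, restores this architecture automatically.

from sentence_transformers import SentenceTransformer, models

# Rebuild the architecture above from its two modules (sketch only)
word_embedding_model = models.Transformer("DeepChem/ChemBERTa-77M-MLM", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 384 for this backbone
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])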

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("HassanCS/chemBERTa-tuned-on-ClinTox-using-MultipleNegativesRankingLoss")
# Run inference
sentences = [
    'CON=C(C(=O)NC1C(=O)N2C(C(=O)[O-])=C(C[N+]3(C)CCCC3)CSC12)c1csc(N)n1',
    'CC1CNc2c(cccc2S(=O)(=O)NC(CCC[NH+]=C(N)N)C(=O)N2CCC(C)CC2C(=O)[O-])C1',
    'CC(C)C1(C(=O)NC2CC(=O)OC2(O)CF)CC(c2nccc3ccccc23)=NO1',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
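
The embeddings can also be used directly for nearest-neighbour lookups over a set of molecules. The snippet below is a small illustrative sketch; the query and corpus SMILES are arbitrary example molecules, not taken from the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HassanCS/chemBERTa-tuned-on-ClinTox-using-MultipleNegativesRankingLoss")

# Arbitrary example molecules: aspirin as the query; paracetamol, ibuprofen, nicotine as the corpus
query = ["CC(=O)Oc1ccccc1C(=O)O"]
corpus = [
    "CC(=O)Nc1ccc(O)cc1",
    "CC(C)Cc1ccc(C(C)C(=O)O)cc1",
    "CN1CCC[C@H]1c1cccnc1",
]

query_emb = model.encode(query)      # shape: [1, 384]
corpus_emb = model.encode(corpus)    # shape: [3, 384]

# Cosine similarity between the query and every corpus molecule
scores = model.similarity(query_emb, corpus_emb)  # shape: [1, 3]
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))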

Evaluation

Metrics

Triplet

Triplet accuracy on the all-dev evaluation set: the fraction of triplets for which the anchor embedding is closer (by cosine similarity) to the positive than to the negative.

Metric            Value
cosine_accuracy   0.7135
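
This metric can be recomputed with the TripletEvaluator from Sentence Transformers. The sketch below assumes you hold the evaluation triplets as three parallel lists of SMILES strings; the held-out set itself is not shipped with this card, so the placeholder lists only show the expected shape.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("HassanCS/chemBERTa-tuned-on-ClinTox-using-MultipleNegativesRankingLoss")

# Placeholder triplets; replace with the real held-out anchor/positive/negative SMILES
anchors   = ["CC(=O)Oc1ccccc1C(=O)O"]
positives = ["CC(=O)Nc1ccc(O)cc1"]
negatives = ["CN1CCC[C@H]1c1cccnc1"]

evaluator = TripletEvaluator(anchors=anchors, positives=positives, negatives=negatives, name="all-dev")
print(evaluator(model))  # e.g. {'all-dev_cosine_accuracy': ...}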

Training Details

Training Dataset

Unnamed Dataset

  • Size: 118,400 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor              positive            negative
    type        string              string              string
    details     min: 20 tokens      min: 3 tokens       min: 3 tokens
                mean: 33.0 tokens   mean: 47.34 tokens  mean: 53.88 tokens
                max: 60 tokens      max: 212 tokens     max: 212 tokens
  • Samples:
    anchor:   CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O
    positive: CC(=O)OC1CCC2(C)C(=CCC3C2CCC2(C)C(c4cccnc4)=CCC32)C1
    negative: CCOC(=O)c1ncn2c1CN(C)C(=O)c1cc(F)ccc1-2

    anchor:   CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O
    positive: COc1ccc(C(CN(C)C)C2(O)CCCCC2)cc1
    negative: C[NH2+]C1(C)C2CCC(C2)C1(C)C

    anchor:   CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O
    positive: CNC(=O)c1cc(Oc2ccc(NC(=O)Nc3ccc(Cl)c(C(F)(F)F)c3)cc2)ccn1.Cc1ccc(S(=O)(=O)O)cc1
    negative: Nc1ncnc2c1ncn2C1OC(CO)C(O)C1O
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
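
In code, this corresponds to a triplet dataset with anchor/positive/negative columns and a Euclidean-distance TripletLoss with margin 5. The sketch below uses a toy two-row dataset of arbitrary SMILES in place of the real 118,400 ClinTox-derived triplets, and loads the base checkpoint as the starting model.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

# Starting point for fine-tuning: the base ChemBERTa checkpoint
# (Sentence Transformers wraps it with a default mean-pooling head when loaded this way)
model = SentenceTransformer("DeepChem/ChemBERTa-77M-MLM")

# Toy stand-in for the real triplet data; columns must be anchor / positive / negative
train_dataset = Dataset.from_dict({
    "anchor":   ["CC(=O)Oc1ccccc1C(=O)O", "CC(=O)Nc1ccc(O)cc1"],
    "positive": ["CC(C)Cc1ccc(C(C)C(=O)O)cc1", "CC(=O)Oc1ccccc1C(=O)O"],
    "negative": ["CN1CCC[C@H]1c1cccnc1", "C1CCCCC1"],
})
eval_dataset = train_dataset  # placeholder; use the real held-out triplets here

# Loss exactly as configured above: Euclidean distance, margin 5
loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)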
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,480 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor              positive            negative
    type        string              string              string
    details     min: 18 tokens      min: 18 tokens      min: 30 tokens
                mean: 54.07 tokens  mean: 60.4 tokens   mean: 71.25 tokens
                max: 169 tokens     max: 244 tokens     max: 141 tokens
  • Samples:
    anchor:   CC(C)OC(=O)CCCC=CCC1C(O)CC(O)C1C=CC(O)COc1cccc(C(F)(F)F)c1
    positive: CC12CCCCCC(Cc3ccc(O)cc31)C2[NH3+]
    negative: CC(C)C(CN1CCC(C)(c2cccc(O)c2)C(C)C1)NC(=O)C1Cc2ccc(O)cc2CN1

    anchor:   CC(C)OC(=O)CCCC=CCC1C(O)CC(O)C1C=CC(O)COc1cccc(C(F)(F)F)c1
    positive: COc1cc2c(cc1OC)C1CC(=O)C(CC(C)C)C[NH+]1CC2
    negative: CC(C)C(CN1CCC(C)(c2cccc(O)c2)C(C)C1)NC(=O)C1Cc2ccc(O)cc2CN1

    anchor:   CC(C)OC(=O)CCCC=CCC1C(O)CC(O)C1C=CC(O)COc1cccc(C(F)(F)F)c1
    positive: CNH+CCC=C1c2ccccc2COc2ccc(CC(=O)[O-])cc21
    negative: CC(C)C1(C(=O)NC2CC(=O)OC2(O)CF)CC(c2nccc3ccccc23)=NO1
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
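
Put together, these settings map onto the SentenceTransformerTrainer as in the sketch below. It assumes the model, loss, train_dataset, and eval_dataset objects from the earlier sketch in the Training Dataset section; the output directory is a placeholder.

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",                 # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()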

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-dev_cosine_accuracy
0.0676 500 5.0821 5.1737 0.4047
0.1351 1000 4.9869 5.1766 0.4230
0.2027 1500 4.5562 4.9102 0.5345
0.2703 2000 3.2364 4.3712 0.6534
0.3378 2500 2.0738 4.0704 0.6736
0.4054 3000 1.4239 4.0200 0.6635
0.4730 3500 1.1578 3.7202 0.6791
0.5405 4000 0.9669 3.7197 0.6831
0.6081 4500 0.714 3.8818 0.6547
0.6757 5000 0.5359 4.0987 0.6243
0.7432 5500 0.5663 3.8127 0.6500
0.8108 6000 0.4827 3.8346 0.6676
0.8784 6500 0.4758 3.8333 0.6507
0.9459 7000 0.4759 3.6872 0.6912
1.0135 7500 0.4651 3.7229 0.6831
1.0811 8000 0.4739 3.8041 0.6662
1.1486 8500 0.4458 3.8235 0.6703
1.2162 9000 0.4189 3.7957 0.6716
1.2838 9500 0.4504 3.7422 0.6784
1.3514 10000 0.413 3.7588 0.6770
1.4189 10500 0.3808 3.9750 0.6615
1.4865 11000 0.3853 3.7417 0.6953
1.5541 11500 0.379 3.7319 0.6993
1.6216 12000 0.429 3.5620 0.7209
1.6892 12500 0.3735 3.6900 0.7020
1.7568 13000 0.3908 3.8182 0.6932
1.8243 13500 0.3848 3.7228 0.7101
1.8919 14000 0.3777 3.6604 0.7149
1.9595 14500 0.3912 3.7849 0.6946
2.0269 15000 0.3282 3.8607 0.7014
2.0945 15500 0.3324 3.8573 0.6953
2.1620 16000 0.3852 3.9420 0.7000
2.2296 16500 0.3633 3.7928 0.7189
2.2972 17000 0.3493 3.8217 0.7216
2.3647 17500 0.3554 3.8546 0.6993
2.4323 18000 0.3363 3.7764 0.6993
2.4999 18500 0.377 3.8224 0.6959
2.5674 19000 0.3569 3.8376 0.7155
2.635 19500 0.3414 4.0017 0.7034
2.7026 20000 0.3567 3.7405 0.7135
2.7701 20500 0.3524 3.9446 0.7189
2.8377 21000 0.3347 3.8140 0.7169
2.9053 21500 0.3458 4.0700 0.7088
2.9728 22000 0.3632 3.7930 0.7081
3.0404 22500 0.3496 3.9884 0.7236
3.1080 23000 0.3426 3.7102 0.7155
3.1755 23500 0.3579 3.9201 0.7135
3.2431 24000 0.3553 4.2237 0.7270
3.3107 24500 0.345 3.8090 0.7189
3.3782 25000 0.3475 3.7802 0.7284
3.4458 25500 0.3326 3.7549 0.7250
3.5134 26000 0.3228 3.6717 0.7216
3.5809 26500 0.3311 3.8241 0.7155
3.6485 27000 0.3215 3.8151 0.7142
3.7161 27500 0.3534 3.8639 0.7149
3.7836 28000 0.3369 4.0947 0.7101
3.8512 28500 0.3229 4.0495 0.7101
3.9188 29000 0.3442 4.0408 0.7169
3.9864 29500 0.3059 3.9493 0.6959
4.0538 30000 0.3349 4.0431 0.7108
4.1214 30500 0.3266 4.0224 0.7189
4.1889 31000 0.3501 3.9502 0.7169
4.2565 31500 0.3676 3.8903 0.7196
4.3241 32000 0.3191 3.7994 0.7162
4.3916 32500 0.3317 3.7889 0.7182
4.4592 33000 0.3304 3.8661 0.7108
4.5268 33500 0.3332 3.8822 0.7115
4.5943 34000 0.3435 3.7945 0.7088
4.6619 34500 0.317 3.8721 0.7243
4.7295 35000 0.3038 3.8615 0.7209
4.7970 35500 0.3093 3.8360 0.7162
4.8646 36000 0.3309 3.8277 0.7155
4.9322 36500 0.3378 3.7988 0.7128
4.9997 37000 0.311 3.8015 0.7135

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}