# bert-base-cased-ChemTok-ZN15-55KTyrosinase-V1
This model is a fine-tuned version of bert-base-cased on the cafierom/ZN55K_Tyrosinase dataset of drug or drug-like molecules.
It achieves the following results on the evaluation set:
- Loss: 0.3050
## Model description
This domain adaptation of bert-base-cased has been trained on ~56.5K molecular SMILES strings, with the following tokens added to the base tokenizer:

```python
new_tokens = ["[C@H]", "[C@@H]", "(F)", "(Cl)", "c1", "c2", "(O)", "N#C", "(=O)",
              "([N+]([O-])=O)", "[O-]", "(OC)", "(C)", "[NH3+]", "(I)", "[Na+]", "C#N"]
```
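To see why multi-character chemistry tokens help, the sketch below segments a SMILES string with a greedy longest-match over the added tokens. This is only an illustration of the effect; the actual model extends the WordPiece tokenizer of bert-base-cased (e.g. via `tokenizer.add_tokens(new_tokens)`), which behaves differently in detail.

```python
# Illustrative only: greedy longest-match segmentation showing how the
# added multi-character tokens keep SMILES strings compact compared to
# character-by-character tokenization.
new_tokens = ["[C@H]", "[C@@H]", "(F)", "(Cl)", "c1", "c2", "(O)", "N#C", "(=O)",
              "([N+]([O-])=O)", "[O-]", "(OC)", "(C)", "[NH3+]", "(I)", "[Na+]", "C#N"]

def segment(smiles: str) -> list[str]:
    """Match the longest added token at each position; fall back to single characters."""
    ordered = sorted(new_tokens, key=len, reverse=True)  # longest match first
    toks, i = [], 0
    while i < len(smiles):
        for t in ordered:
            if smiles.startswith(t, i):
                toks.append(t)
                i += len(t)
                break
        else:
            toks.append(smiles[i])
            i += 1
    return toks

# Paracetamol: 11 tokens instead of 18 single characters.
print(segment("CC(=O)Nc1ccc(O)cc1"))
```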
## Intended uses & limitations
It is meant to be used for fine-tuning classification models on drug-related tasks, and for generative unmasking of SMILES strings.
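For generative unmasking, the model can be loaded with the standard `fill-mask` pipeline. A minimal sketch follows; the `cafierom/` namespace is inferred from the dataset name, and the masked SMILES string is an arbitrary example, not from the training data:

```python
from transformers import pipeline

# Load this model for masked-token prediction on SMILES strings.
# Assumption: the model is hosted under the same namespace as the dataset.
unmasker = pipeline(
    "fill-mask",
    model="cafierom/bert-base-cased-ChemTok-ZN15-55KTyrosinase-V1",
)

# Paracetamol with its hydroxyl position masked (illustrative input).
for pred in unmasker("CC(=O)Nc1ccc([MASK])cc1"):
    print(pred["token_str"], round(pred["score"], 3))
```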
## Training and evaluation data

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 30
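The hyperparameters above correspond to a `TrainingArguments` configuration along these lines (a sketch; the output path and the per-epoch evaluation strategy are illustrative assumptions, not stated in the card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-cased-ChemTok-ZN15-55KTyrosinase-V1",  # illustrative path
    learning_rate=2e-05,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    eval_strategy="epoch",  # assumption: evaluation once per epoch, matching the results table
)
```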
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.8365 | 1.0 | 133 | 1.8916 |
| 1.2975 | 2.0 | 266 | 0.9819 |
| 0.955 | 3.0 | 399 | 0.8473 |
| 0.7853 | 4.0 | 532 | 0.6689 |
| 0.6547 | 5.0 | 665 | 0.5779 |
| 0.5822 | 6.0 | 798 | 0.5189 |
| 0.5312 | 7.0 | 931 | 0.4685 |
| 0.4937 | 8.0 | 1064 | 0.4340 |
| 0.467 | 9.0 | 1197 | 0.4364 |
| 0.4442 | 10.0 | 1330 | 0.3819 |
| 0.4309 | 11.0 | 1463 | 0.4075 |
| 0.4159 | 12.0 | 1596 | 0.3835 |
| 0.4045 | 13.0 | 1729 | 0.3692 |
| 0.3874 | 14.0 | 1862 | 0.3672 |
| 0.3734 | 15.0 | 1995 | 0.3617 |
| 0.3742 | 16.0 | 2128 | 0.3492 |
| 0.364 | 17.0 | 2261 | 0.3357 |
| 0.3547 | 18.0 | 2394 | 0.3323 |
| 0.3435 | 19.0 | 2527 | 0.3222 |
| 0.3424 | 20.0 | 2660 | 0.3140 |
| 0.3417 | 21.0 | 2793 | 0.3117 |
| 0.3339 | 22.0 | 2926 | 0.3153 |
| 0.326 | 23.0 | 3059 | 0.3233 |
| 0.3292 | 24.0 | 3192 | 0.3019 |
| 0.324 | 25.0 | 3325 | 0.2960 |
| 0.3228 | 26.0 | 3458 | 0.3058 |
| 0.3176 | 27.0 | 3591 | 0.2984 |
| 0.3112 | 28.0 | 3724 | 0.2889 |
| 0.3172 | 29.0 | 3857 | 0.2973 |
| 0.3103 | 30.0 | 3990 | 0.3050 |
### Framework versions
- Transformers 4.50.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1