---
datasets:
- Phando/uspto-50k
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- chemistry
license: mit
---
This model is the [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification.
The objective is to predict a reaction's class label. The input is the canonicalized SMILES of either all reactants or all products, joined with `.` as the separator.

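A minimal sketch of how such an input string might be prepared, assuming the raw data is reaction SMILES of the form `reactants>reagents>products`. The helper names `canonicalize` and `build_model_input` are hypothetical. Canonicalization uses RDKit (`Chem.MolFromSmiles` / `Chem.MolToSmiles`) when it is installed and strips any atom-map numbers; without RDKit the SMILES pass through unchanged. This card does not state whether the original preprocessing removed atom maps or reordered molecules, so treat those details as assumptions.

```python
try:
    from rdkit import Chem  # optional dependency, used only for canonicalization
except ImportError:
    Chem = None  # fall back to returning SMILES unchanged


def canonicalize(smiles: str) -> str:
    """Return RDKit canonical SMILES with atom maps removed (assumption),
    or the input unchanged if RDKit is missing or the SMILES cannot be parsed."""
    if Chem is None:
        return smiles
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return smiles
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # drop atom-map numbers such as [CH3:1]
    return Chem.MolToSmiles(mol)


def build_model_input(reaction_smiles: str, side: str = "reactants") -> str:
    """Hypothetical helper: take one side of a 'reactants>reagents>products'
    reaction SMILES and join its canonicalized molecules with '.'."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected reaction SMILES of the form 'reactants>reagents>products'")
    reactants, _reagents, products = parts
    if side == "reactants":
        chosen = reactants
    elif side == "products":
        chosen = products
    else:
        raise ValueError("side must be 'reactants' or 'products'")
    molecules = [canonicalize(s) for s in chosen.split(".") if s]
    return ".".join(molecules)


print(build_model_input("CCO.CC(=O)O>>CCOC(C)=O.O", side="reactants"))
```

The resulting string (e.g. the reactant side of an esterification) is what would be passed to the tokenizer as a single text input.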
- Train/Test split: 0.99/0.01
- Evaluation results:
  - Accuracy: 87.11%
  - Loss: 0.4272
- Fine-tuning hyperparameters:
  - seed = 233
  - batch-size = 128
  - num_epochs = 5 (early-stopped at epoch 4)
  - learning_rate = 5e-4
  - warmup_steps = 64
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
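The `warmup_steps`, `learning_rate`, and `lr_scheduler_type` settings together define the learning-rate curve. The sketch below assumes the Hugging Face Trainer's `"cosine"` scheduler, i.e. `get_cosine_schedule_with_warmup` with its default half cosine cycle: a linear ramp from 0 to the peak rate over the warmup steps, then a cosine decay to 0. The function name is hypothetical, and `total_steps` depends on the dataset size, batch size, and number of epochs, so the value 1064 used below is illustrative only.

```python
import math


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = 5e-4, warmup_steps: int = 64) -> float:
    """Learning rate at `step`: linear warmup from 0 to peak_lr,
    then half-cosine decay from peak_lr to 0 at `total_steps`."""
    if step < warmup_steps:
        # Linear warmup phase
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Illustrative total of 1064 optimizer steps (not taken from the actual run)
for s in (0, 32, 64, 564, 1064):
    print(s, cosine_lr_with_warmup(s, total_steps=1064))
```

With these settings the rate peaks at 5e-4 at step 64, falls to half the peak midway through the decay phase, and reaches 0 at the final step.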