---
datasets:
- Phando/uspto-50k
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- chemistry
license: mit
---
This [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint was fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification.
The model predicts the reaction class label. Its input is the canonicalized SMILES of either all reactants or all products of a reaction, with individual molecules separated by ".".
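
The input format described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: `build_model_input` and `classify_reaction` are hypothetical helper names, and the card does not say which canonicalizer was used, so canonicalization is left as an optional callable (RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))` would be one plausible choice). The `model_id` argument stands for this checkpoint's Hub repository id, which is not stated in the card.

```python
from typing import Callable, Iterable, Optional


def build_model_input(
    smiles: Iterable[str],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> str:
    """Join molecule SMILES into one "."-separated input string.

    `canonicalize` is an assumption: any function mapping a SMILES string
    to its canonical form. If omitted, the strings are used as given.
    """
    molecules = []
    for entry in smiles:
        # An entry may itself hold several "."-separated molecules.
        for molecule in entry.split("."):
            molecule = molecule.strip()
            if molecule:
                molecules.append(canonicalize(molecule) if canonicalize else molecule)
    return ".".join(molecules)


def classify_reaction(model_id: str, text: str):
    """Classify one input string with the fine-tuned checkpoint.

    `model_id` is a placeholder for this model's Hub repository id.
    """
    # Imported lazily so the helper above works without `transformers`.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(text)


# Reactant side of an esterification, already written as plain SMILES.
reactants = ["CC(=O)O", "OCC"]
print(build_model_input(reactants))  # prints "CC(=O)O.OCC"
```

The resulting string would then be passed to `classify_reaction` together with the repository id of this checkpoint.
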
- Train/Test split: 0.99/0.01
- Evaluation results:
  - Accuracy: 87.11%
  - Loss: 0.4272
- Fine-tuning hyperparameters:
  - seed = 233
  - batch-size = 128
  - num_epochs = 5 (but early stopped at epoch 4)
  - learning_rate = 5e-4
  - warmup_steps = 64
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
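
To show how `learning_rate`, `warmup_steps`, and `lr_scheduler_type = "cosine"` interact, here is a small sketch of a linear-warmup, cosine-decay schedule. It assumes the schedule follows the common half-cosine form (linear ramp to the peak, then cosine decay to zero); the card does not confirm the exact implementation. `cosine_lr_with_warmup` is a hypothetical name, and `TOTAL_STEPS` is an illustrative value because the total number of optimizer steps is not given.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-4, warmup_steps=64):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical; the actual step count is not in the card
for step in (0, 32, 64, 532, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate reaches its peak of 5e-4 at step 64, falls to about half of that midway through the decay phase, and approaches zero at the final step.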