---
datasets:
- Phando/uspto-50k
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- chemistry
license: mit
---

This [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint was fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification.

The objective is to predict the reaction class label. The input is a single string containing either all reactant SMILES or all product SMILES of a reaction, canonicalized and joined with ".".
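
The input format described above can be sketched as follows. This is a minimal illustration, not the preprocessing code used for this checkpoint: `build_input` and `classify` are hypothetical helpers, canonicalization with RDKit is an assumption (the card does not name a toolkit), and `model_id` is a placeholder for this checkpoint's repository ID.

```python
from typing import Callable, Iterable, Optional


def rdkit_canonicalize(smiles: str) -> str:
    """Canonicalize one SMILES string with RDKit (assumed toolkit)."""
    from rdkit import Chem  # imported lazily so the rest works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def build_input(
    smiles: Iterable[str],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> str:
    """Join the molecules of one reaction side into a "."-separated string."""
    # Without a canonicalizer, the SMILES are passed through unchanged.
    canon = canonicalize or (lambda s: s)
    return ".".join(canon(s.strip()) for s in smiles)


def classify(text: str, model_id: str):
    """Classify one input string; model_id is a placeholder repository ID."""
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(text)
```

For example, `build_input(["CCO", "CC(=O)O"], canonicalize=rdkit_canonicalize)` would produce the reactant-side string for a reaction with two reactants, which can then be passed to `classify`.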

- Train/Test split: 0.99/0.01

- Evaluation results:
  - Accuracy: 87.11%
  - Loss: 0.4272

- Fine-tuning hyperparameters:
  - seed = 233
  - batch_size = 128
  - num_epochs = 5 (training stopped early at epoch 4)
  - learning_rate = 5e-4
  - warmup_steps = 64
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
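
The hyperparameters listed above could be expressed as Hugging Face `TrainingArguments`, roughly as sketched below. This mapping is an assumption: the card does not state which training framework was used, `output_dir` is a hypothetical path, and mapping the batch size to `per_device_train_batch_size` assumes training on a single device. The early-stopping criterion is not given in the card, so it is left out.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
training_kwargs = dict(
    output_dir="chemberta-uspto50k-cls",  # hypothetical output path
    seed=233,
    per_device_train_batch_size=128,  # assumes a single device
    num_train_epochs=5,  # the card reports an early stop at epoch 4
    learning_rate=5e-4,
    warmup_steps=64,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
)


def make_training_arguments():
    """Build TrainingArguments from the values above (requires transformers)."""
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs)
```

Keeping the values in a plain dictionary makes them easy to log or compare against the card before a `Trainer` is constructed.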