---
datasets:
- Phando/uspto-50k
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- chemistry
license: mit
---

This [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint was fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification.

The task is to predict the reaction class label. The input is either the canonicalized SMILES of all reactants or the canonicalized SMILES of all products, joined with ".".
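The input format described above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used for this checkpoint. It assumes reaction SMILES of the form `reactants>>products`, canonicalizes each molecule separately with RDKit (clearing any atom-map numbers), and keeps the original molecule order. The helper names `canonicalize` and `build_input` are hypothetical. If RDKit is not installed, the sketch leaves SMILES unchanged.

```python
"""Hypothetical helper: build a model input string from a reaction SMILES."""

try:
    from rdkit import Chem  # optional dependency, used for canonicalization
except ImportError:  # without RDKit, SMILES are passed through unchanged
    Chem = None


def canonicalize(smiles: str) -> str:
    """Return RDKit canonical SMILES with atom maps cleared (or the input if RDKit is missing)."""
    if Chem is None:
        return smiles
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # assumption: atom mapping should not reach the model
    return Chem.MolToSmiles(mol)


def build_input(reaction_smiles: str, side: str = "reactants") -> str:
    """Take one side of 'reactants>>products', canonicalize each molecule, rejoin with '.'."""
    if side not in ("reactants", "products"):
        raise ValueError("side must be 'reactants' or 'products'")
    reactants, _, products = reaction_smiles.partition(">>")
    part = reactants if side == "reactants" else products
    return ".".join(canonicalize(s) for s in part.split(".") if s)


rxn = "CCO.CC(=O)O>>CC(=O)OCC"
print(build_input(rxn, "reactants"))
print(build_input(rxn, "products"))
```

Whether molecules were also sorted, or the whole mixture canonicalized as a single SMILES string, is not stated in this card; either choice changes the exact input strings.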

- Train/Test split: 0.99/0.01

- Evaluation results:
  - Accuracy: 87.11%
  - Loss: 0.4272
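Assuming the standard definitions for single-label classification (accuracy as the fraction of examples whose highest-scoring class equals the label, and loss as the mean softmax cross-entropy), the two reported numbers can be computed from model logits as in this sketch. The function names and the toy logits are hypothetical, and the exact averaging used by the original evaluation loop may differ slightly.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def evaluate(logits_batch: Sequence[Sequence[float]], labels: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, mean cross-entropy loss) over all examples."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(logits_batch, labels):
        pred = max(range(len(logits)), key=lambda i: logits[i])  # argmax class
        correct += int(pred == label)
        total_loss += -log_softmax(logits)[label]  # negative log-likelihood of the true class
    n = len(labels)
    return correct / n, total_loss / n


# Toy example: two examples, two classes, both labeled class 0.
acc, loss = evaluate([[2.0, 0.0], [0.0, 3.0]], [0, 0])
print(f"accuracy={acc:.2%}, loss={loss:.4f}")
```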

- Fine-tuning hyperparameters:
  - seed = 233
  - batch_size = 128
  - num_epochs = 5 (early stopping triggered at epoch 4)
  - learning_rate = 5e-4
  - warmup_steps = 64
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
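The combination `learning_rate = 5e-4`, `warmup_steps = 64`, and `lr_scheduler_type = "cosine"` describes a linear warmup followed by a half-cosine decay to zero. The sketch below mirrors the shape of the cosine-with-warmup schedule in Hugging Face `transformers`, re-implemented in plain Python. The total step count is a hypothetical estimate (about 50,000 reactions, 99% used for training, 128 per batch, 5 epochs); the real count depends on the exact dataset size and device setup.

```python
import math

PEAK_LR = 5e-4     # learning_rate from the card
WARMUP_STEPS = 64  # warmup_steps from the card


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total: ceil(0.99 * 50,000 / 128) steps per epoch, times 5 epochs.
TOTAL_STEPS = math.ceil(0.99 * 50_000 / 128) * 5
for step in (0, 32, 64, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```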