metadata
datasets:
  - Phando/uspto-50k
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - chemistry
license: mit

This ChemBERTa-v2 checkpoint was fine-tuned on the USPTO-50k dataset for sequence classification.

Specifically, the objective is to predict the reaction class label. The input is the canonicalized SMILES of either all reactants or all products, with the individual molecules separated by ".".
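As a rough illustration of that input format, the sketch below joins several molecule SMILES into one "."-separated string. The helper name `build_model_input` and its `canonicalize` hook are hypothetical and not part of this repository; the hook defaults to a no-op, so the example assumes the SMILES are already canonical. With RDKit installed, one would typically pass a function wrapping `Chem.MolToSmiles(Chem.MolFromSmiles(s))` instead.

```python
from typing import Callable, Iterable


def build_model_input(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], str] = lambda s: s,  # hypothetical hook; no-op by default
) -> str:
    """Join reactant (or product) SMILES into a single '.'-separated string."""
    # Drop empty entries and surrounding whitespace before canonicalizing.
    parts = [canonicalize(s.strip()) for s in smiles_list if s.strip()]
    if not parts:
        raise ValueError("at least one SMILES string is required")
    return ".".join(parts)


# Example: acetic acid and ethanol as the reactant side of a reaction.
reactants = ["CC(=O)O", "OCC"]
text = build_model_input(reactants)
print(text)  # CC(=O)O.OCC
```

The resulting string is what would be passed to the tokenizer as a single sequence.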

  • Train/Test split: 0.99/0.01

  • Evaluation results:

    • Accuracy: 87.11%
    • Loss: 0.4272
  • Fine-tuning hyperparameters:

    • seed = 233
    • batch_size = 128
    • num_epochs = 5 (training stopped early at epoch 4)
    • learning_rate = 5e-4
    • warmup_steps = 64
    • weight_decay = 0.01
    • lr_scheduler_type = "cosine"
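To make the interplay of `learning_rate`, `warmup_steps`, and the cosine scheduler concrete, here is a minimal sketch of the usual linear-warmup plus cosine-decay shape. It assumes the run used that standard shape (a linear ramp from 0 to the peak rate, followed by a half-cosine decay to 0). The function name `lr_at` and the value `total_steps=2000` are assumptions for illustration only; the actual number of optimizer steps is not stated in this card.

```python
import math


def lr_at(
    step: int,
    base_lr: float = 5e-4,     # learning_rate from the list above
    warmup_steps: int = 64,    # warmup_steps from the list above
    total_steps: int = 2000,   # hypothetical; not given in this card
) -> float:
    """Learning rate at a given optimizer step under warmup + cosine decay."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 32, 64, 1032, 2000):
    print(s, lr_at(s))
```

The rate peaks at step 64 and falls to roughly half the peak midway through the post-warmup phase. Because training stopped early, the final steps of the decay would not have been reached in the actual run.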