AbNovoBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Monoclonal Antibody De Novo Sequencing Analysis

This repository contains a curated collection of state-of-the-art de novo peptide sequencing models specifically benchmarked for monoclonal antibody (mAb) sequencing from mass spectrometry data. AbNovoBench provides the largest high-quality dataset to date, comprising 1,638,248 peptide-spectrum matches derived from 131 mAbs across six species and 11 proteases, supplemented by eight mAbs with known sequence information for assessing full-length reconstruction.

📋 Models

This repository includes the following models that have been comprehensively evaluated in our benchmark:

AdaNovo

Model: AdaNovo/epoch=2-step=170451.ckpt
Description: Adaptive de novo peptide sequencing model with enhanced accuracy for complex spectra
Repository: https://github.com/Westlake-OmicsAI/adanovo_v1

CasaNovo

Models:
- CasaNovoV1/epoch=10-step=600000.ckpt (V1)
- CasaNovoV2/epoch=7-step=400000.ckpt (V2)
Description: Transformer-based de novo peptide sequencing model, provided here as two checkpoint versions (V1 and V2)
Repository: https://github.com/Noble-Lab/casanovo

ContraNovo

Model: ContraNovo/ControNovo.ckpt
Description: Contrastive learning-based de novo peptide sequencing model
Repository: https://github.com/BEAM-Labs/ContraNovo

DeepNovo

Model: DeepNovo/translate.ckpt-283400.*
Description: Deep learning-based de novo peptide sequencing that combines convolutional and recurrent (LSTM) networks with dynamic programming
Repository: https://github.com/nh2tran/DeepNovo

InstaNovo

Model: InstaNovo/epoch=59-step=1700000.ckpt
Description: Transformer-based de novo peptide sequencing model
Repository: https://github.com/instadeepai/InstaNovo

PepNet

Model: PepNet/model.h5
Description: Neural network-based peptide sequence prediction model
Repository: https://github.com/lkytal/pepnet

PGPointNovo

Models:
- PGPointNovo/backward_deepnovo.pth
- PGPointNovo/forward_deepnovo.pth
Description: Point-based graph neural network for de novo peptide sequencing
Repository: https://github.com/shallFun4Learning/PGPointNovo

pi-HelixNovo

Model: pi-HelixNovo/epoch=14-step=800000.ckpt
Description: Transformer-based model that uses complementary spectra to enrich ion information for peptide sequence prediction
Repository: https://github.com/PHOENIXcenter/pi-HelixNovo

pi-PrimeNovo

Model: pi-PrimeNovo/model_massive.ckpt
Description: Non-autoregressive Transformer for de novo peptide sequencing (checkpoint trained on MassIVE-KB)
Repository: https://github.com/PHOENIXcenter/pi-PrimeNovo

PointNovo

Models:
- PointNovo/backward_deepnovo.pth
- PointNovo/forward_deepnovo.pth
Description: Point cloud-based approach for de novo peptide sequencing
Repository: https://github.com/irleader/PointNovo

SMSNet

Model: SMSNet/translate.ckpt-680000.*
Description: Sequence-to-sequence model for mass spectrometry-based peptide sequencing
Repository: https://github.com/cmb-chula/SMSNet

🚀 Usage

For detailed usage instructions, implementation examples, and model-specific documentation, please refer to the original repositories listed above for each model. Each repository contains:

Installation instructions
Model loading examples
Training procedures
Inference code
Performance benchmarks
Dataset information

This collection serves as a centralized repository of pre-trained models for easy access and comparison.
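
As a minimal sketch of how the checkpoints listed above might be fetched programmatically, the snippet below maps each model to the file paths named in this README and downloads the matching files with `snapshot_download` from the `huggingface_hub` package. The repository id `LLMasterLL/AbNovobench`, the `CHECKPOINTS` dictionary, and the `fetch_checkpoints` helper are assumptions made for illustration and are not part of any released package. Access to the files may require accepting the repository's terms and passing a token.

```python
from typing import Dict, List, Optional

# Checkpoint paths as listed in this README. The dictionary itself is a
# hypothetical helper for illustration only.
CHECKPOINTS: Dict[str, List[str]] = {
    "AdaNovo": ["AdaNovo/epoch=2-step=170451.ckpt"],
    "CasaNovoV1": ["CasaNovoV1/epoch=10-step=600000.ckpt"],
    "CasaNovoV2": ["CasaNovoV2/epoch=7-step=400000.ckpt"],
    "ContraNovo": ["ContraNovo/ControNovo.ckpt"],
    "DeepNovo": ["DeepNovo/translate.ckpt-283400.*"],
    "InstaNovo": ["InstaNovo/epoch=59-step=1700000.ckpt"],
    "PepNet": ["PepNet/model.h5"],
    "PGPointNovo": [
        "PGPointNovo/backward_deepnovo.pth",
        "PGPointNovo/forward_deepnovo.pth",
    ],
    "pi-HelixNovo": ["pi-HelixNovo/epoch=14-step=800000.ckpt"],
    "pi-PrimeNovo": ["pi-PrimeNovo/model_massive.ckpt"],
    "PointNovo": [
        "PointNovo/backward_deepnovo.pth",
        "PointNovo/forward_deepnovo.pth",
    ],
    "SMSNet": ["SMSNet/translate.ckpt-680000.*"],
}

# Assumption: the Hugging Face repository id hosting these files.
REPO_ID = "LLMasterLL/AbNovobench"


def fetch_checkpoints(
    model: str, repo_id: str = REPO_ID, token: Optional[str] = None
) -> str:
    """Download only the files for one model; return the local snapshot folder."""
    # Imported lazily so the mapping above is usable without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=CHECKPOINTS[model],  # glob patterns such as "*.ckpt-283400.*" are supported
        token=token,
    )
```

Calling `fetch_checkpoints("PepNet")` would return the local snapshot directory. Loading the weights into a model still follows the instructions in the corresponding upstream repository.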

📊 Benchmark Results

Our comprehensive evaluation of 13 deep learning-based de novo peptide sequencing algorithms across six metric categories revealed:

Peptide Sequencing Performance

Transformer-based models (ContraNovo, CasaNovo V1, and InstaNovo) performed best
Precision and recall: 0.73–0.79 for amino acids and 0.60–0.67 for peptides
High efficacy in detecting post-translational modifications
Excellent generalization across diverse enzymes and species
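
To make the amino acid-level and peptide-level figures above concrete, here is a simplified sketch of how such precision and recall values can be computed. It assumes position-wise exact residue matching, whereas evaluations in this field commonly match residues by mass within a tolerance, so it is not the benchmark's actual scoring code. The function names `aa_matches` and `evaluate` are hypothetical.

```python
from typing import Dict, List, Optional


def aa_matches(pred: str, truth: str) -> int:
    """Count positions where the predicted and true residues agree."""
    return sum(p == t for p, t in zip(pred, truth))


def evaluate(preds: List[Optional[str]], truths: List[str]) -> Dict[str, float]:
    """Amino acid and peptide precision/recall over paired predictions.

    A prediction of None (or "") means the model produced no sequence
    for that spectrum; it counts against recall but not precision.
    """
    matched = pred_aa = true_aa = correct_pep = n_pred = 0
    for pred, truth in zip(preds, truths):
        true_aa += len(truth)
        if not pred:
            continue
        n_pred += 1
        pred_aa += len(pred)
        m = aa_matches(pred, truth)
        matched += m
        # A peptide is correct only if every residue matches and lengths agree.
        if m == len(truth) == len(pred):
            correct_pep += 1
    return {
        "aa_precision": matched / pred_aa if pred_aa else 0.0,
        "aa_recall": matched / true_aa if true_aa else 0.0,
        "peptide_precision": correct_pep / n_pred if n_pred else 0.0,
        "peptide_recall": correct_pep / len(truths) if truths else 0.0,
    }


result = evaluate(["PEPTIDE", "PEPTLDE", None], ["PEPTIDE", "PEPTIDE", "ACDK"])
```

In this toy example, 13 of the 14 predicted residues are correct, 13 of the 18 true residues are recovered, and one of three peptides is fully correct.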

Assembly Performance

Template-guided Fusion assembler achieved error-free reconstruction of all chains and complementarity-determining regions (CDRs)
Superior coverage, accuracy, and gap minimization when using high-quality peptide reads from six algorithms
Comprehensive evaluation across coverage depth and assembly score metrics
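
The coverage and coverage depth metrics mentioned above can be illustrated with a small sketch. It assumes peptide reads are mapped to a known chain sequence by exact substring matching, which is a simplification: real assemblers tolerate sequencing errors, and the benchmark's own metric definitions may differ. The names `coverage_depth` and `coverage` are hypothetical, and the sequences are toy inputs.

```python
from typing import List


def coverage_depth(target: str, reads: List[str]) -> List[int]:
    """Per-position count of reads that exactly match the target there."""
    depth = [0] * len(target)
    for read in reads:
        if not read:
            continue  # skip empty reads, which would match everywhere
        start = target.find(read)
        while start != -1:
            for i in range(start, start + len(read)):
                depth[i] += 1
            start = target.find(read, start + 1)
    return depth


def coverage(target: str, reads: List[str]) -> float:
    """Fraction of target positions covered by at least one read."""
    depth = coverage_depth(target, reads)
    return sum(d > 0 for d in depth) / len(depth) if depth else 0.0


target = "EVQLVESGGGLVQPGG"
reads = ["EVQLVESG", "ESGGGLV", "XXXX"]
depth = coverage_depth(target, reads)
frac = coverage(target, reads)
```

Here the two matching reads overlap on three residues, leaving the last four positions uncovered, so `frac` is 0.75.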

🔬 Research Applications

AbNovoBench is specifically designed for monoclonal antibody research and applications:

Antibody Discovery: De novo sequencing of monoclonal antibodies from mass spectrometry data
Therapeutic Development: Characterization of antibody sequences for drug development
Clinical Diagnostics: Antibody sequencing for diagnostic applications
Proteomics Research: Standardized benchmarking for antibody-specific algorithm development

📚 Citation

If you use AbNovoBench in your research, please cite our paper:

@misc{jiang2025abnovobench,
  title        = {AbNovoBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Monoclonal Antibody De Novo Sequencing Analysis},
  author       = {Wenbin Jiang and Ling Luo and Lihong Huang and Jin Xiao and Zihan Lin and Yijie Qiu and Jiying Wang and Ouyang Hu and Sainan Zhang and Mengsha Tong and Ningshao Xia and Yueting Xiong and Quan Yuan and Rongshan Yu},
  year         = {2025},
  howpublished = {https://github.com/dumbgoos/AbNovoBench}
}

🤝 Contributing

We welcome contributions to improve the models or add new ones. Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

🙏 Acknowledgments

We thank the original authors of each model for their contributions to the field of de novo peptide sequencing. This collection represents the collaborative effort of the proteomics community. AbNovoBench is available at https://abnovobench.com and provides a scalable, community-driven platform enriched with an extensive antibody MS data resource to accelerate antibody-specific algorithm development and enhance proteomic reproducibility.

📞 Contact

For questions or support, please open an issue on this repository or contact the maintainers.

Note: These models are provided for research purposes. Please ensure you have the appropriate licenses and permissions for your specific use case.

