FlowFinal: AMP Flow Matching Model

FlowFinal is a state-of-the-art flow matching model for generating antimicrobial peptides (AMPs). The model uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.
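
As a rough sketch of how generation works, flow matching draws Gaussian noise in the compressed embedding space and integrates a learned velocity field toward the data distribution. The example below uses a simple Euler integrator; the flow_model call signature, the seq_len argument, and the helper name are illustrative assumptions rather than the exact API in this repository.

import torch

@torch.no_grad()
def sample_compressed_embeddings(flow_model, num_samples, seq_len, dim=80,
                                 num_steps=25, device="cuda"):
    # Start from Gaussian noise at t = 0 and integrate dx/dt = v(x, t) to t = 1.
    x = torch.randn(num_samples, seq_len, dim, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((num_samples,), i * dt, device=device)
        x = x + dt * flow_model(x, t)  # Euler step along the predicted velocity
    return x  # compressed embeddings; decompress, then decode to amino-acid sequences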

Model Description

  • Model Type: Flow Matching for Protein Generation
  • Domain: Antimicrobial Peptide (AMP) Generation
  • Base Model: ESM-2 (650M parameters)
  • Architecture: Transformer-based flow matching with classifier-free guidance (CFG)
  • Training Data: Curated AMP dataset with ~7K sequences

Key Features

  • Classifier-Free Guidance (CFG): Enables controlled generation with different conditioning strengths (see the guidance sketch after this list)
  • ESM-2 Integration: Leverages pre-trained protein language model embeddings
  • Compression Architecture: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
  • Multiple CFG Scales: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
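
As context for the CFG scales above, here is a minimal sketch of how classifier-free guidance combines the conditional and unconditional velocity predictions at sampling time; the cond keyword argument is an assumed interface, and the real conditioning signature lives in final_flow_model.py.

def guided_velocity(flow_model, x, t, cond, cfg_scale):
    # cfg_scale = 0.0 reduces to the unconditional velocity; 3.0 / 7.5 / 15.0
    # extrapolate progressively further toward the conditional prediction.
    v_uncond = flow_model(x, t, cond=None)   # conditioning dropped
    if cfg_scale == 0.0:
        return v_uncond
    v_cond = flow_model(x, t, cond=cond)     # conditional branch
    return v_uncond + cfg_scale * (v_cond - v_uncond)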

Model Components

Core Architecture

  • final_flow_model.py: Main flow matching model implementation
  • compressor_with_embeddings.py: Embedding compression/decompression modules (see the sketch after this list)
  • final_sequence_decoder.py: ESM-2 embedding to sequence decoder
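
A minimal sketch of the 16x compression/decompression pair (1280 → 80 dimensions) described in Key Features; the hidden width and activation are assumptions, and the trained modules in compressor_with_embeddings.py are more elaborate.

import torch.nn as nn

class Compressor(nn.Module):
    # (batch, seq_len, 1280) ESM-2 embeddings -> (batch, seq_len, 80)
    def __init__(self, in_dim=1280, hidden=512, out_dim=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, esm_embeddings):
        return self.net(esm_embeddings)

class Decompressor(nn.Module):
    # (batch, seq_len, 80) -> (batch, seq_len, 1280), back to ESM-2 space
    def __init__(self, in_dim=80, hidden=512, out_dim=1280):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, compressed):
        return self.net(compressed)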

Trained Weights

  • final_compressor_model.pth: Trained compressor (315MB)
  • final_decompressor_model.pth: Trained decompressor (158MB)
  • amp_flow_model_final_optimized.pth: Main flow model checkpoint
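
Assuming the .pth files are plain PyTorch state_dict checkpoints, they can be loaded along these lines; Compressor here refers to the illustrative sketch above, not the exact class in compressor_with_embeddings.py.

import torch

compressor = Compressor()  # hypothetical class from the sketch above
state = torch.load("final_compressor_model.pth", map_location="cpu")
compressor.load_state_dict(state)
compressor.eval()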

Generated Samples

  • AMP sequences generated at each CFG scale (generated_sequences_20250829.fasta)
  • HMD-AMP validation results showing an 8.8% overall AMP prediction rate (hmd_amp_detailed_results.csv)

Performance Results

HMD-AMP Validation (80 sequences tested)

  • Total AMPs Predicted: 7/80 (8.8%)
  • By CFG Configuration:
    • No CFG: 1/20 (5.0%)
    • Weak CFG: 2/20 (10.0%)
    • Strong CFG: 4/20 (20.0%) ← Best performance
    • Very Strong CFG: 0/20 (0.0%)

Best Performing Sequences

  1. ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA (No CFG)
  2. EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL (Weak CFG)
  3. IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI (Strong CFG)

Usage

from generate_amps import AMPGenerator

# Initialize generator
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device='cuda'
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5  # Strong CFG recommended
)
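
Assuming generate_amps returns an iterable of amino-acid strings (the exact return type is defined in generate_amps.py), the samples can be written to FASTA for downstream screening, e.g. with HMD-AMP:

# Write generated sequences to FASTA for downstream validation.
with open("generated_sequences.fasta", "w") as fasta:
    for i, seq in enumerate(samples):
        fasta.write(f">flowfinal_cfg7.5_sample{i:03d}\n{seq}\n")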

Training Details

  • Optimizer: AdamW with cosine annealing (sketched after this list)
  • Learning Rate: 4e-4 (final)
  • Epochs: 2000
  • Final Loss: 1.318
  • Training Time: 2.3 hours on H100
  • Dataset Size: 6,983 samples
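
A sketch of the training setup listed above (AdamW with cosine annealing over 2000 epochs, with the listed 4e-4 learning rate used as the base rate); the weight decay value and the flow_matching_loss helper are illustrative assumptions.

import torch

def train(flow_model, dataloader, epochs=2000, base_lr=4e-4):
    # AdamW with a cosine-annealed learning rate over the full training run.
    optimizer = torch.optim.AdamW(flow_model.parameters(), lr=base_lr, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for epoch in range(epochs):
        for batch in dataloader:                           # ~7K compressed-embedding samples
            loss = flow_matching_loss(flow_model, batch)   # hypothetical loss helper
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()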

Files Structure

FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md

Citation

If you use FlowFinal in your research, please cite:

@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}

License

This model is released under the MIT License.
