# FlowFinal: AMP Flow Matching Model
FlowFinal is a state-of-the-art flow matching model for generating antimicrobial peptides (AMPs). The model uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.
## Model Description
- Model Type: Flow Matching for Protein Generation
- Domain: Antimicrobial Peptide (AMP) Generation
- Base Model: ESM-2 (650M parameters)
- Architecture: Transformer-based flow matching with classifier-free guidance (CFG)
- Training Data: Curated AMP dataset with ~7K sequences
## Key Features
- Classifier-Free Guidance (CFG): Enables controlled generation with different conditioning strengths
- ESM-2 Integration: Leverages pre-trained protein language model embeddings
- Compression Architecture: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
- Multiple CFG Scales: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
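At sampling time, classifier-free guidance typically blends the model's conditional and unconditional velocity predictions with a single scale factor. The card does not show FlowFinal's exact implementation, so this is a minimal sketch of the standard blending rule, with `cfg_velocity` as a hypothetical helper name:

```python
import numpy as np

def cfg_velocity(v_uncond: np.ndarray, v_cond: np.ndarray, scale: float) -> np.ndarray:
    """Standard CFG blend of unconditional and conditional velocity fields.

    scale = 0.0 recovers the purely unconditional prediction ("No CFG");
    larger scales (e.g. 3.0, 7.5, 15.0) push samples toward the condition.
    """
    return v_uncond + scale * (v_cond - v_uncond)

v_u = np.zeros(4)          # toy unconditional velocity
v_c = np.ones(4)           # toy conditional velocity
weak = cfg_velocity(v_u, v_c, 3.0)
strong = cfg_velocity(v_u, v_c, 7.5)
```

With `scale = 1.0` this reduces to the conditional prediction alone, which is why scales above 1 are described as "strong" guidance.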
## Model Components

### Core Architecture

- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding-to-sequence decoder
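The compressor/decompressor pair maps 1280-d ESM-2 embeddings into the 80-d latent space the flow model operates in. The trained modules are neural networks whose architecture is not detailed here, so this sketch only illustrates the shape bookkeeping with random placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(0)
ESM_DIM, LATENT_DIM = 1280, 80   # 16x compression, per the model card

# Stand-ins for the trained compressor/decompressor (random, illustrative only).
W_enc = rng.normal(scale=0.02, size=(ESM_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.02, size=(LATENT_DIM, ESM_DIM))

def compress(emb: np.ndarray) -> np.ndarray:
    """(length, 1280) ESM-2 embeddings -> (length, 80) latents."""
    return emb @ W_enc

def decompress(lat: np.ndarray) -> np.ndarray:
    """(length, 80) latents -> (length, 1280) reconstructed embeddings."""
    return lat @ W_dec

seq_emb = rng.normal(size=(50, ESM_DIM))   # a 50-residue peptide embedding
lat = compress(seq_emb)
rec = decompress(lat)
print(lat.shape, rec.shape)                # (50, 80) (50, 1280)
```

The flow model then only has to generate in the much smaller 80-d space, which is the main point of the compression stage.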
### Trained Weights

- `final_compressor_model.pth`: Trained compressor (315 MB)
- `final_decompressor_model.pth`: Trained decompressor (158 MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint
### Generated Samples
- Generated AMP sequences with different CFG scales
- HMD-AMP validation results showing 8.8% AMP prediction rate
## Performance Results

### HMD-AMP Validation (80 sequences tested)
- Total AMPs Predicted: 7/80 (8.8%)
- By CFG Configuration:
- No CFG: 1/20 (5.0%)
- Weak CFG: 2/20 (10.0%)
- Strong CFG: 4/20 (20.0%) → Best performance
- Very Strong CFG: 0/20 (0.0%)
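The per-configuration rates above follow directly from the hit counts (20 sequences per CFG setting, 80 total); a quick check of the arithmetic:

```python
# AMP hit counts per CFG configuration, taken from the validation results above.
hits = {"no_cfg": 1, "weak": 2, "strong": 4, "very_strong": 0}
per_config_n = 20

for name, k in hits.items():
    print(f"{name}: {k}/{per_config_n} = {100 * k / per_config_n:.1f}%")

total = sum(hits.values())
n = per_config_n * len(hits)
print(f"overall: {total}/{n} = {100 * total / n:.1f}%")  # 7/80 = 8.8%
```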
### Best Performing Sequences

- `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
- `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
- `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)
## Usage

```python
from generate_amps import AMPGenerator

# Initialize the generator with the trained flow model checkpoint
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device="cuda",
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5,  # strong CFG recommended
)
```
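Under the hood, flow-matching samplers typically integrate a learned velocity field from noise (t=0) to data (t=1) over `num_steps` discrete steps. FlowFinal's exact sampler is not shown on the card, so this is only a generic Euler-integration sketch with a toy velocity field standing in for the trained model:

```python
import numpy as np

def sample(velocity_fn, x0: np.ndarray, num_steps: int = 25) -> np.ndarray:
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy velocity field pulling samples toward a fixed 80-d latent "target".
target = np.full(80, 0.5)
toy_velocity = lambda x, t: target - x

x0 = np.random.default_rng(1).normal(size=80)   # start from Gaussian noise
x1 = sample(toy_velocity, x0, num_steps=25)
print(np.abs(x1 - target).max())                # much closer to the target than x0
```

More steps give a finer integration of the velocity field at proportionally higher cost, which is the trade-off `num_steps` controls.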
## Training Details
- Optimizer: AdamW with cosine annealing
- Learning Rate: 4e-4 (final)
- Epochs: 2000
- Final Loss: 1.318
- Training Time: 2.3 hours on H100
- Dataset Size: 6,983 samples
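A cosine-annealed learning rate has a simple closed form; as a sketch, this assumes the stated 4e-4 is the schedule's peak rate decaying to zero over the 2000 epochs (the card labels 4e-4 as "final", so the actual schedule may differ):

```python
import math

def cosine_lr(epoch: int, total_epochs: int = 2000,
              lr_max: float = 4e-4, lr_min: float = 0.0) -> float:
    """Cosine annealing: lr_max at epoch 0, decaying smoothly to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))      # lr_max (4e-4) at the start
print(cosine_lr(1000))   # ~2e-4 halfway through
print(cosine_lr(2000))   # lr_min (0.0) at the end
```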
## Files Structure

```
FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md
```
## Citation

If you use FlowFinal in your research, please cite:

```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```
## License
This model is released under the MIT License.