The peptide prediction ecosystem is exploding 🤯🧬
peptides, machine learning, deep learning, bioinformatics, AMP, reproducibility, prediction tools
Project highlights
- Reviews 140 peptide prediction tools
- Covers antimicrobial (AMP), anticancer, antiviral and other activities
- Tracks the rise of deep learning in peptide ML
- Reveals a major reproducibility crisis
- Provides a curated database of peptide prediction resources
New paper out! This one tackles a very meta problem:
there are now SO MANY peptide predictors that choosing one has become a challenge in itself.
The dynamic landscape of peptide activity prediction
Explore the peptide prediction landscape
- Tool list: https://biogenies.info/peptide-prediction-list/
One place to browse the chaotic peptide prediction universe.
Audio summary
Antimicrobial predictors. Anticancer predictors. Antiviral predictors.
Deep learning everywhere. Broken web servers. Missing code.
Here’s a short audio overview explaining what is happening in peptide ML:
What is this about?
Peptides can do a lot:
- antimicrobial activity
- anticancer effects
- blood–brain barrier penetration
- anti-inflammatory activity
- anti-amyloid interactions
Because of that:
researchers keep building ML predictors for peptide activity.
And the field exploded.
The scale of the problem
This review identified:
- 140 peptide prediction tools
- published between 2009 and 2022
Activities included:
- antimicrobial peptides (AMPs)
- anticancer peptides
- antiviral peptides
- antifungal peptides
- cell-penetrating peptides
- blood–brain barrier peptides
…and many more
⚠️ The core problem
Too many tools.
Too many datasets.
Too many inconsistent definitions.
Example:
“AMP” sometimes means:
- antibacterial peptides only
- all antimicrobial peptides
depending on the paper.
And this creates:
- benchmark bias
- incompatible predictors
- reproducibility nightmares
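To make the labeling problem concrete, here is a minimal Python sketch. The peptide IDs and activity annotations are invented for illustration (they are not data from the review), and `is_amp_narrow` / `is_amp_broad` are hypothetical names for the two AMP definitions above. The grouping of antibacterial, antifungal and antiviral under "antimicrobial" is also an assumption.

```python
# Toy annotations: made-up peptide IDs and activities, for illustration only.
ANNOTATIONS = {
    "pep1": {"antibacterial"},
    "pep2": {"antifungal"},
    "pep3": {"antiviral", "antibacterial"},
    "pep4": {"anticancer"},
    "pep5": {"antiviral"},
}

# Assumed set of activities covered by the broad "antimicrobial" umbrella.
ANTIMICROBIAL = {"antibacterial", "antifungal", "antiviral"}


def is_amp_narrow(activities):
    """Narrow definition: AMP means antibacterial only."""
    return "antibacterial" in activities


def is_amp_broad(activities):
    """Broad definition: AMP means any antimicrobial activity."""
    return bool(activities & ANTIMICROBIAL)


def label(annotations, rule):
    """Return {peptide_id: True/False} under the given AMP definition."""
    return {pid: rule(acts) for pid, acts in annotations.items()}


narrow = label(ANNOTATIONS, is_amp_narrow)
broad = label(ANNOTATIONS, is_amp_broad)

# Peptides whose label flips depending on which definition a paper used.
disagreements = sorted(pid for pid in ANNOTATIONS if narrow[pid] != broad[pid])
print(disagreements)
```

In this toy set, a predictor trained on the narrow labels would be scored as wrong on every peptide in `disagreements` by a benchmark built with the broad labels, even if it works exactly as designed.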
What they analyzed
Trends in peptide ML
The study explored:
- predictive activities
- ML architectures
- citations
- reproducibility
- web server availability
Deep learning explosion
Before 2018:
basically no deep models.
By 2021:
almost HALF of predictors used deep learning.

🧬 Most common activities
Top prediction targets were:
- anticancer peptides
- antimicrobial peptides
- antiviral peptides

AMP and anticancer prediction dominate the field.
Most common ML models
The kings of peptide ML:
- Random Forests
- Support Vector Machines
- Deep Learning architectures
Interestingly:
deep learning is popular…
BUT not always clearly better.
Key insights
⚠️ Reproducibility crisis
This was probably the biggest finding.
Among 111 analyzed tools:
- only 38 met minimum reproducibility requirements
- only 9 achieved “gold standard” reproducibility
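As a quick sanity check, the counts above can be turned into percentages. This is a small arithmetic sketch only: the three numbers come from the text, while the `share` helper and the variable names are illustrative.

```python
analyzed = 111  # tools assessed for reproducibility (from the review)
minimum = 38    # met minimum reproducibility requirements
gold = 9        # achieved "gold standard" reproducibility


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


minimum_share = share(minimum, analyzed)
gold_share = share(gold, analyzed)
print(f"minimum: {minimum_share}%  gold standard: {gold_share}%")
```

In other words, about one in three analyzed tools met the minimum bar, and fewer than one in ten reached the gold standard.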

Meaning:
- missing code
- missing datasets
- broken workflows
And yes… many web servers were dead.
Web servers matter more than reproducibility
Surprisingly:
tools with active web servers got more citations
even if they were less reproducible.
This creates a weird incentive:
flashy website > robust science
Deep learning hype is real
Deep models became more cited over time.
BUT:
the review highlights that for AMP prediction:
DL often does not outperform shallow models.
Why this matters
For peptide therapeutics
Peptide ML is now central for:
- antimicrobial discovery
- anticancer peptides
- neurodegeneration research
⚠️ For bioinformatics
The field needs:
- better standards
- fair benchmarking
- reproducibility
- maintenance of tools
For users
This review acts as:
a navigation map for peptide prediction tools.
BioGenies perspective
This paper is basically:
“someone had to say it”
The peptide ML ecosystem is:
- powerful
- exciting
- rapidly growing
BUT ALSO:
- fragmented
- biased
- difficult to reproduce
And honestly:
this paper connects almost ALL our work together:
- AmpGram
- CancerGram
- AMP benchmarking
- peptide therapeutics
- reproducibility standards