Amyloid databases: mapping the aggregation universe
amyloids, protein aggregation, databases, bioinformatics, amyloid prediction, protein misfolding, aggregation
Project highlights
- Comprehensive overview of amyloid & aggregation databases
- Covers sequence, structural and interaction resources
- Highlights connections between databases and prediction tools
- Identifies key limitations in current resources
- Provides curated list of databases: link

New review out! This one is less about a single tool and more about the entire ecosystem of amyloid data.
Explore the resources
This is basically a map of the amyloid bioinformatics landscape.
Audio summary
Too many databases to remember? Same.
We've added a short audio overview so you don't have to memorize all of them.
What is this about?
Amyloid aggregation is a complex, multi-factorial process involving:
- sequence features
- 3D structure
- environmental conditions
and it underlies:
- neurodegenerative diseases
- biotechnological challenges
- functional biological processes
Because of this complexity, researchers have built many specialized databases to organize experimental knowledge.
What we reviewed
We systematically analyzed amyloid-related databases, grouping them into:
Sequence-based databases
- focus on aggregation-prone regions (APRs)
- examples: AmyLoad, AmyPro
Structure-based databases
- store 3D fibril structures
- example: Amyloid Atlas
Interaction databases
- capture cross-interactions between amyloids
- example: AmyloGraph
Each database captures a different aspect of aggregation.
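The three families can be pictured as complementary views of the same protein. Here is a minimal sketch of that idea; the class, field names, and values are illustrative, not drawn from any of the databases above:

```python
from dataclasses import dataclass, field

# Hypothetical unified record: each field corresponds to one database family.
@dataclass
class AmyloidAnnotation:
    protein: str
    aprs: list = field(default_factory=list)            # sequence view: (start, end) of APRs
    fibril_pdb_ids: list = field(default_factory=list)  # structure view: solved fibril structures
    interactors: list = field(default_factory=list)     # interaction view: cross-seeding partners

# Toy values for amyloid-beta; illustrative, not curated data.
abeta = AmyloidAnnotation(
    protein="amyloid-beta 1-42",
    aprs=[(16, 21), (30, 42)],        # sequence-level annotation (e.g. the KLVFFA core region)
    fibril_pdb_ids=["<pdb-id>"],      # structure-level annotation
    interactors=["alpha-synuclein"],  # interaction-level annotation
)
```

In practice no single resource fills all three fields, which is exactly the fragmentation problem discussed next.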
Key insight: the fragmentation problem
There is no single "perfect" database.
Instead:
- each resource focuses on a specific niche
- data formats and annotations differ
- integration is difficult
The result:
- no unified benchmark dataset
- prediction tools are hard to compare
- fragmented knowledge
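To see why integration is hard, imagine merging exports from two resources that describe the same region differently. Everything below is hypothetical: invented field names and record shapes, not real export formats of any database mentioned here.

```python
# Two hypothetical export styles for the same aggregation-prone region.
record_a = {"seq_id": "P05067", "region": "16-21", "evidence": "ThT"}
record_b = {"uniprot": "P05067", "start": 16, "end": 21, "method": "Thioflavin T"}

def normalize_a(rec):
    """Map style A onto a shared schema (region string -> integer bounds)."""
    start, end = (int(x) for x in rec["region"].split("-"))
    return {"uniprot": rec["seq_id"], "start": start, "end": end, "assay": rec["evidence"]}

def normalize_b(rec):
    """Map style B onto the same shared schema (fields just renamed)."""
    return {"uniprot": rec["uniprot"], "start": rec["start"], "end": rec["end"], "assay": rec["method"]}

merged = [normalize_a(record_a), normalize_b(record_b)]
# Even after mapping fields, vocabularies still clash ("ThT" vs "Thioflavin T"),
# which is why building a unified benchmark takes manual curation, not just parsing.
```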
Databases ↔ prediction tools (the feedback loop)
One of the most important conclusions:
databases and prediction tools co-evolve
- experimental datasets → enable model development
- prediction tools → generate new hypotheses
- new experiments → expand databases
A continuous feedback loop driving the field forward.
Examples of this interplay
- AmyloGraph → enabled PACT / AmyloComp (cross-interactions)
- AmyloBase → contributed to AGGRESCAN
- Waltz datasets → led to the WALTZ algorithm
Data → model → better data → better model
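As a toy illustration of the data → model step, here is a naive sliding-window hydropathy scan in the spirit of sequence-based aggregation predictors. This is not WALTZ or AGGRESCAN: the Kyte-Doolittle hydropathy scale is real, but the window size and threshold are arbitrary choices made for the sketch.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def windows_above(seq, size=6, threshold=1.5):
    """Return (start, end) of windows whose mean hydropathy exceeds threshold."""
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        if sum(KD[aa] for aa in window) / size > threshold:
            hits.append((i, i + size))
    return hits

# The KLVFFA-like stretch at the start scores high; the polar tail does not.
hits = windows_above("KLVFFAEDVGSNK")
```

Real predictors learn their scoring from curated database entries rather than a fixed scale, which is precisely why richer databases enable better models.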
Key limitations (important!)
Across databases:
- limited search & filtering
- poor export options
- incomplete metadata
- reliance on predictions (with biases)
And most importantly: aggregation is not only sequence-dependent.
Environmental factors matter:
- pH
- temperature
- concentration
- cofactors
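One way to capture this is to store the conditions next to the outcome, so that the same sequence can appear under different environments. A hedged sketch of such a record; the class and field names are invented for illustration, not a schema from any existing resource:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record: one aggregation measurement under explicit conditions.
@dataclass(frozen=True)
class AggregationExperiment:
    sequence: str
    ph: float
    temperature_c: float
    concentration_um: float
    cofactors: tuple = ()               # e.g. ("heparin",)
    aggregated: Optional[bool] = None   # experimental outcome, if measured

# The same peptide could be stored again at, say, pH 2.0 with a different outcome.
exp = AggregationExperiment(sequence="KLVFFA", ph=7.4, temperature_c=37.0,
                            concentration_um=50.0, cofactors=("heparin",),
                            aggregated=True)
```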
Why this matters
This review shows:
we have a lot of data
but not yet fully integrated knowledge
Future directions:
- better standardization (e.g. MIRRAGGE)
- integration of datasets
- ML models using multi-dimensional data