What is AlphaFold 3?
On 8 May 2024, Google DeepMind and Isomorphic Labs unveiled AlphaFold 3, the third generation of their protein structure prediction model. According to the announcement, the new model improves accuracy by at least 50% over existing methods when predicting how proteins interact with other types of molecules. It covers the structures and interactions of a broad range of biological molecules, including proteins, DNA, RNA, and ligands, making it the first AI system to surpass physics-based tools for biomolecular structure prediction.
AlphaFold 3 is now available for non-commercial research purposes, but it is not the only option: several popular alternatives exist. These models underpin parts of the drug discovery process and have other important applications in the life sciences. Let’s take a look at the top alternatives to AlphaFold 3 in 2024.
Top Alternatives to AlphaFold 3 in 2024
- RoseTTAFold by Baker lab
- OmegaFold by Helixon
- I-TASSER by Yang Zhang
- Phyre2 by Lawrence Kelley’s team
- ESMFold by Meta AI
- SWISS-MODEL by Torsten Schwede
- Robetta by Baker Lab
- HHpred by Max Planck Institute for Developmental Biology
- AlphaFold-Multimer by Google DeepMind
- ColabFold by Milot Mirdita, Sergey Ovchinnikov, and Martin Steinegger
1. RoseTTAFold
Creator: Baker lab at the University of Washington, with development led by Minkyung Baek, Ph.D.
Pros:
- Utilises a “three-track” neural network architecture that processes one-, two-, and three-dimensional data about proteins simultaneously.
- Designed for rapid protein structure prediction, capable of computing structures in minutes on standard computing equipment.
- Focuses on integrating various data types (sequence, interaction, and structure) within its neural network.
- Has been successfully applied to predict numerous protein structures, including those not well-understood or directly linked to human health issues like cancer and inflammation.
- It provides tools for modeling complex biological assemblies and enhancing understanding of multifaceted biological systems.
- Quick prediction times make it accessible for widespread use in both academic and clinical settings.
- Open-source and available through GitHub.
Cons:
- While highly accurate, it may not reach AlphaFold’s precision level in all scenarios, particularly with extremely complex proteins.
Training Method:
- RoseTTAFold is trained using both protein sequence data and structural data, allowing it to predict protein structures and their interactions effectively.
- Uses a combination of deep learning techniques and traditional bioinformatics methods to enhance prediction accuracy.
2. OmegaFold
Creator: Helixon
Pros:
- Predicts protein structures from a single primary sequence.
- Uses a protein language model and a geometry-inspired transformer model for predictions.
- Suitable for orphan proteins and fast-evolving proteins.
- Does not rely on multiple sequence alignments (MSAs), unlike other models.
- Less dependent on extensive evolutionary data.
- Broad applicability to various protein types.
- Shows comparable accuracy to AlphaFold and RoseTTAFold on benchmark datasets like CASP and CAMEO.
Cons:
- May face challenges in achieving consistent accuracy across all types of proteins compared to models like AlphaFold.
Training Method:
- Trained on unaligned and unlabeled protein sequences.
- Uses deep transformer-based models to learn residue representations.
Learn more about AlphaFold vs OmegaFold
3. I-TASSER
Creator: Yang Zhang and his team at the University of Michigan.
Pros:
- Uses an iterative threading assembly refinement approach for protein structure prediction.
- Capable of function prediction through structure-based annotations.
- Provides up to five full-length atomic models, ranked by cluster density, with estimations of accuracy, including TM-scores and RMSD.
Training Method:
- Employs multiple threading approaches to identify structural templates from known protein data.
- Constructs full-length atomic models using iterative template-based fragment assembly simulations.
- Uses a meta-server threading approach, LOMETS, for template identification.
- Provides comprehensive outputs including predicted models, secondary structures, solvent accessibility, and functional annotations.
- Known for achieving high accuracy in structure prediction as demonstrated in various CASP competitions.
- Generates multiple models allowing selection based on confidence scores.
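To make the TM-score and RMSD figures mentioned above more concrete, here is a minimal Python sketch of how both can be computed for two sets of C-alpha coordinates. It assumes the two structures are already superimposed and matched residue by residue; real TM-score programs also search for the best superposition, and I-TASSER reports estimated values because the true structure is not known at prediction time. The function names, the 0.5 Å floor on the distance scale, and the toy coordinates are illustrative assumptions, not part of I-TASSER.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates that are assumed to be already superimposed."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def tm_score_fixed(model, reference):
    """TM-score formula evaluated for one fixed superposition.

    Normalised by the reference length. The real TM-score maximises this
    value over superpositions, so this is only a lower-bound style sketch.
    """
    n = len(reference)
    if len(model) != n or n == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Length-dependent distance scale d0; the 0.5 floor for short chains
    # is an assumption made for this sketch.
    d0 = 0.5
    if n > 15:
        d0 = max(0.5, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, reference))
    return total / n


# Toy example: a straight 30-residue chain and a copy shifted by 1 Å in z.
reference = [(float(i), 0.0, 0.0) for i in range(30)]
model = [(x, y, z + 1.0) for x, y, z in reference]
print(f"RMSD: {rmsd(model, reference):.2f}")
print(f"TM-score (fixed superposition): {tm_score_fixed(model, reference):.2f}")
```

RMSD is sensitive to a few badly placed residues, while the TM-score down-weights large deviations and depends less on chain length, which is why the two are usually reported together.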
To learn more, visit I-TASSER
4. Phyre2
Creator: Lawrence Kelley’s team at the Structural Bioinformatics Group, Imperial College London.
Pros:
- Employs “one to one threading” which allows users to model a sequence against a specific template of their choice, enhancing the accuracy when additional biological information is available.
- Includes tools like “BackPhyre” for scanning existing structures against genomes, and “Phyrealarm” for ongoing matching against newly added structures in the database.
- Integrated with “3DLigandSite” for high-accuracy binding site prediction.
- Provides a user-friendly web interface that is accessible to researchers without deep computational expertise.
- Offers a range of predictive and analytical tools that go beyond simple structure prediction.
Cons:
- While Phyre2 is highly effective for many common protein modelling tasks, its reliance on existing templates may limit its effectiveness for highly novel or poorly characterised proteins compared to methods like AlphaFold, which can predict structures without clear homologous templates.
Training Method:
- Phyre2 uses advanced homology detection methods to model protein structures based on their alignment with known structures. It leverages a combination of hidden Markov models and heuristics to enhance sequence coverage and model confidence.
Explore more about Phyre2
5. ESMFold
Creator: Meta
Pros:
- ESMFold builds on the ESM-2 protein language model, a Transformer that scales up to 15 billion parameters, and does not rely on multiple sequence alignments (MSAs), unlike models such as AlphaFold2, which require them.
- It can make predictions directly from amino acid sequences, significantly speeding up the inference process.
- ESMFold achieves similar accuracy levels to state-of-the-art models but is significantly faster, predicting structures up to 60 times faster than AlphaFold2 for certain sequences.
- The model was also designed to handle large-scale structure predictions efficiently, capable of predicting structures for one million protein sequences in less than a day.
- Does not require external databases or MSAs, simplifying the protein folding prediction process.
Training Method:
- ESMFold uses a Transformer-based language model, specifically the ESM-2 model, which learns interactions between pairs of amino acids in a protein sequence.
Cons:
- Since Meta disbanded the team behind ESMFold, it might not have new features anytime soon.
Explore ESMFold on GitHub
6. SWISS-MODEL
Creator: Torsten Schwede and his team at the Biozentrum of the University of Basel and the Swiss Institute of Bioinformatics.
Pros:
- Provides a user-friendly, web-based platform for automated comparative protein structure modelling.
- Integrates tools for structure assessment and comparison, such as QMEAN for model quality estimation.
- Allows users to explore structural templates interactively and visualise them in 3D within the browser.
- Free for academic use and supports a wide range of functionalities beyond basic modelling.
- Integrated with major biological databases and bioinformatics tools, enhancing its utility in research.
- SWISS-MODEL is specifically designed for ease of use, allowing even non-experts to perform protein modelling.
- It supports a wide range of functionalities, including modelling homo-oligomeric assemblies and incorporating ligands into the models.
Cons:
- While highly effective for known protein families, its accuracy may decrease for proteins with less characterised or more distant homologues.
Training Method:
- Utilises homology modelling, relying on evolutionary information to predict protein structures by identifying and using known protein structures as templates.
- Employs algorithms to find the best match between the target sequence and available templates, optimising the alignment to predict the structure.
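As a rough illustration of the template-matching idea, the Python sketch below ranks candidate templates by percent sequence identity to the target. This is a deliberately simplified stand-in: SWISS-MODEL's own template search and ranking are more involved, and the function names, template IDs, and toy alignments here are hypothetical.

```python
def percent_identity(target_aln, template_aln):
    """Percent identity over alignment columns where both sequences
    have a residue (columns with a gap '-' in either row are skipped)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(target_aln, template_aln)
               if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


def rank_templates(alignments):
    """Sort templates from most to least similar to the target.

    `alignments` maps a template ID to a (target_row, template_row) pair
    taken from a pairwise alignment of the target against that template.
    """
    scored = [(percent_identity(target, template), template_id)
              for template_id, (target, template) in alignments.items()]
    return sorted(scored, reverse=True)


# Toy pairwise alignments against two hypothetical templates.
candidates = {
    "template_A": ("MKT-AYIAKQR", "MKTQAYLAKQR"),
    "template_B": ("MKTAYIAKQR-", "MRSAYVGKNRL"),
}
for identity, template_id in rank_templates(candidates):
    print(f"{template_id}: {identity:.1f}% identity")
```

In practice, identity is only one signal; alignment coverage and model quality estimates also influence which template is the better choice.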
Explore more about SWISS-MODEL
7. Robetta
Creator: Baker Lab at the University of Washington.
Pros:
- Offers both template-based and de novo protein structure prediction.
- Users can input custom sequence alignments, apply constraints, and utilise local fragments in their modelling tasks.
- Includes RoseTTAFold, enhancing its prediction accuracy and speed.
- Robetta allows for user-interaction in the modelling process, offering customisation that automated systems like AlphaFold do not typically allow.
- It integrates machine learning techniques with traditional comparative modelling approaches.
Cons:
- Can experience long wait times due to the high demand and computational intensity of deep learning methods.
- The accuracy can vary based on the availability and quality of templates or the effectiveness of the de novo modelling.
Training Method:
- It utilises the Rosetta software suite’s tools, combining methods from comparative (homology) modelling and de novo structure prediction.
- It has incorporated deep learning methods, specifically RoseTTAFold, which uses a three-track network for structure prediction.
Explore more about Robetta
8. HHpred
HHpred is a bioinformatics tool for protein homology detection and structure prediction, developed at the Max Planck Institute for Developmental Biology.
Creator: Max Planck Institute for Developmental Biology.
Pros:
- Implements pairwise comparison of profile hidden Markov models (HMMs), making it highly effective for detecting remote homologs.
- Capable of searching a vast range of databases including PDB, SCOP, Pfam, SMART, COGs, and CDD.
- HHpred is unique in its use of HMMs for both the query and database sequences, allowing for more sensitive detection of homologies than methods based on sequence-sequence comparisons.
- Provides detailed alignments and the option to predict 3D structures via MODELLER if a suitable template is found.
- Highly sensitive in detecting homology, even among distantly related proteins.
- Integrates with multiple databases and allows comprehensive analysis across different data sources.
Cons:
- While powerful, the complexity of its setup and the need for specific alignments may pose challenges for less experienced users.
Training Method:
- Leverages profile-profile comparison methods, which are among the most sensitive sequence search techniques.
- Profiles are created from multiple sequence alignments of related sequences, enhancing the accuracy of homology detection.
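The idea of building a profile from a multiple sequence alignment can be sketched in a few lines of Python. The example below computes per-column amino acid frequencies with a simple pseudocount. It covers only the column-frequency part of a profile; the profile HMMs used by HHpred also model insertions, deletions, and transitions between states. The function name, the uniform pseudocount, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=1.0):
    """Return one {residue: frequency} dict per alignment column.

    `msa` is a list of equal-length aligned sequences; gap characters and
    any non-standard symbols are ignored when counting. A uniform
    pseudocount keeps unseen residues from getting zero probability.
    """
    if not msa or len({len(seq) for seq in msa}) != 1:
        raise ValueError("msa must contain aligned sequences of equal length")
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


# Toy alignment of four related sequences.
msa = ["MKTA", "MRTA", "MKSA", "MK-G"]
profile = column_profile(msa)
for position, column in enumerate(profile, start=1):
    best = max(column, key=column.get)
    print(f"column {position}: most likely residue {best} ({column[best]:.2f})")
```

Comparing two such profiles column by column, rather than comparing two raw sequences, is what lets profile-based methods pick up relationships that plain sequence-sequence comparison misses.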
Explore more about HHpred
9. AlphaFold-Multimer
Creator: Google DeepMind
Pros:
- Built on AlphaFold2, it is engineered to tackle the complex prediction of protein-protein interactions, which involves understanding how multiple protein chains fit together.
- Unlike its predecessor, which primarily predicts the structure of individual protein chains, AlphaFold-Multimer predicts the inter-chain interactions and the arrangement of proteins in a complex.
- It achieves high accuracy in interface prediction, which is critical for functional analysis of proteins within their biological context.
- Increases the scope of computationally accessible protein structure prediction to include complex assemblies.
Cons:
- The computational demand is high, possibly limiting accessibility for some researchers without access to significant computing resources.
Training Method:
- AlphaFold-Multimer uses deep learning algorithms trained on publicly available data of known protein structures. This includes training specifically for multimeric inputs to enhance the accuracy of interface predictions between different protein chains.
Explore more about AlphaFold-Multimer
10. ColabFold
Creator: Milot Mirdita, Sergey Ovchinnikov, and Martin Steinegger
Pros:
- Integrates AlphaFold2 and RoseTTAFold with MMseqs2 for fast multiple sequence alignment, significantly accelerating protein structure predictions.
- Operates as an easy-to-use, notebook-based environment on Google Colab, making advanced protein modeling accessible without requiring installation or high-end hardware.
- Capable of predicting close to a thousand structures per day with a single GPU.
- Unlike standalone AlphaFold2, which requires more extensive computational resources, ColabFold optimizes resource use via Google Colab, making it accessible to a broader audience.
- Its integration with MMseqs2 speeds up the homology search process, making it much faster than traditional methods.
Training Method:
- Utilises existing trained models from AlphaFold2 and RoseTTAFold and combines them with MMseqs2 for rapid sequence alignment and improved prediction accuracy.
Cons:
- Dependent on Google Colab’s resources, which can limit the size of proteins analyzed due to memory constraints on available GPUs.
- While it offers significant speed and accessibility, the precision for extremely complex structures might still lag behind more resource-intensive setups.
Explore more about ColabFold