
Author: Claire Brotea
Editor: Altay Shaw
Has AlphaFold finally solved the problem biochemists have been trying to solve for 50 years?
Demis Hassabis and John Jumper seem to have done just that, sharing the 2024 Nobel Prize in Chemistry with David Baker, who was honoured for computational protein design. At the 14th biennial Critical Assessment of Protein Structure Prediction (CASP) competition in 2020, their AI system, AlphaFold2, predicted the 3D structures of proteins to nearly 90% accuracy, in just minutes. For 50 years, biochemists have grappled with slow and laborious techniques, such as X-ray crystallography, to determine structures one protein at a time. Following these groundbreaking results, AlphaFold2 has gone on to predict the structures of more than 200 million proteins, compared with only about 200,000 structures determined by traditional experimental methods, so AlphaFold appears to have a promising future, especially in fields such as drug development.
AlphaFold2 is based on the biological principle that protein structures are conserved in evolution. After Demis Hassabis released AlphaFold1, John Jumper was able to improve the AI system into AlphaFold2, which works by finding patterns in amino acid sequences similar to the inputted amino acid sequence. Amino acids are the building blocks of proteins, and when they are linked in a chain, their chemical properties (such as charge and size) determine the structure of the protein as a whole. The patterns that AlphaFold2 recognises are saved, so the AI system accumulates a huge amount of stored data, and each time it predicts a new structure it has more data on which to base its prediction. This is a form of homology modelling, a method based on homologous (similar) structures in proteins. A prediction takes mere minutes, solving our 50-year-long problem.
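The idea of matching an inputted sequence against stored ones can be illustrated with a toy sketch in Python. This is not AlphaFold2's actual algorithm; it only shows, under simplifying assumptions, what "finding a similar amino acid sequence" could mean. The sequences, the names `KNOWN_SEQUENCES`, `percent_identity` and `most_similar`, and the restriction to sequences of equal length are all hypothetical choices made for illustration.

```python
# Toy illustration only: NOT how AlphaFold2 is implemented.
# All sequences and names below are hypothetical.

KNOWN_SEQUENCES = {
    "protein_A": "MKTAYIAKQR",
    "protein_B": "MKTAYLAKQR",
    "protein_C": "GGSPLVWEEN",
}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions holding the same amino acid (equal lengths only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def most_similar(query: str, known: dict[str, str]) -> tuple[str, float]:
    """Return the name and percent identity of the closest stored sequence."""
    scores = {name: percent_identity(query, seq) for name, seq in known.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    name, score = most_similar("MKTAYIAKQK", KNOWN_SEQUENCES)
    print(f"closest match: {name} ({score:.0f}% identical)")
```

Real sequence comparison also has to handle sequences of different lengths, which requires alignment; the position-by-position comparison here is kept deliberately simple.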
Apart from being time-consuming, X-ray crystallography has only been able to determine the structures of 0.1% of all existing proteins, a far smaller share than AlphaFold2 has covered in just a few years. This is because crystallising a biomolecule in order to view its structure is difficult: some proteins are normally bound to membranes and become highly flexible once removed from them. In general, AlphaFold2 takes into account a protein’s native state, whereas X-ray crystallography does not. AlphaFold2 cannot fully replace X-ray crystallography, since this is the method that determined the protein structures used to train the AI system’s pattern recognition, but the two can be paired in studies to increase the accuracy of a predicted protein structure.
Aside from being extremely useful in research, AlphaFold2 has real-life applications. Scientists have even been able to use it in developing a new vaccine against the many variants of SARS-CoV-2, the virus that causes COVID-19. Because the number of naturally occurring proteins is limited, AlphaFold2 has helped researchers design new proteins, such as antibodies, to target a specific structure. Researchers were able to find seven main mutations in the receptor-binding domain of the spike protein that have been driving the rapidly mutating virus’s ability to infect so many people during the pandemic. Knowing where the mutations occur has allowed researchers to create a protein that can effectively neutralise the virus by binding to the spike protein at the site where it would normally bind to receptors on a host cell, essentially blocking infection.
While AlphaFold has helped to spread knowledge of protein folding and structure prediction (more than 2 million people from 190 countries have accessed AlphaFold2 since it was made publicly available), there is still a way to go. As mentioned previously, AlphaFold2 bases its predictions on other known structures, essentially memorising them to predict a new one. Therefore, interactions between amino acids in a sequence, interactions between amino acids and the external environment, and the energetics of the protein’s folding are not really taken into account by the AI system when it predicts a 3D structure. So if the amino acid sequence of a protein is inputted into the system and the AI system doesn’t recognise a similar sequence, the predicted structure will not be very accurate. This is a significant setback, especially for proteins that switch their folding spontaneously (i.e., they have two energetically favourable folding states rather than a single 3D conformation). AlphaFold2 would also, in theory, have a hard time determining the effect of a missense mutation (when a single amino acid is substituted for another one) on a protein’s structure. Researchers have therefore been asking whether AlphaFold2 is sensitive enough to such small changes in the chemical properties of amino acids, or whether it would just look for a similar enough amino acid sequence in its database on which to base its prediction.
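To make the missense concern concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of AlphaFold2: the 20-residue sequence is made up, the function names `apply_missense`, `percent_identity` and `net_charge` are hypothetical, and the charge table is a simplified textbook assignment at roughly neutral pH in which histidine is treated as uncharged. The point is that a single substitution leaves the sequence 95% identical to the original even though the charge at that position flips.

```python
# Toy illustration only; the sequence and helper names are hypothetical.

# Simplified side-chain charges at about neutral pH (assumption: histidine
# is treated as uncharged; every residue not listed is treated as neutral).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def apply_missense(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with one residue replaced (1-based position)."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions holding the same amino acid (equal lengths only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def net_charge(sequence: str) -> int:
    """Sum the simplified side-chain charges over the whole sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence)


if __name__ == "__main__":
    wild_type = "MKTAYIAKQRELGSPLVWDN"  # hypothetical 20-residue sequence
    mutant = apply_missense(wild_type, 11, "K")  # glutamate (E) to lysine (K)

    print(f"identity:  {percent_identity(wild_type, mutant):.0f}%")
    print(f"net charge: {net_charge(wild_type):+d} -> {net_charge(mutant):+d}")
```

A method that only asks "how similar is this sequence to one I have seen?" would treat the mutant as almost identical to the original, which is exactly the sensitivity question researchers are raising.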
Moreover, this homology modelling approach means that the quality of the scoring (how well AlphaFold2 can distinguish between a good and a poor prediction of its own) cannot be tested. Each time AlphaFold2 gives a predicted structure, it also gives an estimate of how likely that prediction is to be accurate. But theoretically, if AlphaFold2 predicted a structure with low accuracy, it would still store this structure for that amino acid sequence in its database and use it for future predictions. We must therefore still rely on X-ray crystallography, cryo-EM and NMR to determine structures, which undercuts the efficiency and time savings that made AlphaFold2 attractive in the first place. Additionally, these methods could show a different structure from the one predicted by AlphaFold2 because of processes such as post-translational modifications, which happen after a protein has been made. For instance, our genetic code specifies the amino acid proline in the sequence for collagen, but after collagen is produced, some of this proline is converted into hydroxyproline, a modified amino acid with different chemical properties that makes collagen stronger. The structure predicted by AlphaFold2 would therefore be slightly different from the structure determined by X-ray crystallography.
AlphaFold has indeed revolutionised biochemistry, combining advanced AI technology with biological principles to surpass traditional methods in speed and efficiency and to advance the fields of drug development and protein research. While this is a monumental leap forward, not just for the Nobel Prize winners but for scientists around the globe, the AI model’s reliance on homology modelling brings a couple of disadvantages, chiefly that it cannot fully capture a protein’s structure in its native state. Despite these disadvantages, with more time to refine the model, scientists should soon be able to use AlphaFold and other models to create more precise and comprehensive protein structure predictions, pointing to a promising future.
