
DeepMind’s AlphaFold 2 predicts 3D protein structure with astonishing accuracy
Writer: Ania Lewicka
Editor: Gracie Enticknap
Artist: Lia Bote
Pushing forward the boundaries of science can be a painstaking process, and many hard-working, talented scientists never get to celebrate their success in the spotlight. Nevertheless, once in a while something so important happens that it stirs not only the scientific community but the whole world. The breakthrough of AlphaFold 2 was definitely on everyone’s lips. Headlines proclaimed that experimental scientists were becoming redundant, as every structure would now be predicted by artificial intelligence (AI). Sceptics cast it as an existential threat, believing AlphaFold 2 to be further proof that human intelligence is inferior to AI. Let’s explore the story behind this stir.
Solving the unsolvable
While experts hailed it as a revolutionary step, many people still fail to understand why solving the protein folding paradox matters so much. In fact, most people have never even heard of this conundrum. To truly appreciate the scientific advance, it is necessary to understand the complexity of the protein folding process and why it was called a ‘paradox’. The sequence of amino acids encoded in the genes dictates the three-dimensional shape of a protein. To be more precise, the shape is determined by a network of chemical interactions between the protein’s components and its environment. The 3D configuration of a protein is vital because it determines the protein’s function. This is precisely why scientists need to understand how proteins acquire their shapes: to figure out how diseases come about and how they could potentially be cured. The paradox lies in the fact that a protein chain has an astronomical number of possible conformations (an estimated 10^300 for a big protein), yet in nature folding happens spontaneously within milliseconds. In the 20th century the problem was deemed unsolvable by means of human knowledge and ability alone, but luckily we no longer have only human intelligence to rely on: we also have artificial intelligence.
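To get a feel for the numbers behind the paradox, here is a rough back-of-the-envelope sketch in Python. It takes the 10^300 figure quoted above and asks how long a protein would need if it simply tried every conformation in turn. The sampling rate of 10^13 conformations per second and the 13.8-billion-year age of the universe are assumptions added for illustration; they do not come from the article.

```python
import math

# Figure quoted in the text: about 10^300 conformations for a big protein.
LOG10_CONFORMATIONS = 300

# Assumption (not from the article): the chain tries 10^13 conformations
# per second, roughly the timescale of a single bond vibration.
LOG10_SAMPLES_PER_SECOND = 13

# Assumption: the universe is about 13.8 billion years old.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * SECONDS_PER_YEAR


def log10_search_seconds(log10_conformations, log10_rate):
    """Return log10 of the seconds needed to try every conformation once."""
    return log10_conformations - log10_rate


log10_seconds = log10_search_seconds(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
log10_universe_ages = log10_seconds - math.log10(AGE_OF_UNIVERSE_SECONDS)

print(f"Exhaustive search: about 10^{log10_seconds:.0f} seconds")
print(f"That is about 10^{log10_universe_ages:.0f} times the age of the universe")
```

The calculation works with powers of ten rather than the raw numbers, simply because they are too large to handle comfortably. Whatever sampling rate is assumed, the conclusion is the same: a blind search could never finish, so real proteins cannot be folding by trial and error.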
The Olympics of Protein Folding
Humans are competitive by nature, and this was precisely the trait the scientific community decided to harness to crack the protein folding mystery. In 1994, an ongoing competition called the ‘Critical Assessment of Protein Structure Prediction’ (CASP) was established to accelerate progress in this area of research. Participants are required to computationally predict the structure of newly discovered proteins based solely on their amino acid sequence, and these predictions are subsequently compared with the ground-truth experimental data once they become available. While progress in the competition was initially slow, the results became much more accurate once researchers started to use AI or, to be more precise, deep learning. Nevertheless, before 2018 no team had ever scored more than 40 points on the Global Distance Test (GDT), a measure of accuracy that runs from 0 to 100. The revolution happened in 2020, with AlphaFold 2 scoring almost 90 points on the GDT. It used a completely new approach: a neural-network-based algorithm trained on a public dataset of around 170,000 proteins with known structures (so-called labelled data) and a larger database of protein sequences with unknown structures (unlabelled data). The neural network uses multiple sequence alignments (MSAs) to interpret the spatial graph of a folded protein, which is crucial for understanding the physical interactions within it as well as its evolutionary history. As a result, the AI produces accurate predictions of a protein’s underlying physical structure, along with internal confidence scores that indicate how reliable each part of the prediction is.
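For readers curious about what a GDT score actually measures, the sketch below shows a simplified version of the calculation in Python. The commonly used GDT_TS variant averages the percentage of residues whose predicted position lies within 1, 2, 4 and 8 Å of the experimental one. The real CASP procedure also searches over many superpositions of the two structures; this sketch assumes the superposition has already been done, and the function name `gdt_ts` and the toy distances are hypothetical and chosen purely for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS: mean percentage of residues within each cutoff (in Å).

    `distances` holds one value per residue: the distance between the
    predicted and experimental C-alpha atom after superposition
    (assumed here to be already computed).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    percentages = [
        100.0 * sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(percentages)


# Hypothetical per-residue deviations in Å for a 10-residue toy protein.
toy_distances = [0.4, 0.7, 0.9, 1.5, 1.8, 2.5, 3.0, 3.9, 6.0, 9.5]
print(round(gdt_ts(toy_distances), 1))  # 62.5
```

In the toy example, 30%, 50%, 80% and 90% of the residues fall within the four cutoffs, giving a score of 62.5. A score near 90 therefore means that almost every residue sits within a couple of ångströms of where the experiment says it should be.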
A recipe for a revolution
To fully grasp why the scientific community was so excited, it is necessary to realise the huge impact this discovery is likely to have on various areas of science.
First of all, replacing the experimental methods used to determine protein structures (X-ray crystallography, NMR spectroscopy and, more recently, cryo-electron microscopy) with computational ones would be hugely beneficial, as the former are labour-intensive, extremely time-consuming and require very expensive, specialised equipment. What’s even worse, for some types of proteins, such as membrane proteins, these techniques often work poorly or not at all.
Hopefully, once experimental work has confirmed the algorithm’s predictions, scientists will in future be able to rely on AI methods to predict structures directly, quickly and efficiently. Artificial intelligence will become a tool with the potential to accelerate our understanding of known diseases and the modelling of unknown ones, significantly speeding up drug discovery. New, effective drugs, vaccines and antibodies for specific diseases such as cancer could be designed significantly faster than they are today.
What’s most exciting, AlphaFold 2 is already having a significant real-world impact, aiding the fight against today’s most pressing emergency: the COVID-19 pandemic. The program was used at London’s Francis Crick Institute to predict the structures of SARS-CoV-2 proteins that had not yet been determined experimentally. The structures of the viral proteins ORF3a and ORF8 predicted by the program turned out to be impressively similar to the conformations experimentally determined in the lab through cryo-EM. ORF3a is believed to play an important role in viral survival, as it helps the virus break out of the host cell after replication, so it could be a potential drug target.
Quitting the lab still not in sight
Does all this mean that experimental scientists are becoming redundant and that we can now readily predict the structure of every protein? The answer to both questions is, without a doubt, no, as there are still many protein structures that AlphaFold 2 cannot predict well. The predictions for one-third of the examined proteins still fell short of an acceptable level of accuracy. More importantly, many proteins consist of several polypeptide chains forming a complex quaternary structure, which often requires some non-protein component, sometimes even an inorganic one (for instance the iron cation in haemoglobin), to function. Therefore, not everything can be deduced solely from the sequence of amino acids.
Another thing to consider is the accuracy of the predictions: the average error was 1.6 Å (ångströms). Such precision, roughly the size of an atom, is impressive, but in some cases even such a small error makes a huge difference. One example is molecular docking, the basis of drug action, which requires atomic positions to be accurate to within a 0.3 Å margin. This suggests that achieving a perfect fit with AI-predicted structures might be challenging.
Finally, AlphaFold 2 will need to face a general problem that comes with any use of AI: it can only extract patterns from the data it has been trained on, which means the data has to be collected in the first place. This implies that experiments will have to continue, so that entirely new functions absent from existing datasets do not remain undiscovered.
All things considered, an honest answer to the question of whether the protein folding paradox has been solved is: not really. As many sceptics have pointed out, we cannot consider it solved, because the mechanism by which nature manages to fold proteins within milliseconds has been neither explained nor rationalised. Apart from that, we are still far from being able to apply AI to every case of protein structure prediction. Nevertheless, the huge real-world impact of this discovery deserves to be appreciated. Biologists and chemists have gained a valuable working tool, thanks to which predicting the structure of many proteins will stop being a tedious and inaccurate process. It unlocks a fantastic new opportunity for a revolution in science.