AlphaFold: A Year in Review

Over the last year, we here at Cyclica have closely followed AlphaFold, the revolutionary deep learning technology from Google’s DeepMind that predicts the three-dimensional structure of proteins, and published a series of blog posts about it. AlphaFold’s impact was recently recognized when Nature Methods named protein structure prediction its Method of the Year 2021. We started out in December 2020 with a perspective that delved into the history of protein structure prediction and suggested a role for it in epidemic preparedness. Then, right after DeepMind published the full proteome structures of 21 organisms, including human, in July 2021, we began a series of posts on our experiences integrating AlphaFold into our AI platform. In this last post of the series, we put our earlier posts in perspective and comment on the future outlook for AlphaFold and how it may impact drug discovery at Cyclica.

The main attraction of AlphaFold is that it can predict the structure of any natural protein given only its amino acid sequence. In July 2021, DeepMind announced that it would work with the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) to predict all structures within the proteomes of 21 organisms and make them available as a free and open resource, the AlphaFold Protein Structure Database (AlphaFold DB). The intention is to eventually extend this database to cover all proteins in the UniProt database.

If these plans are realized, AlphaFold DB will offer a >2000-fold increase in structural coverage of known protein sequences and a >700-fold increase in the number of structures compared with the RCSB PDB, previously the go-to source of protein structures. An increase of roughly three orders of magnitude like this is widely regarded as a paradigm shift in structural biology (Sriram Subramaniam and Gerard J. Kleywegt, Nature Methods).

Since the human proteome is already well covered by the PDB, AlphaFold’s greatest impact will come from the sudden availability of protein structures for other species. Any kind of proteome-wide analysis, such as our MatchMaker drug-target interaction predictor, requires substantial structural coverage of the target proteome, which before AlphaFold existed essentially only for Homo sapiens. AlphaFold DB will provide structures for the full proteomes of thousands of species. Furthermore, the AlphaFold software itself can quickly generate the structural proteomes of other species, such as newly discovered or newly sequenced ones. As we wrote back in December 2020:

"Of particular interest in the field of infectious disease are the proteomes of human pathogens. When a new pathogen is discovered, one of the most important tasks is to sequence its genome, both to identify its relation to other known pathogens and to obtain crucial information about its inner workings to be used for vaccines and therapies. Between AlphaFold and MatchMaker, we have, in principle, all the tools in hand to mount a rapid therapeutic response to new epidemics: 1) isolate the pathogen and sequence its genome, 2) compute the 3D structures of its proteins (AlphaFold), 3) identify promising compounds for therapy, ideally existing drugs that can be repurposed by Cyclica (Ligand Design), and 4) run clinical trials in the field. We estimate that the first 3 steps could be completed within weeks, allowing therapies to emerge in the early stages of an epidemic and substantially decreasing the chance of it turning into a pandemic."

To their credit, DeepMind has released the AlphaFold inference code as open source and has also made the model parameters available. However, the released code supports inference only, not training new models or fine-tuning existing ones, and the model parameters are licensed for non-commercial use only. In August 2021, Baek et al. from the Baker Lab at the University of Washington published their own deep learning model for protein structure prediction, named RoseTTAFold. RoseTTAFold was inspired by AlphaFold and predicts protein structures with similar accuracy, but the two methods nevertheless differ significantly. Here, too, the training code has not been released, and the model parameters are shared for non-commercial use only.

Given the groundbreaking nature of this advance and the detailed descriptions of both methods in the scientific literature, it is safe to assume that this will not be the last word, and that any researcher will soon be able to train models and generate new and improved model parameters. Many gaps remain to be filled, some of which are outlined in our earlier posts and by Subramaniam and Kleywegt. We expect a vibrant ecosystem of mostly open software to develop around protein structure prediction, aimed at filling these gaps and pushing the boundaries of performance and coverage further. We are excited to participate in this endeavor.

Dr. Andreas Windemuth, Chief Innovation Officer

Andreas is the Chief Innovation Officer and guides Cyclica's vision of creating a scientifically rigorous platform that is integral to the drug discovery pipeline.