Can AlphaFold save lives in epidemics?

Google DeepMind’s AlphaFold recently made news again by overwhelmingly winning the latest CASP competition for the prediction of protein structures. After AlphaFold won the previous competition in 2018 by a sizable margin, an updated version has now swept the field so thoroughly that experts are heard saying that the protein folding problem is solved, a grand challenge that has been worked on for decades and that many had considered unsolvable. On closer inspection, however, it turns out that the goalposts for protein folding have shifted considerably in the last decade. What AlphaFold has “solved” is more accurately described as protein structure prediction, leaving the original goal of ab initio modelling of the protein folding process all but abandoned.

In the 1950s, Christian Anfinsen showed that a polypeptide chain could fold into a defined 3-dimensional structure all on its own in vitro. In 1969, Cyrus Levinthal pointed out that the astronomically large number of possible conformations of a polypeptide makes it impossible to find the energetically best structure by random search in a limited time (the Levinthal Paradox), and that there must therefore be some sort of directed pathway from the unfolded to the folded structure. From then until about a decade ago, the protein folding problem was understood as the task of computationally modelling this pathway ab initio and arriving at the correct structure of the protein from its peptide sequence alone.
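
To get a feel for the scale of Levinthal's argument, here is a rough back-of-the-envelope sketch. The numbers are illustrative assumptions of ours, not measured values: a chain of 100 residues, 3 possible conformations per residue, and a very generous sampling rate of 10^13 conformations per second. The function name is hypothetical.

```python
# Back-of-the-envelope estimate for the Levinthal Paradox.
# All default parameters are illustrative assumptions, not measured values.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time_years(
    n_residues: int,
    states_per_residue: int = 3,       # assumed conformations per residue
    samples_per_second: float = 1e13,  # assumed (very generous) sampling rate
) -> float:
    """Years needed to enumerate every conformation by exhaustive search."""
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = levinthal_search_time_years(100)
    print(f"Exhaustive search for 100 residues: about {years:.1e} years")
```

Under these assumptions the exhaustive search would take on the order of 10^27 years, vastly longer than the age of the universe, while real proteins typically fold in fractions of a second to minutes. That gap is the paradox.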

In the past decade, it became apparent that the genetic evolution of proteins can be mined for information useful to predict their structures. By observing correlated mutations in pairs of amino acid residues across related species, residue pairs that are in contact in the protein structure can be identified, providing a data set of residue distances. These data are very similar to those that can be obtained by experimental NMR measurements, but they arguably still qualify for the label “from sequence alone”. This has led to a nearly complete abandonment of the pure idea of ab initio protein folding in favor of modelling the structure of proteins using these evolutionary constraints. It is that problem that AlphaFold has now (arguably) solved, not the original Anfinsen/Levinthal idea of modelling the folding process from first principles alone.
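
The idea of correlated mutations can be illustrated with a toy example. The sketch below scores every pair of columns in a small, made-up sequence alignment by their mutual information, one of the simplest coevolution measures. It is a deliberately simplified illustration, not how AlphaFold works: practical methods use much larger alignments and more sophisticated statistical or deep learning models. The alignment and function names are hypothetical.

```python
# Toy illustration of correlated mutations between alignment columns.
# Mutual information is used here as a simple stand-in for the more
# sophisticated coevolution models used in practice.
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def rank_coupled_pairs(msa: list[str]) -> list[tuple[tuple[int, int], float]]:
    """All column pairs, sorted from most to least strongly coupled."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up alignment: columns 1 and 3 always mutate together (K with E, R with D).
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "AKVE", "ARVD"]

if __name__ == "__main__":
    for (i, j), score in rank_coupled_pairs(toy_msa)[:3]:
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In this toy alignment, columns 1 and 3 come out on top with a mutual information of 1 bit, while all other pairs score zero. In a real analysis, such strongly coupled pairs are candidates for residues that are in contact in the folded structure.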

In practice, the above distinction is not important, since evolutionary sequence data is readily available today for any protein of any species, and the success of AlphaFold in predicting the 3d-structure of proteins from this data is nothing less than spectacular. Thus, it would seem, protein 3d-structure prediction is being revolutionized by deep learning in a way very similar to how AlphaGo, also from DeepMind, conquered the game of Go, along with similarly transformational advances in image recognition and natural language processing.

At Cyclica, we have observed similarly transformational improvements in the prediction of drug/target interactions after introducing our proteome-wide MatchMaker deep learning model. MatchMaker is trained on the structures of all human proteins and millions of experimentally known small molecule/protein interactions available from databases such as ChEMBL. The model is much more accurate than conventional molecular docking, and also many orders of magnitude faster computationally. We have used MatchMaker as part of our Ligand Design platform to successfully identify active new chemical entities about 60% of the time in dozens of drug discovery programs spanning a wide range of therapeutic areas. Because MatchMaker is trained with 3d-structure data and across the entire proteome, Ligand Design can address novel and difficult targets and identify first-in-class compounds where little or no chemical matter exists.

While we have worked on a number of programs where we prosecuted targets for non-human species, MatchMaker was initially trained on and applied to the human proteome. Although it generalizes well enough to predict binding for non-human proteins too, a proteome-wide analysis in a non-human species depends on the availability of 3d-structures for most of the proteins in that proteome. Structural coverage of non-human proteins varies widely and is generally inferior to that in humans. MatchMaker works well with homology models, and proteome-wide sets of homology models are available for dozens of species at SwissModel. We’ve integrated these models along with a proprietary set of high-quality homology models. Even then, many important species are not covered, and models for them would have to be generated in a lengthy and expensive process. Conceivably, AlphaFold could make this process much more manageable and effective and enable the routine generation of proteome-wide 3d-structure collections.

Of particular interest in the field of infectious disease are the proteomes of human pathogens. When a new pathogen is discovered, one of the most important tasks is to sequence its genome, both to identify its relation to other known pathogens and to obtain crucial information about its inner workings to be used for vaccines and therapies. Between AlphaFold and MatchMaker, we have, in principle, all the tools in hand to mount a rapid therapeutic response to new epidemics: 1) isolate the pathogen and sequence its genome, 2) compute the 3d-structures of its proteins (AlphaFold), 3) identify promising compounds for therapy, ideally existing drugs that can be repurposed (Ligand Design), and 4) run clinical trials in the field. We estimate that the first 3 steps could be completed within weeks, allowing therapies to emerge in the early stages of an epidemic and substantially decreasing the chance of it turning into a pandemic.

Predicting novel protein structures from sequences alone, as AlphaFold has demonstrated, opens up a wealth of therapeutic opportunities for new drug discovery. It must be noted, though, that much more is required to bring a drug to the market, limiting the overall impact of any one innovation. AlphaFold can fill an important role in providing the input for a structure-focused drug design platform like Ligand Design, capable of designing molecules to interact with the resulting target structures. Perhaps the biggest impact can be expected in a rapid epidemic response drug repurposing program as described above, which is a real possibility if the right partners get together to make it happen.

Naheed Kurji, Co-founder, President and CEO
Andreas Windemuth, Chief Science Officer
With thanks to our awesome team!

Naheed Kurji, Chief Executive Officer

Naheed Kurji is the Co-Founder, President and CEO of Cyclica. Naheed is passionate about building AI-augmented technologies that enable researchers to make more strategic and informed decisions in healthcare and the life sciences. He spends the majority of his time obsessing over Cyclica’s culture, defining its strategy to best effect change in the pharma industry and achieve the company’s vision, and exploring opportunities for continued innovation.
