Embracing Limitations: How Assessment Informs Application

Over a year has now passed since DeepMind-EBI released 48 complete AlphaFold2 (AF2) proteomes. While the CASP14 competition and the original Jumper et al. AF2 report provide ample validation of AF2's superior global performance, community assessment offers added context that better highlights the strengths, limitations, and opportunities of this groundbreaking technology. For one, Akdel et al. published their assessment of AF2 applications in Nature Structural and Molecular Biology last November, emphasizing its use in standard structural biology workflows. In particular, AF2-predicted structures were broadly effective across most applications, such as predicting disordered residues, high-impact mutations, homomeric states, and heteromeric protein assemblies. With each application, the authors demonstrate that performance depends on the AF2-reported confidence metrics. They conclude the study by predicting a future trend of hybrid AI/experimental approaches in structural biology.

Roney & Ovchinnikov offer a different approach to AF2 assessment in their November 2022 article titled “State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold”. The authors devised a clever experiment to understand why the model depends on coevolution data derived from input Multiple Sequence Alignments (MSAs), an observation made by Jumper et al. during AF2 development. AF2 structure predictions require an input protein sequence, an optional MSA, and up to four optional template 3D protein structures. Roney & Ovchinnikov performed structure prediction experiments comparing the use of real templates with the use of incorrect protein structure decoys, both in the absence of MSA data. AF2 prediction confidence values dropped as the decoys became progressively more dissimilar to the native target structure. The authors interpret these findings to suggest that AF2 has learned a general protein structure energy function and only requires the co-evolutionary data to narrow the conformational space searched during structure prediction. In this case, assessment directly informed the applicability of the technology to MSA-free applications, presumably protein engineering.

Terwilliger et al. assess AF2’s predictive performance for different components of a protein structure. Their November 2022 preprint titled “AlphaFold predictions: great hypotheses but no match for experiment” compares AF2 predictions to recently deposited X-ray crystal structures. The authors mapped AF2-predicted structures onto the electron density maps of 102 unique, high-quality protein structures that were first deposited into the Protein Data Bank (PDB) following the AF2 model release. While overall folds were generally well predicted, distortions in domain-domain orientation were more frequent. Importantly, they also report that predictions deviate from experimental structures more than experimental structures differ from one another. In addition, the authors devised a side-chain grafting strategy, whereby AF2-predicted side-chain conformations were swapped onto their corresponding backbones in the experimental structures. The grafting experiment showed that 20% of side chains in moderate-to-high confidence residues adopted conformations different from those in the crystal structure. This may even underestimate the impact of error when it comes to modeling inter-molecular interactions, including small-molecule drug discovery. While the authors do not distinguish between surface residues and buried residues, we expect higher levels of error at protein surfaces, including ligand binding sites.
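Restricting an analysis to "moderate-to-high confidence" residues starts with reading the per-residue confidence out of the model. AF2 models in PDB format store pLDDT in the B-factor column, so a minimal sketch might look like the following. The function names are hypothetical, and the cutoff of 70 is our assumption (it matches the "confident" band used by the AlphaFold Protein Structure Database); it is not necessarily the criterion used in the preprint.

```python
def plddt_by_residue(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from C-alpha ATOM records.

    Assumes standard fixed-column PDB formatting, with pLDDT stored in the
    B-factor field (columns 61-66), as in AF2 model files.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # One C-alpha per residue is enough: AF2 writes the same pLDDT
        # for every atom in a residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def confident_residues(scores, threshold=70.0):
    """Return sorted residue keys with pLDDT >= threshold.

    The default threshold is an illustrative assumption, not a value
    taken from the studies discussed in this post.
    """
    return sorted(key for key, value in scores.items() if value >= threshold)
```

Given the text of a model file, `confident_residues(plddt_by_residue(text))` returns the residues that would be kept for a side-chain comparison under that cutoff.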

This dataset gave us a great opportunity to model the performance compromise we could expect when using predicted structures rather than experimental ones. Cyclica conducted its own empirical analysis to measure the impact of substituting AF2 models for X-ray crystal structures in our workflows. From the original 102 structures, we found 13 proteins with one or more bound ligands that were suitable for docking [7fjg, 7ety, 7tzp, 7trw, 7vnx, 7dqx, 7f2a, 7rc2, 7wnn, 7v1q, 7w3s, 7edc, 7t7j], for a total of 31 ligand-bound sites. Ligand binding sites from the experimental structures were mapped onto their corresponding locations on the AF2 structures. In Fig. 1, we show one such ligand binding site where side-chain orientation differs substantially between the AF2 structure (AF_P06560_F1_v4) and its corresponding PDB entry 7ety.


Fig. 1. While AF2 predictions are great at determining protein topology, atomistic-resolution detail is not always reliable. Since structures are predicted in the absence of known ligands, AF2 side-chain orientations (white; AF_P06560_F1_v4) and experimental pocket configurations (yellow; 7ety) can vary substantially, posing a challenge for SBDD techniques that depend on specific atomic coordinates, such as rigid docking.

The co-crystal ligands were then docked into both pockets. The same ligand + pocket pairs were also evaluated with MatchMaker™. We then normalized each set of scores and, in Fig. 2, reported the Pearson correlation (as r-squared) between corresponding predictions made with structures from the two sources. In short, this analysis mimics a simple substitution of AF2 models for X-ray structures, with higher correlation indicating a lesser impact.
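For readers who want to reproduce this kind of comparison on their own score sets, here is a minimal sketch of the normalize-then-correlate step. It assumes z-score normalization, which is our choice for illustration since the normalization scheme is not specified here, and the function names and toy numbers are hypothetical rather than taken from our pipeline or results.

```python
from math import sqrt


def zscore(values):
    """Normalize scores to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("all scores are identical; z-score is undefined")
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Pearson correlation between paired scores (e.g. X-ray vs AF2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized lists with at least 2 items")
    zx, zy = zscore(xs), zscore(ys)
    # With population z-scores, r is the mean of the pairwise products.
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def r_squared(xs, ys):
    """Squared Pearson correlation, the quantity reported as r-squared."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Toy, made-up scores for the same ligand + pocket pairs, one list per
    # structure source. These are NOT results from the analysis in this post.
    xray_scores = [-9.1, -7.4, -8.0, -6.2, -7.9]
    af2_scores = [-6.5, -7.0, -7.8, -5.9, -6.1]
    print(f"r-squared = {r_squared(xray_scores, af2_scores):.3f}")
```

Pearson correlation is unchanged by any linear rescaling of either score set, so the normalization mainly serves to put docking and MatchMaker™ scores on a common scale for plotting rather than to change the reported value.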


Fig. 2. Docking and MatchMaker™ scores for known protein+ligand pairs when using experimentally derived protein structures vs AF2-predicted structures. MatchMaker™ is less susceptible to the structural inaccuracies, including side-chain conformation errors, associated with AF2-predicted protein structures.

Molecular docking approaches are very sensitive to side-chain conformations, leading to a poor correlation between docking scores obtained with true experimental structures and those obtained with their AF2-predicted counterparts. Based on this sanity check, however, it is evident that some AF2 protein structures will occasionally behave like their experimentally determined counterparts when applying industry-standard structure-based drug design (SBDD) tools. This is really no surprise, as homology models and apo structures have long been used in virtual screening workflows, albeit with lower consistency (i.e. success rate). On that topic, Clark et al. 2019 provide an excellent analysis of measurable structural differences observed in the experimental structures of ligand-bound and unbound targets.

On the other hand, Fig. 2 clearly shows that Cyclica’s MatchMaker™ model is more tolerant of this form of error. Our ML-based solution for predicting drug-target interactions (DTIs) was originally developed for proteome screening applications and required consistent performance from target to target. Given that structural coverage of the proteome was relatively low at the time, we designed our models around the use of homology models, which were known to have inaccurate side-chain conformations. Specifically, we opted for a pose-independent strategy to learn pocket-ligand relationships. Removing steric information was not much of a compromise when working with predicted protein structures, and it further allowed us to train on millions of protein-ligand interactions without experimental co-complex structures, once we could reliably assign each interaction to a pocket. Our recently released NodeCoder is another example of an AlphaFold2 application that operates on residue contact networks (i.e. overall topology) rather than side-chain conformations when performing functional inference.

Engineering solutions to operate within the margin of error may seem like an obvious piece of advice under most circumstances. However, this principle is often overlooked in structural biology, since atomic coordinates don’t come with error bars and existing uncertainty metrics (pLDDT, B-factor, etc.) don’t tell the whole story. In our case, prioritizing the use of predicted protein structures in model training and inference led to the development of a generalizable DTI model that is effective and consistent on low-data targets. Cyclica’s neo-biotech portfolio is built on this scalable framework and is fueled by predicted structures and their imperfections!
