Pitfalls in AI Drug Discovery

Artificial Intelligence (AI), or more modestly machine learning, is increasingly being used in many areas of drug discovery. There are applications for finding molecular targets, for finding active molecules, for developing such molecules into drugs, and for performing preclinical and clinical testing of such drugs before they are ready to be prescribed to patients. Here, we want to focus on potential pitfalls in perhaps the most fundamental aspect of drug discovery, the identification of active molecules for a given target, also known as hit identification.

Fundamentally, a “hit” is a molecule that interacts with a target (usually a protein) and has some activity, i.e. interferes with its natural function. Commonly, high-throughput screening (HTS) is used to find hits by brute force: a large number of arbitrary molecules are tested in the laboratory using an assay until one or more are found that demonstrate the desired activity. This is done in highly automated, capital-intensive HTS facilities, at great expense in time and money.

The promise of AI for hit identification is that a machine learning model can predict, to some degree, which molecules might be active, and thereby reduce the number of molecules that need to be screened in the lab from millions to hundreds, eliminating the need for an HTS facility and greatly reducing the time and cost of hit identification.

A large variety of approaches exist to train machine learning models that can predict molecular activity. Almost all companies in the space use ligand-based models, which rely on existing data for known hit molecules of a specific target to predict the activity of new molecules on the same target. The obvious pitfall here is the chicken-and-egg problem: you need existing hits to find new ones. That limits the approach to well-known targets that, in most cases, will already have one or more drugs on the market. Such “best in class” drugs can be lucrative, but the real opportunity for medical advancement lies with “first in class” drugs, i.e. drugs for targets and indications that lack existing treatments.


Figure 1: MatchMaker is a deep learning classifier that predicts whether any given drug-target pair interacts.

For this reason, it is highly desirable to be able to find hits for low-data targets, i.e. targets for which few or no known active molecules exist. This has led to the development of chemoproteomic models, which aim to predict molecular activity not for one specific target, but for any protein in the proteome. One such model, Cyclica’s MatchMaker, is trained on a large dataset of drug-target interactions (DTIs) from assay data collected in the PubChem and ChEMBL databases, as well as DTIs collected from the patent literature. Because data is collected on many proteins, the training set is large, encompassing many millions of data points. Such large amounts of data enable the use of deep neural networks, which can generalize from target and ligand properties and predict activity for targets and ligands that do not occur in the training set. This is essential for low-data targets and enables the discovery of first in class therapies.

Here, we would like to dig a little deeper into the pitfalls associated with the development and testing of machine learning models predicting DTIs. There are important issues related to data quality, chemistry exploration, data bias and cross-validation testing, discussed in more detail in the paragraphs below.

Data quality

As with any machine learning model, data quality is of great importance, and even more so for large data sets collected from many different sources, like the one used for MatchMaker. DTI data often comes with an affinity value, which can be measured in various ways, such as IC50, EC50, Kd, and many others. Combining all of these into one meaningful affinity value is difficult. This is one reason MatchMaker is a classification model: the affinity measurement is used only to determine whether the molecule binds or not, which eliminates many, but not all, of the affinity labeling issues. Cyclica also controls data quality through careful filtering and a method we developed called Filtered Transfer Learning (FTL).
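To make the idea of collapsing heterogeneous affinity measurements into a binary label concrete, here is a minimal sketch. It is not MatchMaker's actual pipeline and has nothing to do with FTL; the 10 µM cutoff, the `Measurement` record, and the rule of dropping pairs with conflicting labels are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff: treat anything at or below 10 µM as "active".
ACTIVITY_CUTOFF_NM = 10_000.0

# Conversion factors to nanomolar for the units used in this sketch.
UNIT_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


@dataclass(frozen=True)
class Measurement:
    """One reported affinity value for a molecule-target pair (illustrative)."""
    molecule: str
    target: str
    kind: str    # e.g. "IC50", "EC50", "Kd"; deliberately not used for labeling
    value: float
    unit: str


def to_label(m, cutoff_nm=ACTIVITY_CUTOFF_NM):
    """Return 1 (active) or 0 (inactive) after converting the value to nM."""
    value_nm = m.value * UNIT_TO_NM[m.unit]
    return 1 if value_nm <= cutoff_nm else 0


def label_pairs(measurements):
    """Collapse measurements to one label per pair, dropping conflicting pairs."""
    seen = {}
    for m in measurements:
        seen.setdefault((m.molecule, m.target), set()).add(to_label(m))
    # Keep only pairs whose measurements all agree on the label.
    return {pair: next(iter(labels)) for pair, labels in seen.items()
            if len(labels) == 1}


measurements = [
    Measurement("mol_a", "T1", "IC50", 50, "nM"),
    Measurement("mol_a", "T1", "Kd", 0.2, "uM"),
    Measurement("mol_b", "T1", "EC50", 50, "uM"),
    Measurement("mol_c", "T1", "IC50", 5, "uM"),
    Measurement("mol_c", "T1", "Kd", 20, "uM"),   # disagrees with the IC50 above
]
labels = label_pairs(measurements)
```

Here `mol_a` ends up active, `mol_b` inactive, and `mol_c` is dropped because its two measurements fall on opposite sides of the cutoff. That last case is the kind of labeling issue a classification setup reduces but does not remove.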

Bias

Like data quality, biases in the training set are widely recognized as a pitfall in all machine learning applications. The performance of machine learning models is normally assessed in cross-validation, where a portion of the training data is left out of training and used to test the accuracy of prediction. Bias can make this assessment overly optimistic, leading to unrealistic expectations of a model that do not bear out in reality. Although we can estimate some of the biases that might exist in our data and models, there is no substitute for real-world testing to truly validate an AI model, as unknown bias may be present despite best efforts to account for all known bias. Biases relevant to chemoproteomic models include overrepresentation of high-data proteins, chemical series bias, and negative control bias. We have discussed some of these biases in previous articles.
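One common way to keep chemical series bias from inflating cross-validation scores is to split by group, so that all molecules from one series fall on the same side of each train/test split. The sketch below shows that idea in plain Python. It is a generic illustration, not Cyclica's validation protocol, and the series labels are assumed to be given (in practice they might come from scaffold clustering).

```python
from collections import defaultdict


def grouped_folds(records, group_of, n_folds=3):
    """Assign record indices to folds so that no group spans two folds."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[group_of(rec)].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold.
    for key in sorted(groups, key=lambda k: (-len(groups[k]), str(k))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[key])
    return folds


def train_test_splits(records, group_of, n_folds=3):
    """Yield (train_indices, test_indices), one pair per held-out fold."""
    folds = grouped_folds(records, group_of, n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


# Toy data: (molecule id, hypothetical chemical series label).
records = [
    ("m1", "seriesA"), ("m2", "seriesA"), ("m3", "seriesA"),
    ("m4", "seriesB"), ("m5", "seriesB"),
    ("m6", "seriesC"), ("m7", "seriesD"),
]
splits = list(train_test_splits(records, group_of=lambda r: r[1]))
```

With a purely random split, close analogs of a test molecule would usually sit in the training set, and the model would be rewarded for memorizing the series rather than generalizing. The same grouping can be applied to proteins to probe performance on unseen targets.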


Chemistry exploration

Having a predictive model is not enough to find hit molecules. There are several different approaches to select or generate molecules that score highly on a predictive model. The pitfall here is that the model usually does not have a good understanding of what a realistic molecule is, and might favor molecules that are impossible to make, or even nonsensical representations of molecules that cannot exist. This is an often-cited drawback of using AI for drug discovery. The easiest solution is the screening approach, where the model is used to score a finite, but large, set of virtual molecules that are known to be easily synthesizable, or that already exist in reality. MatchMaker has a very high computational efficiency for predictions, which enables Cyclica to screen such libraries of billions of “sensible” molecules, thus avoiding the synthesizability pitfall associated with fully generative approaches.
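The screening approach amounts to streaming a fixed library through a scoring function and keeping only the best candidates. A minimal sketch follows, assuming a hypothetical `score` callable that stands in for a trained model. The `toy_score` function here is a meaningless placeholder, not a real activity predictor.

```python
import heapq


def screen_library(molecules, score, top_k=100):
    """Stream through a library and return the top_k (score, molecule) pairs.

    The library is consumed lazily, so it can be far larger than memory;
    only the current top_k candidates are retained.
    """
    scored = ((score(m), m) for m in molecules)
    return heapq.nlargest(top_k, scored)


def toy_score(smiles):
    """Placeholder score (fraction of 'O' characters); purely illustrative."""
    return smiles.count("O") / len(smiles)


# A tiny stand-in for a library of known, synthesizable molecules (SMILES).
library = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
hits = screen_library(library, toy_score, top_k=2)
hit_molecules = [smiles for _, smiles in hits]
```

Because every candidate comes from a curated library, anything the model ranks highly is, by construction, a molecule that can be made or bought. That is the property a fully generative approach has to work harder to guarantee.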

As we have seen, there are a large number of pitfalls and cautions associated with the use of AI in drug discovery, and we have really only been able to touch on some of them. However, with the right amount of care, none of these are showstoppers, and they can be overcome by properly understanding and mitigating them. The final arbiter is real world performance. At Cyclica, we have used MatchMaker to find hit molecules successfully in dozens of drug discovery programs, giving us confidence that the model is predictive and useful for the task it was designed for.


Dr. Andreas Windemuth, Chief Innovation Officer


Andreas is the Chief Innovation Officer, and guides Cyclica's vision in creating a scientifically rigorous platform that's integral in the drug discovery pipeline.
