In supervised machine learning, we would like to identify a relationship between some features and one or more output values associated with every data point. In the case of classification, the output values are categorical, and we call them data point labels. However, these labels may not all be equally valid (i.e., our confidence in them varies). Labels are commonly produced by averaging experimental measurements (e.g., in biological or chemical contexts) or by aggregating the annotations of multiple experts (or non-experts). Historically, this idea has its roots in the jury theorem of the French mathematician Marie Jean Antoine Nicolas de Caritat, Marquis de Condorcet (1743–1794). According to this theorem, if each jury member has an equal and independent chance, better than random but worse than perfect, of judging correctly whether a defendant is guilty, then the majority of the jurors is more likely to be correct than any individual juror, and the probability of a correct majority judgment approaches one as the jury size grows (https://plato.stanford.edu/entries/social-choice/). When we rely on the wisdom of crowds or on multiple measurements (or even multiple measurement techniques), the labels assigned to the data points can carry different confidence levels depending on the level of agreement between the human labellers or between the measurements. The critical question is: should we only use the data points with high label confidence?

**We can still learn from disagreement.**

Although we may prefer to rely on high-confidence labels, there is still knowledge in the disagreement between labelling strategies that produces low-confidence labels. Hence, we may benefit from learning across the entire label confidence distribution (i.e., from data points with different label confidence levels). There are two classic approaches to this goal:

**Ensemble learning**: In ensemble learning, we build multiple models and combine their votes to predict the label of each data point. We can apply the same idea by building multiple models across the label confidence distribution. Although this approach can outperform models trained on high-confidence data points alone, it may not be feasible in an industrial setting: every prediction for a new data point requires querying many trained models, which is computationally expensive.
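
To make the idea concrete, here is a minimal sketch of the ensemble approach, assuming toy synthetic data and an illustrative confidence-tier split (the tier boundaries, the simple threshold "model", and all variable names are assumptions for illustration, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1-D feature, binary labels, and a label-confidence
# score per data point (all illustrative).
X = rng.normal(size=200)
y = (X + rng.normal(scale=0.5, size=200) > 0).astype(int)
confidence = rng.uniform(0.5, 1.0, size=200)

def train_threshold_classifier(X, y):
    """A minimal 'model': pick the cut point that best separates the classes."""
    candidates = np.linspace(X.min(), X.max(), 50)
    accs = [np.mean((X > t).astype(int) == y) for t in candidates]
    return candidates[int(np.argmax(accs))]

# Train one model per confidence tier (here: three equal-width tiers).
tiers = [(0.5, 0.67), (0.67, 0.84), (0.84, 1.01)]
models = []
for lo, hi in tiers:
    mask = (confidence >= lo) & (confidence < hi)
    models.append(train_threshold_classifier(X[mask], y[mask]))

def ensemble_predict(x):
    """Majority vote over the tier-specific models."""
    votes = [int(x > t) for t in models]
    return int(sum(votes) > len(votes) / 2)
```

Note that predicting a single label already requires evaluating every model in the ensemble, which is the computational cost mentioned above; with deep networks instead of threshold rules, this multiplies both training and inference time.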

**Assigning weights**: Another approach is confidence-based weight assignment in the optimization process, so that each data point contributes to the loss in proportion to our confidence in its label. Despite the successes of this method, some systems cannot use weight-based strategies, such as those with discrete confidence tiers rather than continuous probabilities, systems with only partial confidence assignments, or systems that use simulated random negatives. Drug-target interaction prediction is one such problem, since it relies on simulated random negative data points.
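
A common way to realize this idea is to scale each data point's loss term by its label confidence. Below is a minimal sketch with a confidence-weighted binary cross-entropy; the data values and the name `weighted_bce` are illustrative assumptions:

```python
import numpy as np

def weighted_bce(y_true, y_pred, conf):
    """Binary cross-entropy in which each data point's contribution is
    scaled by the confidence we have in its label."""
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_point = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.sum(conf * per_point) / np.sum(conf))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
conf   = np.array([1.0, 1.0, 0.3])   # the third label is low-confidence

# Down-weighting the uncertain third point reduces its influence on the
# loss relative to treating all labels as equally trustworthy.
loss_weighted   = weighted_bce(y_true, y_pred, conf)
loss_unweighted = weighted_bce(y_true, y_pred, np.ones(3))
```

This only works when a continuous confidence value is available for every data point, which is exactly the assumption that breaks down in the settings listed above.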

We present an alternative approach called Filtered Transfer Learning (FTL) that avoids the computational cost of ensemble learning and can be used in systems incompatible with weight-based strategies.

**Filtered Transfer Learning (FTL)**

We developed Filtered Transfer Learning (FTL), a technique that relies on the concept of transfer learning. In transfer learning, a model (such as a deep learning model) is first trained on a reference task (typically with many more data points) and then fine-tuned on a smaller task to obtain the task-specific model. In FTL, a neural network is first trained on data points spanning all confidence levels. The lower-confidence data points are then filtered out in a stepwise manner, with retraining after each filtering step. Eventually, the model is trained on only the highest-confidence data points and is then used to predict the labels of new data points (such as those in the test set). We implemented this technique for predicting drug-target interactions using the STITCH dataset and showed its high performance (available in our preprint: **https://arxiv.org/abs/2006.02528**).
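
The filter-then-fine-tune loop can be sketched as follows. This is a minimal illustration, assuming toy synthetic data, a plain logistic-regression "network", and an assumed three-step filtering schedule; in the paper the model is a neural network and the schedule is set per dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 2-D features, binary labels, and a confidence
# score per data point (all illustrative).
n = 300
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
confidence = rng.uniform(0.0, 1.0, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w, epochs=200, lr=0.1):
    """Plain logistic-regression gradient descent, starting from weights w."""
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

# FTL-style schedule: start with every confidence level, then filter out
# lower-confidence points step by step, each time fine-tuning from the
# weights learned in the previous stage rather than from scratch.
w = np.zeros(2)
for threshold in [0.0, 0.4, 0.8]:        # stepwise filtering schedule (assumed)
    keep = confidence >= threshold
    w = train(X[keep], y[keep], w)

# The final model was fine-tuned on only the highest-confidence points.
train_acc = np.mean((sigmoid(X @ w) > 0.5).astype(float) == y)
```

The key design choice is that each stage warm-starts from the previous weights, so the knowledge extracted from the lower-confidence data is carried forward rather than discarded.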

**Beyond drug-target interaction**

Dealing with data points with different confidence levels is not limited to the drug-target interaction problem. Many problems, within and outside of the healthcare and pharmaceutical industries, can benefit from the FTL technique, such as:

**Radiological or histopathological images or image segment labels**: Radiological and histopathological images (or image segments) are labelled by experts in hospitals and healthcare settings. Although their labelling can be very accurate, confidence in the image labels is variable.
**Crowdsourced image annotation**: Many image datasets containing images of animals, cars, etc., are labelled based on the wisdom of crowds, resulting in images with different label confidence levels.
**Resistance to drugs**: Although patients (or model systems) can be categorized as resistant or sensitive to a drug, these categories rely on continuous measurements. Hence, applying a threshold to categorize the data points as susceptible or resistant yields low-confidence data points (those close to the threshold) and high-confidence data points (those far from the threshold).
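
For the drug-resistance case, a confidence score can be derived directly from the distance to the categorization threshold. The sketch below assumes hypothetical continuous response values and an illustrative rescaling; neither is from a real assay:

```python
import numpy as np

# Hypothetical continuous drug-response measurements (illustrative values).
response = np.array([0.05, 0.45, 0.52, 0.95])
threshold = 0.5

# Categorical label from the threshold (1 = resistant, an assumed convention)...
label = (response > threshold).astype(int)

# ...and a confidence score from the distance to the threshold, rescaled
# to [0, 1]; points near the threshold get low confidence.
confidence = np.abs(response - threshold)
confidence = confidence / confidence.max()
```

Points like 0.45 and 0.52 end up with low-confidence labels, while 0.05 and 0.95 end up with high-confidence labels, producing exactly the kind of confidence distribution that FTL is designed to learn across.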

Editor: Andreas Windemuth & Chinmaya Sadangi

Image adapted from "Learning across label confidence distributions using Filtered Transfer Learning" (https://ieeexplore.ieee.org/document/9356262)