Machine Learning: Learning beyond living things
Industrial revolutions throughout history mark pivotal stages in society's advancement, driven by technological innovations such as new machines and systems that improve human quality of life and our interaction with the environment. The invention of computers, and the software development that followed, pushed this advancement further, specifically toward the automation of tasks.
Machines were built to perform specific tasks exactly the way we wanted them done. When we push the gas pedal, the car goes faster, unless it is broken. When we press a key on the keyboard while writing a blog post, we expect a specific character to appear in our document. We wanted those tasks done faster, more easily, and with less danger, and machines helped us do so. But could we expect more from machines?
We know that there are limitations in our understanding of the world. These limitations impose a restriction on how we can design machine systems directly based on heuristics or first principles. What if we design new systems to figure out an optimal way of accomplishing a given task? What if we let those systems provide us with a new understanding of the world around us?
What is machine learning?
Machine Learning (ML) models learn how to accomplish tasks without being given explicit rules, and in doing so can offer us a new understanding of those tasks. This capability is acquired through a learning process, so let's take a step back and define what we mean by "learning" here.
The process of “learning” in ML modelling is conceptually, though not precisely, similar to how we come to know things. Human-designed ML algorithms read through the data, either all at once or piece by piece, and learn datapoint-to-datapoint, feature-to-feature, or feature-to-datapoint relationships. Logically, better data and better algorithms result in better machine learning models. But how do we define “better” in this context? Data improvements are generally easier to define: more training examples, more diversity among those examples, data with higher information content, or increased confidence in the feature and observation values. For example, a machine learning algorithm trained to distinguish between dog breeds could benefit from more photos, photos of more individual dogs from each breed, high-quality photos with better poses, or more detailed annotations linking each photo to its respective breed. A better algorithm is harder to define in a general way, since an algorithm's performance depends on the task and data at hand, and on our tolerance for different types of errors. Sometimes we may need to make our models more complicated to capture complex relationships between features and observations, as in drug-target interaction (DTI) prediction.
Note. Although the word “machine” usually brings physical hardware to mind, machine learning algorithms are software code and machine learning models are a form of data. The computers running that software are indeed “machines”, but the practice of developing ML models is programming: applying algorithms to learn patterns from a specific dataset, then building software that uses the resulting models effectively or integrates them with hardware, such as autonomous vehicles.
Supervised learning is one category of machine learning, focused on identifying relationships between features and observations (i.e. outputs). Depending on the type of task and data at hand, there are different kinds of supervised learning models. Regression refers to ML systems that predict continuous, numerical values, such as the molecular concentration necessary for a chemical reaction. Classification tasks, in contrast, predict categorical outputs, for example, whether a chemical reaction will occur between two reagents. As these two examples suggest, we can sometimes frame the same problem as either regression or classification.
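To make the contrast concrete, here is a minimal sketch in plain Python that treats the same hypothetical chemical-reaction data both ways. All numbers, and the 0.5 yield threshold, are made up for illustration:

```python
# Hypothetical training data: (reagent_amount, measured_yield) pairs.
data = [(1.0, 0.12), (2.0, 0.30), (3.0, 0.52), (4.0, 0.69), (5.0, 0.91)]

# --- Regression: fit a line y = a*x + b by ordinary least squares ---
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def predict_yield(x):
    """Regression output: a continuous, numerical value."""
    return a * x + b

# --- Classification: the same problem recast as "does the reaction occur?" ---
def predict_reacts(x, threshold=0.5):
    """Classification output: a categorical label."""
    return "reacts" if predict_yield(x) >= threshold else "no reaction"
```

The regression model answers "how much?", while the classifier wrapped around it answers "which category?", which is exactly the choice the paragraph above describes.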
We can transform continuous observations, which would normally be modeled with regression algorithms, into categorical values to be modeled in a classification setting (Fig. 1). This transformation is sometimes helpful because small differences between continuous values can be misleading due to measurement error. Converting continuous values into categories can, though not always, absorb those artifactual measurements.
Figure 1. Supervised learning: how to go from regression to classification. The affinity between small molecules and their target proteins is used as an example.
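As a sketch of this transformation, assume some hypothetical drug-target affinity values (e.g. pKd) and an arbitrary illustrative cutoff of 7.0; the pair names and numbers below are invented:

```python
# Hypothetical continuous affinity values (pKd) for drug-target pairs.
affinities = {"pairA": 8.2, "pairB": 6.9, "pairC": 7.4, "pairD": 5.1}

def to_label(pkd, cutoff=7.0):
    """Map a continuous affinity value to a binary class label."""
    return "binder" if pkd >= cutoff else "non-binder"

# Binarize the continuous observations for a classification setting.
labels = {pair: to_label(value) for pair, value in affinities.items()}
# Measurements that differ only by noise (e.g. 7.38 vs. 7.42) now fall
# into the same class, which is the robustness argument made above.
```

The choice of cutoff matters: information is discarded in the binarization, so in practice it should reflect a meaningful boundary, such as an affinity level considered pharmacologically active.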
In our next post, we will talk about unsupervised learning and dimensionality reduction. In this series, we plan to introduce several other fundamental topics associated with Machine Learning, such as deep learning and transfer learning!
Stay tuned!
Written by: Ali Madani
Edited by: Stephen MacKinnon and Chinmaya Sadangi
Ali develops new deep learning models to improve drug-target interaction prediction. He completed his Ph.D. in Computational Biology at the University of Toronto, developing new feature selection approaches from omics profiles of patient tumors that are predictive of their survival and their response to drugs.