When Is A.I. Trustworthy? When Is A.I. Useful?

This post was adapted from its original source on OneZero and can be viewed here.

When we talk about the weather, we often don’t stop to consider that we’re leaving out a lot of information. If I asked someone how hot it was outside, and they started listing positions and velocities for various air particles, I would walk away in alarm and confusion (or try to learn how they obtained such knowledge). The reality is that we, as humans, have a fairly innate grasp of the distinction between what is informative and what is useful. Telling someone it’s “real hot” outside rather than saying it’s 38.94 degrees Celsius is less informative, but also less cumbersome. This act of discarding and summarizing information is the very essence of prediction, and we can define, measure (approximate), and take advantage of this process to improve predictive models and A.I. (and always be correct when predicting the weather).

Boltzmann entropy

“Nothing is more practical than a good theory.”

—Ludwig Boltzmann

If you’re familiar with the concept of entropy, chances are you learned one or two systematic definitions of it (for example, thermodynamic entropy or information entropy). Otherwise, you might have been told it’s a measure of “randomness.” (Defining randomness is another topic.) I’ve come to the view that there are multiple definitions of entropy, each more or less detailed and more or less useful depending on the context (though I’m generally opposed to calling it a measure of “randomness”). One of my particular favorites was first described by total genius Ludwig Boltzmann around 1877 and is now commonly referred to as Boltzmann entropy or Boltzmann’s entropy formula.

In simplified terms, Boltzmann said entropy was directly linked to the relationship between microstates and macrostates. For any given macrostate description of a system, the entropy would be higher if more microstates were compatible with that macrostate. It helps to think of it in one of the contexts that it was originally described: particles of gas in a container.

A summary increases the entropy and creates directionality.

On the left, the microstate of this system is defined as the velocity, mass, and position of all the gas particles in the container. (It’s a highly specific description.) In contrast, on the right, summarizing the microstate as a temperature creates one possible macrostate (a less specific, summarized description). This macrostate has some very interesting properties: It’s irreversible (you can’t go from just the temperature to a full description of all the particles). It’s less complex (less information, less effort to say it). It’s still accurate (that container really is 38.94 degrees Celsius). But most important, it’s less specific (more than one microstate could fit it).

A given macrostate will have multiple microstates that are compatible with it.

This is the key to Boltzmann entropy: Each macrostate will have many microstates that are all compatible with it.

The way temperature is defined means that any of the three containers shown on the left (above) will produce the same temperature. The more such microstates there are, the greater the entropy of the statement “this container is 38.94 degrees Celsius.”
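
If it helps to see that counting made explicit, here’s a minimal Python sketch of the idea. The toy “gas” (three particles that can each carry 0, 1, or 2 units of energy) and all the numbers in it are made up purely for illustration:

```python
# Toy illustration of Boltzmann entropy: the microstate is the exact energy
# of every particle; the macrostate is just the total energy (our stand-in
# for "temperature"). S = k_B * ln(W), where W is the number of microstates
# compatible with a given macrostate.
from collections import Counter
from itertools import product
from math import log

K_B = 1.380649e-23  # Boltzmann constant, in J/K

particle_energies = [0, 1, 2]                              # possible energies per particle
microstates = list(product(particle_energies, repeat=3))   # every exact arrangement

# Group microstates by the macrostate they produce (the total energy).
multiplicity = Counter(sum(state) for state in microstates)

for total_energy, W in sorted(multiplicity.items()):
    S = K_B * log(W)
    print(f"macrostate E={total_energy}: {W} compatible microstates, S = {S:.2e} J/K")
```

The middle macrostates are compatible with the most microstates, so saying “the total energy is 3” is the toy-gas version of saying “this container is 38.94 degrees Celsius”: accurate, cheap to say, and deliberately unspecific.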

As commonplace as telling the temperature is, it’s not the only place where Boltzmann entropy is relevant. Any conversation is rich with high-entropy statements, carefully chosen to be descriptive but not too specific. For example, I could describe the thumbnail for this article as a “labeled for reuse clip-art drawing of the Mona Lisa,” which is what it is. However, if you hadn’t already seen that specific image, any of these microstates might fit that macrostate equally well:

All of these are a “labeled for reuse clip-art drawing of the ‘Mona Lisa.’”

After this, it’s pretty clear that entropy can show up in unexpected places, but so far we haven’t done anything to tie this to A.I. or predictions. To do that, we have to talk about maps.

The problem with maps

“The best material model of a cat is another, or preferably the same, cat.”

—Norbert Wiener, ‘Philosophy of Science’ (1945)

Let’s do a small thought experiment. Imagine I ask you for directions to that weird new Garfield-themed pizza restaurant in Toronto (sorry). Neither of us has a phone, but I have some paper and a pen, and you know the way for sure. You probably don’t hesitate: you draw something that looks like this (but less bad):


Unfortunately for both of us, I’m easily confused. I have to ask for further clarification: “Which way is north?”

You sigh in exasperation and add:

That’s enough of these drawings, I think.

You can probably see where this is going, but I keep asking for more and more clarification. By now, your map looks more like this:

Yes, it’s a real restaurant.

You’ve added a lot more detail, and we would recognize most of it as pointless with regard to its intended use: guiding me to that tasty, horrifying lasagna pizza. In fact, if we were to keep going, we would eventually be forced to realize that the most accurate, most descriptive, most complete map would just be the area itself, to scale, complete with a bustling Monday-hating restaurant where no one opens their mouth to talk. To navigate the map, you would need a map of the map with less detail on it. This map has the same amount of entropy as the place itself and is no help to me. (But what craftsmanship. Well done, you.)

The utility of a model lies somewhere between a complete description and an abstract sketch. If you don’t believe me, just take a look at the difference between the Toronto Transit Commission’s map to scale and as it’s shown on the subway:

Image from MapTO’s amazing analysis: http://www.mapto.ca/maps/2017/5/9/the-newest-ttc-map-is-distorted

If the map were too distorted, perhaps by overlapping some lines a few extra times (changing the topology), it wouldn’t be useful as a tool anymore. Distort it in just the right way, however, discarding information about scale and distance, and the map becomes even more useful for quickly learning how many stops are left before you get off the train. A model of a system should carry the minimum detail required to be maximally useful for its intended purpose.

(I highly recommend following @mapTOdotca on Twitter if you like maps at all.)

“How ‘bout that weather, eh?”

“Prediction is very difficult, especially if it’s about the future.”

—Niels Bohr

What is the most accurate prediction of tomorrow’s weather that you could possibly make? Which prediction is most likely to be true when that fateful day (tomorrow) comes?

  1. It will be sunny, high of 27 degrees Celsius, low of 18 degrees Celsius, with rising tides and warming oceans.
  2. There will be two millimeters of precipitation between 2 p.m. and 4 p.m.
  3. It will be hotter than yesterday.
  4. All of the above.

Of course, the answer will depend on what knowledge you already have about the weather and how it might play out over time. But barring any ability to predict the weather, your best bet would probably be option three. It will be hotter than yesterday (you guess). This prediction has very high entropy compared to the others. There are many weather microstates that would be compatible with being hotter than yesterday.
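
If you want to convince yourself of that, here’s a rough simulation sketch. The “weather model” in it (tomorrow’s high drawn from a normal distribution around yesterday’s, plus some invented rain) and the thresholds I use to score each prediction as true are entirely made up for illustration:

```python
# Sample many plausible "tomorrows" and check how often each prediction holds.
import random

random.seed(0)
YESTERDAY_HIGH = 26.0        # degrees Celsius, assumed
N_FUTURES = 100_000

def simulate_tomorrow():
    high = random.gauss(YESTERDAY_HIGH + 1.0, 4.0)  # tomorrow's high, in Celsius
    rain_mm = max(0.0, random.gauss(0.0, 3.0))      # afternoon rain, in mm
    return high, rain_mm

futures = [simulate_tomorrow() for _ in range(N_FUTURES)]

# Each prediction is a test of whether a simulated future is compatible with it.
predictions = {
    "sunny, high of 27":      lambda h, r: r < 0.1 and 26.5 <= h <= 27.5,
    "2 mm of rain, 2-4 p.m.": lambda h, r: 1.5 <= r <= 2.5,
    "hotter than yesterday":  lambda h, r: h > YESTERDAY_HIGH,
}

for name, holds in predictions.items():
    hit_rate = sum(holds(h, r) for h, r in futures) / N_FUTURES
    print(f"{name:24s} true in {hit_rate:.1%} of simulated futures")
```

With these made-up numbers, the coarse, high-entropy prediction comes out true in the largest share of simulated futures, while the very specific ones hold only occasionally.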

Always make predictions at a higher level of entropy than the data used to generate the predictions.

If you’re trying to predict the weather, you don’t always need a low-entropy prediction. If you’re trying to decide whether to bring your umbrella to work, you just need to know if it will rain today. If you’re farming, you might need less entropy: How much rain will there be in total? (Farmers, please help educate me if I’m wrong about that.)

In all these different use cases for weather predictions, however, one thing remains constant: The prediction always has more entropy than the highly detailed data that was used to generate the prediction and/or models.

The summary fallacy

“Chaos is a ladder.”

—Littlefinger


Whenever I set out to model something, I first spend some time trying to place all the elements of my problem onto a ladder (as weird as that sounds). I do this so I can avoid falling into the summary fallacy: the belief that you can usefully predict something at the same level of entropy as the data you learn from. It’s not a hard rule, but I find it useful: Always predict up the ladder. Always make predictions at a higher level of entropy than the data used to generate the predictions.

In my own day-to-day work, this most often relates to making predictions about the interactions between small-molecule drugs and proteins. My input data at level one of the ladder is detailed structural data from high-resolution crystal structures; it describes the positions of all the atoms of the protein and the drug as they interact. I can then use that data to build models capable of predicting something much higher up the ladder.

Ah, such beautiful low-entropy protein-drug spaghetti. Drug shown in pink.

My prediction is going to be much less detailed. For example: Does small-molecule drug X interact with protein Y? This is akin to using detailed particle-level information to learn how temperature relates to a container being “hot” or “not hot,” so that in the future, given just a temperature, my model can predict “hot.” In this case, my model is just predicting “yes, this drug interacts somehow with this protein.” That binary prediction is still useful to me if I’m trying to design a better drug, but it’s not detailed enough to recreate the specific atom-level details of the predicted interaction. If my model claimed it could, I wouldn’t trust it. Generally, you should only climb up the ladder, never back down, to make a prediction. Believing that you can reliably go from a summary to a specific microstate: that’s the summary fallacy.
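
To make that one-way trip up the ladder concrete, here’s a toy Python sketch. Nothing in it is a real docking or machine-learning pipeline; the poses, the pocket, and the distance-cutoff rule for “interacts” are invented purely to show two different microstates collapsing into the same macrostate label:

```python
# Toy illustration of "predicting up the ladder": a detailed microstate
# (fake atom coordinates for a drug pose) is summarized into a single
# macrostate label ("interacts" / "does not interact").
from typing import List, Tuple

Pose = List[Tuple[float, float, float]]  # atom coordinates (x, y, z), in angstroms

def interacts(pose: Pose, pocket_center=(0.0, 0.0, 0.0), cutoff=5.0) -> bool:
    """Macrostate: is any drug atom within `cutoff` angstroms of the pocket center?"""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return any(dist(atom, pocket_center) < cutoff for atom in pose)

# Two very different microstates...
pose_a: Pose = [(1.2, 0.3, -0.8), (2.1, 1.0, 0.5)]
pose_b: Pose = [(4.0, 1.5, 2.0), (3.2, -2.1, 1.1)]

# ...collapse into the same macrostate label, so the label alone can never
# be unwound back into the atom-level details that produced it.
print(interacts(pose_a), interacts(pose_b))  # True True
```

Both poses end up with the same label, and that is exactly the point: the summary threw away the atom-level details on the way up, so nothing about the label can reliably take you back down.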

Final thoughts

“All models are wrong, but some are useful.”

—George Box

If you, like me, spend a lot of time thinking about your own models or trying to use and understand other people’s models, then I hope you find this as useful as I have. I think it’s important to remember, even with all the hype in the A.I. and machine learning space, that A.I. still has limits. Understanding and respecting these limits won’t hold you back; instead, it will let you focus on what’s really important: What is useful to you?

Also, it will keep your weather forecasting lightweight.

Further reading

  • The User Illusion by Tor Nørretranders, in which an amazing concept called “exformation” (explicitly discarded information) is defined.
