Revised title: 17 Jul 2018

What did Bateson mean by saying
"information" is "a difference that makes a difference"?

The original title was:
Bateson did not define "information" as
"a difference that makes a difference"
(And he would have been rather silly if he had.)

Aaron Sloman
http://www.cs.bham.ac.uk/~axs
School of Computer Science, The University of Birmingham, UK

This file is http://www.cs.bham.ac.uk/research/projects/cogaff/misc/information-difference.html
Also available as a PDF file (derived from HTML):
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/information-difference.pdf

Installed: 22 Jan 2011
Last updated: 24 Jan 2011; Reformatted May 2015; minor changes Apr 2016.

Major revision and change of title: 17 Jul 2018, in response to comments from Olivier Marteaux.


Background
Some of what follows is based on section 2.3 of this book chapter:
http://www.cs.bham.ac.uk/research/projects/cogaff/09.html#905
Aaron Sloman,
What's information, for an organism or intelligent machine?
     How can a machine or organism mean?,
in Information and Computation,
Eds. Gordana Dodig-Crnkovic and Mark Burgin,
World Scientific Publishers, 2011, New Jersey, pp 393--438
It is also closely related to my endorsement of Jane Austen's theory of information, contrasted with Claude Shannon's, in this 2013 document:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.html

These ideas are central to the Turing-inspired Meta-Morphogenesis project, later sub-titled "The self-informing universe project".
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-morphogenesis.html


CONTENTS
Background (Above)
Introduction: the Myth
What Bateson Actually Wrote
A correction
What is information?
Claude Shannon vs Jane Austen
Other proposals for defining "information"
Problems with the definition
Comments, criticisms and suggestions welcome.
The central importance of information users

Introduction: the Myth

It is widely believed that the polymath Gregory Bateson defined "information" as "a difference that makes a difference". I think this is a myth, and he did no such thing.

This alleged definition is often quoted with approval by thinkers of different backgrounds, as can be seen by searching for occurrences of the phrase "a difference that makes a difference" in conjunction with "information". Sometimes the definition is attributed to others, presumably because they have quoted or used it.

The phrase "a difference that makes a difference" evidently resonates powerfully with many people. Perhaps this is because it points to a very common and important kind of complexity, in which systems are composed of linked, tightly coupled sub-systems whose causal relationships are such that any event in one sub-system (e.g. some property, value or relationship changing, or a part being added or removed) has effects in other sub-systems, or possibly sends ripples of effects spreading through the whole system.

Examples include a pebble hitting the previously flat surface of a pond, a fly wriggling in a spider-web, the speed of rotation of a cog wheel in some machine changing because of a change in friction, pushing a button causing an electric circuit to be closed triggering a wave of activation through a collection of interacting electronic and mechanical devices, an army being galvanised into battle by a light signal flashed on a hill-top, a news item causing share-prices all round the world to begin to fall, or a rumour spreading quickly through a community and causing a mob to attack a building.

In all these cases it is reasonable to say that some information flowed through a more or less complex system, triggered by the initial change (a difference occurring) and that the intermediate stages of such propagation depend on new intermediate changes/differences producing new effects elsewhere (including positive and negative feedback loops in some systems).

It is possible for a change or difference that is not temporal but spatial to have effects. For example, a geologist surveying some terrain may notice a transition across a boundary, which suggests that some desirable material or substance may be available underground on one side of the boundary. Alternatively, a farmer who notices a boundary separating two kinds of soil may be led to sow a certain crop on only one side of the boundary. In these cases, the static spatial difference produces temporal changes only because it is detected or observed: the trigger is temporal, even though what is triggered depends on something non-temporal. So the original difference need not actually make a difference to anything. Whether it does will depend on something else, namely a happening triggered by detection of the spatial difference.

Bateson could deal with that quibble by replacing "A difference that makes a difference" with "A difference that can make a difference". I'll return to this below. The potential difference-triggers in a situation may be as important as the actual triggers, like the "No trespassers" sign that has an informing function whether there are readers present or not.

What Bateson Actually Wrote

While working on the "What's information" paper referenced at the top of this file I was mystified as to how someone as intelligent as I knew Bateson to be could have written something so obviously problematic and unhelpful. Since I had a copy of a collection of his papers, the 1972 Chandler paperback edition of Steps to an Ecology of Mind: Collected Essays in Anthropology, I began to search for the reported definition. As far as I could find, the definition attributed to him is a mis-report: it is not what he wrote. What he actually wrote was something much more sensible.

Bateson described not "information" but "a bit of information" and later "the elementary unit of information" as "a difference that makes a difference".

He did this in at least two of the essays, namely "The Cybernetics of 'Self': A Theory of Alcoholism" and "Form, Substance, and Difference".

Notice that there is a difference between attempting to define (or say something definitive about) the word "information" and attempting to do so for more complex phrases like "a bit of information" and "the elementary unit of information", which he seems to take as different labels for the same thing, which he describes as "a difference that makes a difference". Similar or equivalent wording, with "information" always qualified as illustrated here, occurs in several places in the book.

In all the contexts that I found, he was NOT talking about, or defining, information in general but about an ITEM or UNIT or PIECE of information as a difference that makes a difference.

So it might look as if he accepted the assumption that information increments (or decrements) must be discontinuous, and that there is a minimal discontinuity.

However, given the spread of work on cybernetics and control engineering that makes use of continuous changes, often expressed using differential equations, it is unlikely that he actually assumed that information must be discrete, as the phrase "a difference that makes a difference" could be taken to imply.

It seems that his remark is widely misquoted, or misrepresented, as being about "information" rather than merely being about "a bit/unit of information".

In saying this sort of thing, Bateson seems to be thinking of any item of information as essentially a collection of "differences" that are propagated along channels.

This is far too simplistic, and perhaps too influenced by low-level descriptions of computers and brains. As indicated above, though, it may be a useful first approximation to a characterisation of the sort of thing that can be used as a bearer of information. The information itself could be expressed or carried by alternative structures: it is not necessarily linked to a unique mode of expression, since different bearers for the same information content might be preferable in different contexts.

However, the phrase "a difference that makes a difference", or "a bit", is not appropriate for the information content of something complex, like this sentence, or Euclid's proof that there are infinitely many prime numbers.

A correction

In June 2018, after reading an earlier version of this document, Olivier Marteaux kindly informed me that I had missed the significance of some of what Bateson had written in "Form, Substance, and Difference", mentioned above, also available online here:
     http://faculty.washington.edu/jernel/521/Form.htm

It is worth quoting the full sentence, which refers to energy, and the following discussion:

What we mean by information - the elementary unit of information - is a difference which makes a difference, and it is able to make a difference because the neural pathways along which it travels and is continuously transformed are themselves provided with energy.
and later
But what is a difference? A difference is a very peculiar and obscure concept. It is certainly not a thing or an event. This piece of paper is different from the wood of this lectern. There are many differences between them--of color, texture, shape, etc. But if we start to ask about the localization of those differences, we get into trouble. Obviously the difference between the paper and the wood is not in the paper; it is obviously not in the wood; it is obviously not in the space between them, and it is obviously not in the time between them. (Difference which occurs across time is what we call "change.")

A difference, then, is an abstract matter.

He then goes on to point out some differences between the subject matter of "hard sciences" such as physics and the study of minds, or information-using systems:

A difference, then, is an abstract matter.

In the hard sciences, effects are, in general, caused by rather concrete conditions or events--impacts, forces, and so forth. But when you enter the world of communication, organization, etc., you leave behind that whole world in which effects are brought about by forces and impacts and energy exchange. You enter a world in which "effects"--and I am not sure one should still use the same word--are brought about by differences. That is, they are brought about by the sort of "thing" that gets onto the map from the territory. This is difference.

Difference travels from the wood and paper into my retina. It then gets picked up and worked on by this fancy piece of computing machinery in my head.

Notice the next sentence: Bateson seems to be trying to characterise differences between the causal roles of information and the causal roles of physical entities and their properties.

The whole energy relation is different. In the world of mind, nothing--that which is not--can be a cause. In the hard sciences, we ask for causes and we expect them to exist and be "real." But remember that zero is different from one, and because zero is different from one, zero can be a cause in the psychological world, the world of communication. The letter which you do not write can get an angry reply; and the income tax form which you do not fill in can trigger the Internal Revenue boys into energetic action, because they, too, have their breakfast, lunch, tea, and dinner and can react with energy which they derive from their metabolism. The letter which never existed is no source of energy.

It follows, of course, that we must change our whole way of thinking about mental and communicational processes. The ordinary analogies of energy theory which people borrow from the hard sciences to provide a conceptual frame upon which they try to build theories about psychology and behavior--that entire Procrustean structure--is non-sense. It is in error.

I suggest to you, now, that the word "idea," in its most elementary sense, is synonymous with "difference."

I wonder how many of the people who approvingly quote Bateson as defining "information" as "a difference that makes a difference" agree with all of the above. My own paraphrase would be:

Matter and energy can be used and can have effects and information can also be used and can have effects, but information is not something spatio-temporally located, like portions of matter, energy, force (transfer of energy). It is something more abstract, more concerned with the making of choices between alternatives.

But there is only the weakest of indications that everything he is saying about what information is and is not, and what it can do, presupposes the possibility of a user of the information.
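Bateson's remark that "zero can be a cause" can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name overdue_items, its parameters, and the example deadlines are all hypothetical, invented here, and are not taken from Bateson. The point it tries to show is that the non-arrival of an expected item is detectable only by something that holds expectations and has its own resources for acting on the mismatch.

```python
def overdue_items(expected_by: dict[str, int], received: set[str], today: int) -> list[str]:
    """Return the expected items that have not arrived by their deadline.

    expected_by maps an item name to the day number by which it is expected.
    received is the set of item names that have actually arrived.
    All names and values here are hypothetical illustrations.
    """
    return sorted(
        name
        for name, deadline in expected_by.items()
        if today > deadline and name not in received
    )


# The tax form that was never filled in produces no signal of its own;
# the "cause" is the difference between what was expected and what arrived.
expectations = {"tax form": 30, "letter": 10}
arrived = {"letter"}
print(overdue_items(expectations, arrived, today=31))
```

Nothing in the list of arrived items mentions the tax form. The reminder depends entirely on the checker comparing its expectations with what it has, which is one way of reading the claim that such effects presuppose an information user.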

What is information?

All the above still leaves unanswered the question "What is information?". My own answer is long and complex, as explained in the "What's information" paper cited above, which attempts to show that it is at best possible only to define "information" implicitly, by presenting a complete theory about information and its role in many systems.

This kind of implicit definition of deep and complex concepts is the only possibility for many scientific concepts, including "matter" and "energy" -- which is why "symbol-grounding" theory (another name for "concept empiricism"), is false, as explained in this presentation.

The "What's information" paper attempts to present substantial portions of such a theory, though the task is not completed. In particular section 3.2 explains how theories can implicitly define the concepts they use and relates this to defining "information".

More specifically, what it means for a bearer B to express information I for a user U in context C cannot be given any simple definition, in part because it is a generic polymorphic concept, which can be instantiated in different ways in different contexts.

Claude Shannon vs Jane Austen

I think Shannon confused many scientists, philosophers and artists by choosing the label "information" for his technical concept related to engineering problems in transmitting, encoding, decoding, compressing, decompressing, storing and retrieving information bearers (e.g. sentences, pictures, lists, collections of numerals, bit patterns, etc.).

He knew that there was an older, deeper notion of information, but unfortunately somehow (unintentionally) persuaded many thinkers to ignore it.

The older idea was used by Jane Austen in her novels, e.g. Pride and Prejudice. The difference between Austen-information and Shannon-information is discussed here:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.html
Jane Austen's concept of information (Not Claude Shannon's)
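
One way to see the contrast: a measure in Shannon's style depends only on the statistics of the symbols in an information bearer, not on what a reader learns from it. The following minimal Python sketch computes the empirical per-character entropy of a string. The function name and the example sentences are assumptions made for illustration, and a character-frequency estimate is only a crude stand-in for Shannon's measure, which is defined relative to the probability distribution of a source.

```python
import math
from collections import Counter


def char_entropy_bits(message: str) -> float:
    """Empirical Shannon entropy per character of a non-empty string, in bits.

    Uses observed character frequencies as probability estimates.
    """
    counts = Counter(message)
    total = len(message)
    # Sort the counts so the result does not depend on character order.
    return -math.fsum(
        (n / total) * math.log2(n / total) for n in sorted(counts.values())
    )


# Two bearers built from exactly the same characters, reporting different events.
first = "the dog bit the man"
second = "the man bit the dog"

print(math.isclose(char_entropy_bits(first), char_entropy_bits(second)))
```

The two sentences receive the same value because the measure sees only symbol frequencies. What differs between them is information in the older, Austen sense: what a reader would come to know about who bit whom.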

Other proposals for defining "information"

Some people try to specify the meaning by saying U uses B to "stand for" or "stand in for" I. For instance, in an interesting contribution Barbara Webb writes: "The term 'representation' is used in many senses, but is generally understood as a process in which something is used to stand in for something else, as in the use of the symbol 'I' to stand for the author of this article".
Barbara Webb, Transformation, encoding and representation, in Current Biology, 16, 6, pp. R184--R185, 2006, doi:10.1088/1741-2560/3/3/R01
That sort of definition of "representation" is either circular, if standing in for is the same thing as referring to, or else false, if "standing in for" means "being used in place of". There are all sorts of things you can do with information that you would never do with what it refers to, and vice versa. You can eat food, but not information about food. Even if you choose to eat a piece of paper on which "food" is written, that is usually irrelevant to your use of the word to refer to food.

Information about X is normally used for quite different purposes from those for which X is used. For example, the information can be used for drawing inferences, for specifying something to be prevented or constructed, and for many other purposes. Information about a possible disaster can be very useful and therefore desirable, unlike the disaster itself.

So the notion of standing for, or standing in for, is the wrong notion to use to explain information content. It is a very bad metaphor (based on some person or object taking the place of another in some process or situation), even though its use is very common.

We can make more progress by considering ways in which information can be used. If I give you the information that wet weather is approaching, you cannot use the information to wet anything. But you can use it to decide to take an umbrella when you go out or, if you are a farmer, you may use it as a reason for accelerating harvesting. The falling rain cannot be so used: by the time the rain is available it is too late to save the crops.

The same information can be used in different ways in different contexts or at different times. The relationship between information content and information use is not a simple one.
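
The weather example can be put in a small Python sketch, offered only as an illustration. The class Forecast and the two decision functions are hypothetical names invented here, and the alternatives chosen when no rain is expected are assumptions added to make the functions complete. The sketch shows one item of information being used for different purposes by different users, none of which involves using the rain itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    """A hypothetical item of information: is wet weather approaching?"""
    rain_expected: bool


def walker_decision(forecast: Forecast) -> str:
    # Someone about to go out uses the forecast to decide what to carry.
    return "take an umbrella" if forecast.rain_expected else "leave the umbrella at home"


def farmer_decision(forecast: Forecast) -> str:
    # A farmer uses the very same forecast to schedule work on the crops.
    return "accelerate harvesting" if forecast.rain_expected else "keep to the usual schedule"


forecast = Forecast(rain_expected=True)
print(walker_decision(forecast))
print(farmer_decision(forecast))
```

The Forecast object is the same in both calls. What it is used for is fixed by the user and the context, not by the content alone.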

Problems with the definition

Despite all the interesting facts alluded to by Bateson, there are several problems with his proposed definition of "information".

Comments, criticisms and suggestions welcome.

I am grateful to Olivier Marteaux who wrote to me in June 2018, pointing out that the previous version of this paper did not do justice to Bateson's actual discussion of differences and their propagation. As a result I looked again at the context of the items I had quoted from his collected papers in Steps to an Ecology of Mind, and found myself agreeing that although my criticisms of his actual words were justified, I had missed the point that he was struggling to make by distinguishing information from its physical vehicles, which could be many and varied.

Bateson's summary focused on the fact that in many contexts information is carried by some change, e.g. in a spatial structure or temporal pattern which is not always physical. Perhaps another way of expressing that would be to say that every use of information must involve a comparison, e.g. between two or more available options, or between how things are and how they might have been or were previously.

I now feel that he abstracted a step too far. He homed in on what seemed to him to be a crucial common factor, namely some difference that has implications for some agent or decision maker, but left out the required context, i.e. what an information user is. As a result he created a powerful but seriously misleading meme, "Information is a difference that makes a difference", which has crawled around many brains without carrying the rich context of Bateson's thought.

The central importance of information users

My own work focuses mainly on varieties of information user found among products of biological evolution as well as products of human engineering, especially human engineering since computers became available.

Long before there were computing machines as we now think of them, many ingenious human designers built machines that behaved in accordance with either pre-stored information (e.g. music boxes), or constantly changing information (e.g. fan-tail windmills, the Watt governor, and many more).

Moreover, long before humans used information, or created information-using machines, biological evolution was both using information and creating ever more sophisticated varieties of information user and information. The study of those evolutionary processes is what I have been calling "The Meta-Morphogenesis project", now also referred to as "The self-informing universe project":
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-morphogenesis.html

Although Jane Austen had nothing to say about biological evolution, as far as I know, she had some deep insights into the information using capabilities of some of the most recent products of biological evolution. I wonder whether Shannon ever read Pride and Prejudice, and, if not, what difference it would have made if he had.

(Is that Bateson's ghost grinning at me???)


Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham