NOT YET POSTED

Newsgroups: comp.ai.philosophy,comp.ai
References: <4bs41p$fo4@mp.cs.niu.edu> <820108318snz@longley.demon.co.uk> <4bt8gp$n9s@mp.cs.niu.edu>
Subject: Improbable theories and scientific progress. (Was: Cognitive Science ??)
Keywords: science, Popper, probability, learning, AI
Summary: Notes on philosophy of science and learning. LONG

              GENES, MEMES, PROBABILITY, CONTENT,  FITNESS
                       OF THEORIES AND GENE POOLS

An interchange between Neil Rickert and David Longley has prompted some
thoughts about science, Popper, probability, evolution, learning and AI.

In response to David Longley <David@longley.demon.co.uk>
    rickert@cs.niu.edu (Neil Rickert) wrote:

> Date: 27 Dec 1995 23:04:25 -0600
> Organization: Northern Illinois University

[NR]
> >> David quotes from Popper, "Truth, Rationality, and the Growth of
> >> Knowledge" and includes this paragraph:
>
> >> >    This   trivial  fact  has  the   following   inescapable
> >> >    consequences:  if  growth  of knowledge  means  that  we
> >> >    operate  with  theories of increasing content,  it  must
> >> >    also  mean that we operate with theories  of  decreasing
> >> >    probability   (in   the  sense  of   the   calculus   of
> >> >    probability).

Neil objected
> >> That is a remarkable claim by Popper.  And it is remarkable because
> >> it is so obviously mistaken.  The history of the growth of scientific
> >> [knowledge] completely contradicts this thesis.

[DL]
> >The point here *is* irrefutable....

[NR]
> 'Irrefutable' or not, it is absurd.
> ....
> Two hundred years ago weather predictions were not much better than
> "if the sunset is red it might rain tomorrow."  Today predictions are
> more refined, more precise, and have a higher probability of being
> correct.  The improvement is due to the growth of scientific
> knowledge.  Thus Popper's absurd conclusion is contradicted by the
> evidence.

Like so many apparent disagreements in these newsgroups, I think this
one arises because different people unwittingly attach different
meanings to the same words and phrases.

Analysis of the concept of "probability" is an old and difficult
philosophical problem. There are very different ways in which the term
can be used, including at least the following:

    (a) expressing a degree of strength of belief

    (b) describing a feature of a mechanism or situation (adding
        a weight to a die can increase the probability of a five
        being thrown - Popper called this a "propensity")

    (c) describing the long run limiting frequency (if one exists) with
        which events of type P will occur in circumstances of type C,
        in the world (This concept is different from (b) though the
        two are related: propensities may explain long run frequencies)

    (d) describing the long run limiting frequency with which a
        certain property is present in a randomly generated
        mathematically defined sequence (e.g. the limiting frequency of
        occurrence of two fives in pairs of random selections of numbers
        between one and six with equal average frequency)

NB: (d) refers to a mathematical property of a mathematically defined
system, and can often be calculated exactly (e.g. proving that the
probability of two fives is 1/36, given assumptions about the
equiprobability of all individual numbers) whereas (b) and (c) both
refer to a property of the actual world and can never be calculated by
purely mathematical analysis. Instead (b) and (c) require estimations
based on measurement and observation, and they are subject to error even
when no mathematical mistake has been made, unlike case (d): biased dice
don't disprove a mathematical result. If you are talking about the
probability of producing two fives with a real pair of dice you are not
referring to case (d). There are deep philosophical problems about
whether ANY method can *reliably* detect probabilities of type (b) and
(c) even though many methods are actually used and seem to work much of
the time.
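
To make the contrast concrete, here is a minimal Python sketch (my own
illustration, not anything taken from Popper). The exact value for case
(d) comes from counting outcomes; a value for a "real" pair of dice can
only be estimated by sampling. The function names, the weights standing
in for a loaded die, the number of throws and the seed are all
assumptions made up for the example.

```python
import random
from fractions import Fraction
from itertools import product

def exact_two_fives():
    """Case (d): count outcomes of an idealised pair of fair dice."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if a == 5 and b == 5)
    return Fraction(hits, len(outcomes))

def estimate_two_fives(weights, throws, seed=0):
    """Cases (b)/(c): estimate a frequency by observing throws.

    `weights` stands in for the unknown physical bias of each die;
    in reality we could not read it off, only sample from it.
    """
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    hits = 0
    for _ in range(throws):
        a, b = rng.choices(faces, weights=weights, k=2)
        if a == 5 and b == 5:
            hits += 1
    return hits / throws

exact = exact_two_fives()                       # Fraction(1, 36)
fair_estimate = estimate_two_fives([1] * 6, 100_000)
loaded_estimate = estimate_two_fives([1, 1, 1, 1, 3, 1], 100_000)
```

Loading the die changes the estimate but leaves exact_two_fives()
untouched, which is the sense in which biased dice don't disprove a
mathematical result.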

Incidentally, unlike some philosophers, Popper claimed that there is a
useful concept of type (b). But that's not the concept he was using in
making the claim quoted by David, which I'll analyse shortly.

Another source of muddle is ambiguity regarding what counts as a measure
of the CONTENT of a proposition. Popper was using the idea that for Pa
to say more about something than Pb does, Pa has to be more precise,
i.e. it has to specify a narrower range of possible states of affairs in
which it would be true, than Pb does. I.e. a proposition with more
content rules out more possibilities.

Increasing content in this sense implies increasing precision.

As I'll explain below that's not the only possible measure of content,
and so we have another ambiguity producing spurious disagreements.

When making the claim that increasing content (in the sense just
described) reduces the probability of a theory, Popper was using a
mathematical concept of probability something like type (d).

If Pa has more content than Pb (in that sense) this mathematically
implies that there are potentially fewer instances satisfying Pa than
satisfy Pb, and so that the long run frequency of situations in which Pa
could be true is smaller, and that's why Pa's having lower probability
than Pb follows "trivially" from its having more content. Think of it
this way: if you add content to Pb to get Pa, in such a way that
    Pa -> Pb
remains true, then whenever Pa is true Pb will be, thus the probability
of Pb is at least as high as that of Pa.
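
As a small check of that step, the following Python sketch counts cases
in an idealised, equally weighted space of 36 outcomes for a pair of
dice. The two propositions pa and pb are hypothetical examples chosen
only so that Pa -> Pb holds; nothing here depends on the particular
choice.

```python
from fractions import Fraction
from itertools import product

# All 36 outcomes for an idealised pair of dice, equally weighted.
outcomes = list(product(range(1, 7), repeat=2))

def probability(proposition):
    """Proportion of outcomes in which the proposition is true."""
    return Fraction(sum(1 for o in outcomes if proposition(o)), len(outcomes))

def pb(o):   # less content: "at least one die shows a five"
    return 5 in o

def pa(o):   # more content: "both dice show a five"
    return o == (5, 5)

# Pa -> Pb: every outcome satisfying Pa also satisfies Pb ...
pa_implies_pb = all(pb(o) for o in outcomes if pa(o))
# ... so Pa can be no more probable than Pb.
p_a, p_b = probability(pa), probability(pb)
```

Here p_a is 1/36 and p_b is 11/36. The inequality p_a <= p_b would hold
for any pair of propositions for which pa_implies_pb is true, whatever
the weighting of the outcomes.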

[NR]
> ....Popper is assuming that all knowledge growth
> amounts to taking conjunctions of components of existing knowledge.

It's true that this is ONE way of increasing the content (in Popper's
sense) of propositions, but not the ONLY way. Another way is to reduce
the number of disjuncts in a disjunctive proposition. There are other
ways which don't depend on logical relations, but geometric or
topological relations. E.g. if I tell you Fred is in the kitchen I say
something with more content (in Popper's sense) than if I tell you Fred
is in the house, and I also thereby reduce the probability of what I say
being true (if nothing else is known).

Note that these are NOT two conjunctive expressions. (You might think of
them as disjunctive, where each says something of the form
    "Fred is in location a, or in location b or in ... "
with "Fred is in the kitchen" expressing fewer disjuncts. But that's not
a good general interpretation of spatial concepts. For example, I
don't have to be able to form a disjunction specifying all the
locations in a house, in order to believe that Fred is in the house.)

I don't believe Popper was presupposing the rather silly notion that all
scientific theories have to be expressed in some particular logical form
using conjunctions (or even conjunctions and disjunctions, or even first
order predicate calculus). Rather, he was relying on the intuitive notion
that the more information I give you about some state of affairs the
more possibilities I am ruling out. I.e. he was linking increasing
content with increasing precision.

Notice that there are perfectly good senses of `increased content' for
which it would NOT be true that having more content means ruling out
more possibilities. E.g. you can say that the proposition

    P1. All physical objects are solids, liquids, gases or plasma

has more content than the proposition

    P2. All physical objects are solids, liquids or gases.

In some sense P1 has more content (assuming a plasma is not a gas): it
implies a richer ontology. But it does not have more content in Popper's
sense because of the disjunctive form: P1 does not allow you to be as
precise about the next object you'll come across as P2 does, and in that
sense it says less. This is a case where the theory with LESS content in
Popper's sense is better, because of how the world is.

More generally, every time you add an existential statement to your
theory, of the form "It's possible for entities of type X to exist", you
add content in one sense, for you say more about the types of things
that the universe can contain, but you say less in Popper's sense, for
less can now be inferred about the next object you'll come across. It no
longer has to be one of the previously specified types. So
now, in some sense the probability of the theory has increased: it's
true in more cases. However, if you consider the set of possible
universes, and for each of them describe the types of things they
contain, then universes containing every type listed in P1 will be a
subset of universes containing every type listed in P2, so in that
context the probability of P1 is lower.

Whether the probability is higher or lower in the mathematical sense,
depends on the choice of space within which to compare frequencies.
I don't think Popper thought about that.

(I suspect Popper's link, in a subset of cases, between increased
semantic content and ruling possibilities out, i.e. reduction of
uncertainty, is what lies behind the irresponsible misuse of the word
"information" in connection with mathematical so-called "information
theory", which has nothing to do with semantics or meaning, but only
with the syntactic properties of sets of signals. There is an indirect
link with real (semantic) information in that this syntactic measure is
related to how much semantic information a signal could carry in a given
context. But that's a measure of potential information, not actual
information, in the ordinary sense.

This mathematical misuse of the ordinary word "information" has caused
*enormous* amounts of confusion -- even leading some silly composers to
link randomness in music with musical significance, using arguments like
this:
    "The less predictable music is, the more possibilities each new
    note rules out, and therefore the more meaningful it is.
    Therefore random sequences are most meaningful!!"

I can remember a time when some music students were taught that kind of
stuff.)

Anyhow, unlike Popper in the quotation, who refers to mathematical
probability of type (d), Neil, I suspect, is talking about empirical
probability of type (b) or possibly (c), when he compares two
situations, 18th century meteorology and 20th century meteorology, where
meteorologists are armed with powerful theories that have withstood a
lot of testing.

He is claiming that in the latter case the probability of a prediction
being correct is higher.

In Popper's terms I guess this could also be described by saying that
modern meteorologists have an increased propensity to get their
predictions right. And of course Neil is correct in arguing that modern
meteorologists with their improved higher-content theories not only have
more precise predictions (i.e. theories and predictions with more
Popperian content) but also a higher probability of being correct.

And of course Popper is right in claiming that IN GENERAL adding content
(in his sense) to a theory reduces the probability of its predictions
being correct (in all possible situations), simply because adding more
content (in Popper's sense) reduces the set of possible states of
affairs in which the theory is true.

I.e. Popper's claim is not mistaken or absurd. It is correct, but that's
because it is not a claim about the world but about mathematically
defined proportions, using the concept of probability of type (d).

(Whether that supports any of David Longley's arguments is a different
matter: it's perfectly possible to quote a correct theory in support of
an incorrect one.)

All this raises a problem for philosophers of science, namely to explain
what sort of scientific process can bring about a state of affairs in
which scientists produce richer and more precise theories with a HIGHER
probability of correctness in their predictions, even though increasing
the precision and content (in Popper's sense) of a theory mathematically
implies a REDUCED probability of it, or predictions based on it, being
true.

I think Popper was aware of this problem and as far as I recall his
answer was based on an analogy between the development of science (or
ideas and knowledge in general) and biological evolution.

He argued (or could have argued, if he didn't actually do so!) that the
processes of scientific testing (which would include implicit testing
through engineering applications), like biological selection processes,
weed out theories that are relatively "unfit" in various ways.

One way in which a theory can be unfit is in not matching how the world
is when tested through predictions.

There are others, e.g. not being compatible with the current world view,
or locally dominant religion. These different fitness criteria are
frequently in opposition, and can get in the way of assessing scientific
theories.

Because he had so much faith in the scientific selection process, Popper
(unlike David Longley) imposed NO restrictions on where theories came
from or what formalisms could be used to express them, so long as they
were capable of being tested so that relatively unfit ones could be
rejected (though he also knew that such rejection might prove mistaken
and relative fitness illusory, e.g. because of poor measurements, or too
narrow a range of experiments, or whatever. More on this below.)

There is still an implicit assumption behind the analogy between theory
selection and biological selection, namely the assumption that a process
of selection, whether biological or scientific, will in general go on
producing organisms or theories that are better and better fitted to
their niche.

Whether that is so depends on how the world is.

If the world is capable of changing in arbitrary ways (so that niches
change) then what is fit can change in unpredictable ways. And that is
the case with biological fitness. E.g. a volcanic eruption can radically
alter fitness requirements in the neighbourhood of the volcano, and
rapidly replace one set of successful species with another set,
including allowing what was previously less successful to become
dominant.

Is the process of scientific theory selection liable to such disruption
through changing fitness conditions?

It is often assumed that at least some sciences (physics, chemistry,
cosmology, though NOT geology, biology, psychology, sociology) are
concerned with matters that are universal (the same everywhere) and
immutable (the same at all times), and do not depend on the contingent
circumstances of a particular planet, or galaxy, or historical epoch.

If there are any such universal sciences, then, the scientific fitness
criteria, properly applied, could in general (apart from temporary
aberrations) lead continually to better and better theories, at least in
these sciences, without any possibility of dead-end dinosaur theories
that are soon to be doomed because the ultimate nature of physical
reality is about to change.

Of course, because of limitations and aberrations in the testing
processes mentioned above, dead-end theories can occur, even on this
view.

That's partly because scientific testing is inherently error prone: the
quality of the tests is itself limited by the current state of
scientific knowledge. Testing can give erroneous or misleading results
because of several factors, e.g.
    - tests may fail to take account of `hidden' features of the
        testing situation
    - tests may be limited to too narrow a range of circumstances
        (like many psychological experiments)
    - tests may use inadequate measuring devices - e.g. devices that
        have been carelessly made, or worn out, or damaged in some way,
    - tests may be based on wrong theories that are assumed not to be
        under test. (E.g. tests that assume that only the topology of
        electronic circuits in measuring instruments and not the
        physical layout of the components can make a difference to their
        performance may give misleading measurements because of the poor
        layout of circuits in those instruments, leading to the wrong
        rejection of a theory, or wrong non-rejection. Or a test of
        theory A assuming a theory B linking size of pulsars to their
        pulse frequency might wrongly support or reject theory A because
        of errors in theory B.)

By contrast, evolutionary selection is not done by an intelligent agent
that might be misinformed. If a bunch of collaborating genes fail to get
themselves reproduced in a particular context then they have just
failed. Excuses are irrelevant.

[Sexual selection pressure is a counter-example, leading to things like
peacocks. Here part of the gene pool driving the peahen information
processing is part of the niche for other parts of the gene pool shared
with peacocks.]

Scientific selection is not the only form of cultural selection, as
shown by the spread of religions, waves of fashion, tastes in food, etc.
How scientific selection differs from the rest, and why it is more
concerned with truth, is a long and complex story, and still a matter of
controversy.

Now let's return to Neil's point about weather forecasting: theories
now have more content than they used to have, and predictions based on
them have higher probability of being correct than they used to have.
Is this a paradox?

Is meteorology like physics? Is physics in fact investigating
universal and immutable aspects of reality? Or is it more like a species
selected in balmy days, and liable to be dislodged following a volcanic
eruption?

Neil, I think, assumed (at least when he wrote the above) that
meteorology is concerned with an unchanging type of reality.

Even if physics is the study of an unchanging reality (which some
cosmologists may dispute) I think it is not true of meteorology, e.g.
because derivation of principles of meteorology from those of physics is
computationally intractable, or at least beyond the wit of current
scientists.

In that case, the meteorological principles used at present
will have been selected not by derivation from physics but only by
direct testing of rival theories in the earth's weather system.

[Actually, I am making a general point about how some sciences might
work which could be correct even if I am wrong about current
meteorology: e.g. if all its forecasts are based directly on principles
of quantum mechanics.]

If theories are selected not by derivation from physics but by testing
in the earth's atmosphere, this will make them much more fragile than
current physical theory, which has survived a far wider variety of
tests.

If changes in physical circumstances, e.g. changes in the chemical
composition of the atmosphere (perhaps caused by a massive volcano, or
industrial by-products), or changes in the radiation emitted by the sun,
were to change the dynamics of the earth's atmosphere, it might turn out
that 18th Century meteorological theories again gave a higher proportion
of correct predictions than those used now. In that case it would be
best to revert to depending on the older, less precise, theories. They
would have a higher probability of giving correct results, partly
because they have less content in Popper's sense.
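
A toy Python sketch of this scenario follows. Everything in it is
invented for illustration: the two temperature series, the two
"theories" and their thresholds. It is also simpler than the historical
case, because here the precise theory logically implies the coarse one,
so the coarse one can never score lower; real 18th Century and modern
theories are not related that neatly.

```python
# Hypothetical daily temperatures; both series are invented for illustration.
old_regime = [10, 11, 12, 13, 14, 15]   # steady drift of +1 per day
new_regime = [15, 12, 17, 11, 16, 13]   # after the atmosphere has changed

def coarse(today, tomorrow):
    """Less content: 'tomorrow will be within 10 degrees of today'."""
    return abs(tomorrow - today) <= 10

def precise(today, tomorrow):
    """More content: 'tomorrow will be 0 to 2 degrees warmer than today'."""
    return 0 <= tomorrow - today <= 2

def hit_rate(theory, series):
    """Fraction of day-to-day transitions the theory gets right."""
    pairs = list(zip(series, series[1:]))
    return sum(theory(a, b) for a, b in pairs) / len(pairs)

rates = {
    (name, regime): hit_rate(theory, series)
    for name, theory in [("coarse", coarse), ("precise", precise)]
    for regime, series in [("old", old_regime), ("new", new_regime)]
}
```

In the old regime both theories are always right, so the precise one is
preferable because it says more. In the new regime the precise theory
is never right while the coarse one still always is, so reverting to
the coarse one is the better bet until something more precise has been
found and tested.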

Of course, after a while the meteorological theories might again be
revised, as a result of much selective testing, until newer, more
precise, less intrinsically probable, theories were developed which
again had both a higher content and a higher probability of successful
predictions in the new context than the earlier theories.

There's no guarantee of this, however. The weather system involves much
nonlinear feedback and is inherently chaotic. The post-traumatic weather
system might or might not have some high level patterns discernible by
human minds. Such patterns could provide a basis for a new system of
precise predictions based on the accuracy with which aspects of the
system could be measured. But there might be none (too many "butterfly
effects"), in which case we'd be stuck with the older, less precise,
systems of prediction. Thus insofar as modern meteorology is both more
precise and more often correct than two hundred years ago, that's a
matter of luck.

In physics it can also be necessary to abandon current theories, but not
because the world has changed, rather because our creative abilities and
selection processes have led us down a bad trail. Again, when that
happens it's not provable a priori that we'll find another better
set of patterns in nature.

For the sciences, as for biological gene pools, it may be essential not
to reject completely all the old `refuted' ideas, but to keep them
around as a basis for generating future theories that may supersede the
short term winners.

One reason for that is that some theories, like biological species, are
inherently fit only for local conditions that might change, like
theories of the weather. Another reason is that imperfections in the
scientific selection process can lead us temporarily into blind alleys
from which we need to escape. E.g. particle theories of light
(propounded by Newton, for example) were refuted by Young's diffraction
experiments supporting only the fitness of wave theories. But later on
the idea of a photon, or light `particle', again became important, as a
wider range of experiments produced selection pressure favouring deeper,
more general, conceptually richer, theories.

(NB: the theories were never DERIVED from the observations,
contrary to what some people think.)

I hope I've clarified the extent to which both Popper and Neil are
correct, and the difference in status of their claims. Popper's claim
about the inverse relation between content (in his sense) and
probability (of type (d)) is mathematical and can never be refuted by
observation. Neil's claim is an empirical claim about the relative
fitness of meteorological theories, and might turn out wrong if the
world changed in certain ways. However, it seems to be true, at present.

There's a moral here for AI learning theories.

I have no idea how many of them fully address the issue of recovering
from mistaken commitments.

It may be that individual human learning mechanisms cannot deal fully
with the problem because commitments have to be made on account of how
the brain works, or because keeping options open (e.g. using assumption
based truth maintenance systems) is computationally intractable.

Of course there are specific cases of things learnt by humans
apparently being retracted: e.g. children learn how to express the past
tense of words like "run" ("ran"), "hit" ("hit"), "catch" ("caught"),
and then they learn a rule which contradicts what they have previously
learnt, so they let the rule over-ride their previous information, and
then they re-learn examples which over-ride the rule in selective cases,
perhaps because they have by then also developed a new mechanism for
allowing generalisations to have exceptions.

I suspect human learning does not use a general back-tracking mechanism
but something that simply allows new specific information to selectively
supersede old, including cases where the `new' information is
information that has been encountered previously, and was the basis
of some earlier learning, which was then superseded.

Even if individual human ability to backtrack to previously stored
choice points during learning is limited, it may nevertheless be
possible for social learning (e.g. scientific development) to address
the problem of escaping from local maxima in scientific hill climbing
because communities can include a mixture of coexisting individuals with
different commitments. (Like a mixed gene pool). Some of the individuals
may still hold views that were previously rejected by the community at
large, and this could support backtracking of a sort, though not
necessarily the systematic and rigid sort of backtracking found in a
Prolog system.

If individual learning can use `genetic' mechanisms, such as classifier
systems, then perhaps individuals can in principle have as much
flexibility as social systems or species.

But I don't know whether such highly flexible learning mechanisms can
operate on fast enough time scales for use by individuals. (NB such
genetic mechanisms differ from truth-maintenance mechanisms in that when
current beliefs are refuted they don't immediately attempt to compute
the best alternative on the basis of stored records of previous
decisions. Instead the genetic mechanisms allow new replacements to
evolve on the basis of continued interactions with the environment. This
may or may not lead back to an earlier theory. Similar comments would
apply to a neural net that does not separate a training phase and an
application phase, but constantly adjusts itself during use on the basis
of feedback.)
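
The last point can be sketched in a few lines of Python. This is NOT a
classifier system or a neural net, only the simplest learner I can
think of that shares the relevant property: it keeps no record of past
decisions and never backtracks, but adjusts one belief after every
observation. The function name, the learning rate and the observation
sequence are assumptions invented for the example.

```python
def online_estimates(observations, rate=0.5, start=0.0):
    """Adjust a single belief after every observation; keep no history."""
    estimate = start
    trace = []
    for obs in observations:
        # Move part of the way towards what was just observed.
        estimate += rate * (obs - estimate)
        trace.append(estimate)
    return trace

# Hypothetical environment: the "true" value shifts from 1.0 to 5.0.
observations = [1.0] * 20 + [5.0] * 20
trace = online_estimates(observations)
before_shift, after_shift = trace[19], trace[-1]
```

The learner settles close to 1.0, and after the environment changes it
drifts to a value close to 5.0 without consulting any stored choice
points. Whether mechanisms of this general kind scale up to learning
concepts or grammars is exactly the open question.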

I've not kept up with all the work on learning in AI, and I wonder to
what extent the issue of "run-time" mechanisms for escaping from
entrenched errors has been addressed. I doubt that it has been addressed
by experimental psychologists, for their conception of learning tends to
be very restricted (e.g. something they can measure in repeatable
experiments in a laboratory, unlike most actual human learning, which is
highly individualistic and dependent on prior knowledge, like scientific
progress, and often not expressible in numeric terms, like the growth
of a set of concepts or a grammar).

[NR]
> If an apparently correct logical argument leads to an absurd
> conclusion, you should treat the argument as a reductio ad absurdum,
> and then look for what is wrong with the premises.

Alternatively, you can look, as I have done above, for a semantic
ambiguity that causes people to argue endlessly at cross-purposes. It's
a very frequent occurrence.

[NR]
> ....Popper is assuming that all knowledge growth
> amounts to taking conjunctions of components of existing knowledge.

I don't think so. That may be David's interpretation, because of his
commitment to a restricted language for science. I don't think Popper
has any such commitment.

In fact he suggested that there was a kind of objective social knowledge
that was expressed in many aspects of the humanly constructed
environment, including libraries, museums, etc. This was his `third
world'. The `first world' was the world of physical objects, events and
processes. The `second world' was the world of subjective private mental
events and processes. He did not say that only those bits of the third
world using predicate calculus were knowledge.

[NR]
> That is an extreme nativist position, for it implies that we already
> know everything and only need to logically derive any missing
> details.  Only someone in the grip of an ideological committment to
> formalism could make such an absurd assumption.

You should not judge Popper by the contexts in which he is cited. As far
as I know he never did make restrictive claims about the forms in which
knowledge could be expressed.

By the way, just for the record, though I learnt much from Popper I
don't agree with everything he wrote.

In particular, in Chapter 2 of `The Computer Revolution in Philosophy' I
strongly criticised him e.g. for rejecting discussion of meanings or
concepts as irrelevant to the study of science, and for his
falsificationist criterion of scientific content. What Popper should
have said was that a criterion of merit in scientific theories is having
objectively decidable consequences. Being falsifiable is a special case
of this, and too restrictive.

I also criticised Popper, and most other philosophers of science, for
emphasising discovery and testing of laws in science, thereby ignoring
deeper forms of scientific creativity.  The latter includes extending
our grasp of what is possible or conceivable, i.e. extending our grasp
of the `form' of reality, which defines the space of possible laws worth
investigating. By comparison with that, finding a particular law is much
easier and far more commonplace.

This is related to the over-simple link between increased content and
increased precision, criticised above. Sometimes vision-expanding
theories have less precision (in the technical sense) than the older
theories they replace. But they are still semantically richer for they
refer to an enlarged ontology. The really great theories combine this
with increased precision in particular applications. This Popperian
requirement is what constrains the potential explosion of ontology
expanding theories.

But that's another long story.

Aaron
PS
[Margaret Boden's book on creativity also stresses the growth of
knowledge of what is possible, as opposed to laws.]
