JUMP TO TABLE OF CONTENTS (27 Oct 2015)
Afterthoughts (2015): http://www.cs.bham.ac.uk/research/projects/cogaff/crp/crp-afterthoughts.html
PDF version should have contents in side-panel, e.g. if viewed in XPDF or Acrobat Reader.
PDF page numbers may change after further editing -- so don't use them for reference!
I was not consulted about the cover. The book is mainly concerned
with the biological, psychological and philosophical significance of
virtual machinery. I did not know that the publishers had decided
to associate it with paper tape devices until it was published.
The online version of this work is licensed under a Creative Commons Attribution 4.0 International License. If you use, or comment on, any of this please include a URL if possible, so that readers can see the original (or the latest version).
Other titles in this series:
ARTIFICIAL INTELLIGENCE AND NATURAL MAN:
Margaret A. Boden
INFERENTIAL SEMANTICS: Frederick Parker-Rhodes
Other titles in preparation:
THE FORMAL MECHANICS OF MIND:
Stephen N. Thomas
THE COGNITIVE PARADIGM: Marc De Mey
ANALOGICAL THINKING: MYTHS AND MECHANISMS: Robin Anderson
EDUCATION AND ARTIFICIAL INTELLIGENCE: Tim O'Shea
GÖDEL ESCHER BACH: Douglas Hofstadter
BRAINSTORMS: Daniel Dennett
The book was first published in Great Britain in 1978 by
THE HARVESTER PRESS LIMITED
Publisher: John Spiers
2 Stanford Terrace, Hassocks, Sussex
(Also published in the USA by Humanities Press, 1978)
Copyright: Aaron Sloman, 1978
REVISED ONLINE EDITION -- (2015)
THE COMPUTER REVOLUTION IN PHILOSOPHY:
Philosophy, science and models of mind.
Out of print 1978 book. revised, and now
accessible free of charge (html/pdf) here:
...... Continued below......
The scanned copy unfortunately had many pencilled comments and corrections so
the files were very messy, but were eventually made readable. It
proved necessary to redo all the figures, using the TGIF package, freely
available for Linux and Unix, from:
Chris Glur reported portions of the text that needed correction after scanning and conversion to html. It is likely that errors still remain. Please report any to A.Sloman@cs.bham.ac.uk.
For a while, Ceinwen Cushway kindly made printed copies of the new version available at a charge to cover printing and postage. The online PDF version now makes this unnecessary. The online version of CRP became available in September 2001, as separate html chapters. Notes and corrections were added at various times, and the format modified. Some of the changes are indicated in the text. The online edition includes recently added notes and comments, e.g. the notes at the end of Chapter 9 (on vision).
PDF versions were first produced by reading html files into OpenOffice, editing, then exporting to PDF. Since about 2012 the PDF files have been created directly from html, using the superb html2ps package and ps2pdf (on linux). An older PDF version (now out of date) is also available from the EPRINTS archive of ASSC (The Association for the Scientific Study of Consciousness). See CRP at ASSC eprints web site
In 2003, Michael Malien converted the html files (now out of date) to
format, still available in this zip file:
Nils Valentin informed me that a tool for extracting html files from a chm file is obtainable here
A Russian Student, Sergei Kaunov, created a Kindle e-book version,
(Also out of date now.)
In December 2014 I installed a copy of the 1981 Review of the book by Steven Stich, and wrote a reply to the criticisms he (and others) had made of the claim in Chapter 2 that explanations of possibilities are a core part of science even if they are not falsifiable. More information about that review, and my response to the criticisms can be found in a separate document, along with a link to Douglas Hofstadter's review, which also criticised that chapter.
HTML and PDF 'book' Versions (Some indentation lost in PDF version)
In July 2015 the online parts were combined to form this electronic book (with internal links) in HTML and PDF:
Separate chapters found online are now out of date.
In particular, Chapter 2 analyses the variety of scientific advances ranging from shallow discoveries of new laws and correlations to deep science which extends our ontology, i.e. our understanding of what is possible, rather than just our understanding of what happens when.
Insofar as AI explores designs for possible mental mechanisms, possible mental architectures, and possible minds using those mechanisms and architectures, it is primarily a contribution to deep science, the science of what is possible, in contrast with most empirical psychology which is shallow science, exploring correlations: the science of laws and regularities. This contrast is taken up further in the Afterthoughts document mentioned above, and in a separate paper on explaining possibilities, developing and supporting the ideas in Chapter 2: http://www.cs.bham.ac.uk/research/projects/cogaff/misc/explaining-possibilities.html
This "design stance" approach to the study of mind was very different from the "intentional stance" being developed by Daniel Dennett at the same time, expounded in his 1978 book Brainstorms, and later partly re-invented by Alan Newell as the study of "The knowledge Level" (see his 1990 book Unified Theories of Cognition). Both Dennett and Newell based their methodologies on a presumption of rationality, whereas the design-stance considers functionality, which is possible without rationality, as insects and microbes demonstrate well, Functional mechanisms may provide limited rationality, as Herb Simon noted in his 1969 book The Sciences of the Artificial.
Ideas about "Representational Redescription" presented in Annette Karmiloff-Smith's 1992 book Beyond Modularity summarised in her BBS 2004 article with pre-print here are illustrated by my discussion of some of what goes on when a child learns about numbers in Chapter 8. That chapter suggests mechanisms and processes involved in learning about numbers that could be important for developmental psychology, philosophy and AI, but have never been properly developed.
Some chapters have short notes commenting on developments since the time the book was published. I may add more such notes from time to time.
The most recent major venture, since about 2011 is the Turing-inspired Meta-Morphogenesis project:
In the Preface and in Chapter 1 comments were made about how the invention of computing was analogous to the combination of the invention of writing and of the printing press, and predictions were made about the power of computing to transform our educational system to stretch minds.
Alas the predictions have not yet come true: instead computers are used in schools for lots of shallow activities. Instead of teaching cooking, as used to happen in 'domestic science' courses we teaching them 'information cooking' using word processors, browsers, an the like. We don't teach them to design, debug, test, analyse, explain new machines and tools, merely to use existing ones as black boxes. That's like teaching cooking instead of teaching chemistry.
In 2004 a paper on that topic, accepted for a UK conference on grand
challenges in computing education referred back to the predictions in
the book and how the opportunities still remain. The paper, entitled
'Education Grand Challenge:
A New Kind of Liberal Education ---
Making People Want a Computing Education For Its Own Sake'
available in HTML and PDF formats here
Additional comments were made in 2006 in this document
Why Computing Education has Failed and How to Fix it
In 2007 I attempted, unsuccessfully to generate interest in a multidisciplinary
school syllabus combining computing, biology, cognitive science and philosophy.
Computing at School (Started in 2009)
Since 2009 the computing at school organisation has grown and become very influential, helping to produce massive changes in computing education in the UK, from primary school (age about 6) upwards, with far more emphasis on designing, implementing and testing programs, and far less on merely using computer-based tools. A Computing at school conference has been organised in the School of Computer Science at the University of Birmingham every Summer since 2009. Although this is bringing about massive broadening and deepening of the computing education in UK schools, it tends to be mainly focusing on making computers do things rather than trying to understand natural forms of information processing by modelling them, including human competences such as vision, use of language, learning, generating stories, etc. Perhaps that will change as more school teachers become available with the required confidence, insight and breadth of experience.
Preface and Acknowledgements ----------------------------------------------------------------------- x
1. INTRODUCTION AND OVERVIEW -------------------------------------------------------------------- 1
1.1. Computers as toys to stretch our minds ------------------------------------------------------- 1
1.2. The revolution in philosophy ---------------------------------------------------------------------- 3
1.3. Themes from the computer revolution --------------------------------------------------------- 6
1.4. What is Artificial Intelligence? -------------------------------------------------------------------- 17
1.5. Conclusion -------------------------------------------------------------------------------------------- 20
2. WHAT ARE THE AIMS OF SCIENCE? ---------------------------------------------------------- 22
2.1. Part one: overview -------------------------------------------------------------------------------- 22
2.1.1.Introduction -------------------------------------------------------------------------------------- 22
2.1.2.First crude subdivision of aims of science ------------------------------------------------- 23
2.1.3.A further subdivision of the factual aims: form and content ------------------------- 24
2.2. Part two: interpreting the world --------------------------------------------------------------- 26
2.2.1. The interpretative aims of science sub divided ----------------------------------------- 26
2.2.2. More on the interpretative and historical aims of science ----------------------------- 29
2.2.3. Interpreting the world and changing it ---------------------------------------------------- 30
2.3. Part three: elucidation of subgoal (a) -------------------------------------------------------- 32
2.3.1. More on interpretative aims of science -------------------------------------------------- 32
2.3.2. The role of concepts and symbolisms ----------------------------------------------------- 33
2.3.3. Non-numerical concepts and symbolisms ------------------------------------------------ 34
2.3.4. Unverbalised concepts ------------------------------------------------------------------------- 35
2.3.5. The power of explicit symbolisation -------------------------------------------------------- 36
2.3.6. Two phases in knowledge acquisition: understanding and knowing -------------- 36
2.3.7. Examples of conceptual change ----------------------------------------------------------- 37
2.3.8. Criticising conceptual systems -------------------------------------------------------------- 39
2.4. Part four: elucidating subgoal (b) ------------------------------------------------------------ 41
2.4.1. Conceivable or representable vs. really possible --------------------------------------- 41
2.4.2. Conceivability as consistent representability ------------------------------------------- 41
2.4.3. Proving real possibility or impossibility --------------------------------------------------- 43
2.4.4. Further analysis of 'possible' is required ------------------------------------------------- 44
2.5. Part five: elucidating subgoal (c) ------------------------------------------------------------ 45
2.5.1. Explanations of possibilities --------------------------------------------------------------- 45
2.5.2. Examples of theories purporting to explain possibilities ---------------------------- 46
2.5.3. Some unexplained possibilities ----------------------------------------------------------- 48
2.5.4. Formal requirements for explanations of possibilities ------------------------------- 49
2.5.5. Criteria for comparing explanations of possibilities ---------------------------------- 51
2.5.6. Rational criticism of explanations of possibilities --------------------------------------- 53
2.5.7. Prediction and control ----------------------------------------------------------------------- 55
2.5.8. Unfalsifiable scientific theories ------------------------------------------------------------- 57
2.5.9. Empirical support for explanations of possibilities ------------------------------------- 58
2.6. Part six: concluding remarks ------------------------------------------------------------------ 60
2.6.1. Can this view of science be proved correct? -------------------------------------------- 60
3 SCIENCE AND PHILOSOPHY -------------------------------------------------------------------- 63
3.1. Introduction ---------------------------------------------------------------------------------------- 63
3.2. The aims of philosophy and science overlap ----------------------------------------------- 64
3.3. Philosophical problems of the form 'how is X possible?' -------------------------------- 65
3.4. Similarities and differences between science and philosophy --------------------------- 69
3.5. Transcendental deductions ------------------------------------------------------------------- 71
3.6. How methods of philosophy can merge into those of science ------------------------ 73
3.7. Testing theories --------------------------------------------------------------------------------- 75
3.8. The regress of explanations ----------------------------------------------------------------- 76
3.9. The role of formalisation --------------------------------------------------------------------- 77
3.10. Conceptual developments in philosophy ------------------------------------------------- 77
3.11. The limits of possibilities --------------------------------------------------------------------- 78
3.12. Philosophy and technology ----------------------------------------------------------------- 80
3.13. Laws in philosophy and the human sciences -------------------------------------------- 81
3.14. The contribution of artificial intelligence --------------------------------------------------- 82
3.15. Conclusion ------------------------------------------------------------------------------------- 82
4. WHAT IS CONCEPTUAL ANALYSIS? ---------------------------------------------------------- 84
4.1. Introduction ------------------------------------------------------------------------------------ 84
4.2. Strategies in conceptual analysis ----------------------------------------------------------- 86
4.3. The importance of conceptual analysis ---------------------------------------------------- 99
5. ARE COMPUTERS REALLY RELEVANT? ------------------------------------------------------ 103
5.1. What is a computer? -------------------------------------------------------------------------- 103
5.2. A misunderstanding about the use of computers --------------------------------------- 105
5.3. Connections with materialist or physicalist theories of mind -------------------------- 106
5.4. On doing things the same way -------------------------------------------------------------- 108
6. SKETCH OF AN INTELLIGENT MECHANISM ------------------------------------------------ 112
6.1. Introduction ------------------------------------------------------------------------------------- 112
6.2. The need for flexibility and creativity ------------------------------------------------------- 113
6.3. The role of conceptual analysis -------------------------------------------------------------- 113
6.4. Components of an intelligent system ------------------------------------------------------ 114
6.5. Computational mechanisms need not be hierarchic ------------------------------------ 115
6.6. The structures ---------------------------------------------------------------------------------- 117
6.6.(a) the environment ----------------------------------------------------------------------------- 117
6.6.(b) a store of factual information (beliefs and knowledge) ----------------------------- 118
6.6.(c) a motivational store ------------------------------------------------------------------------- 119
6.6.(d) a store of resources for action ---------------------------------------------------------- 120
6.6.(e) a resources catalogue --------------------------------------------------------------------- 121
6.6.(f) a purpose-process (action-motive) index ---------------------------------------------- 122
6.6.(g) temporary structures for current processes ---------------------------------------- 124
6.6.(h) a central administrator program -------------------------------------------------------- 124
6.6.(i) perception and monitoring programs -------------------------------------------------- 127
6.6.(j) retrospective analysis programs -------------------------------------------------------- 132
6.7. Is such a system feasible? ------------------------------------------------------------------ 134
6.8. The role of parallelism ------------------------------------------------------------------------ 135
6.9. Representing human possibilities ---------------------------------------------------------- 135
6.10. A picture of the system -------------------------------------------------------------------- 136
6.11. Executive and deliberative sub-processes --------------------------------------------- 137
6.12. Psychopathology ---------------------------------------------------------------------------- 140
7. INTUITION AND ANALOGICAL REASONING ------------------------------------------------- 144
7.1. The problem ------------------------------------------------------------------------------------- 144
7.2. Fregean (applicative) vs analogical representations ------------------------------------ 145
7.3. Examples of analogical representations and reasoning ------------------------------- 147
7.4. Reasoning about possibilities ----------------------------------------------------------------- 154
7.5. Reasoning about arithmetic and non-geometrical relations --------------------------- 155
7.6. Analogical representations in computer vision ------------------------------------------- 156
7.7. In the mind or on paper? --------------------------------------------------------------------- 157
7.8. What is a valid inference? -------------------------------------------------------------------- 158
7.9. Generalising the concept of validity -------------------------------------------------------- 159
7.10. What are analogical representations? ---------------------------------------------------- 162
7.11. Are natural languages Fregean (applicative)? ------------------------------------------- 167
7.12. Comparing Fregean and analogical representations ----------------------------------- 168
7.13. Conclusion -------------------------------------------------------------------------------------- 174
8. ON LEARNING ABOUT NUMBERS: PROBLEMS AND SPECULATIONS ------------ 177
8.1. Introduction ------------------------------------------------------------------------------------ 177
8.2. Philosophical slogans about numbers ---------------------------------------------------- 179
8.3. Some assumptions about memory ------------------------------------------------------- 181
8.4. Some facts to be explained ---------------------------------------------------------------- 183
Note added 5 Feb 2016 -- provisionally located here
8.5. Knowing number words -------------------------------------------------------------------- 184
8.6. Problems of very large stores ------------------------------------------------------------ 186
8.7. Knowledge of how to say number words --------------------------------------------- 187
8.8. Storing associations ------------------------------------------------------------------------ 188
8.9. Controlling searches ----------------------------------------------------------------------- 190
8.10. Dealing with order relations ------------------------------------------------------------- 191
8.11. Control-structures for counting games ---------------------------------------------- 196
8.12. Problems of co-ordination --------------------------------------------------------------- 197
8.13. Interleaving two sequences -------------------------------------------------------------- 200
8.14. Programs as examinable structures ---------------------------------------------------- 201
8.15. Learning to treat numbers as objects with relationships ---------------------------202
8.16. Two major kinds of learning -------------------------------------------------------------- 203
8.17. Making a reverse chain explicit ---------------------------------------------------------- 205
8.18. Some properties of structures containing pointers -------------------------------- 210
8.19. Conclusion ---------------------------------------------------------------------------------- 212
Later notes on chapter 8
9. PERCEPTION AS A COMPUTATIONAL PROCESS ------------------------------------------ 217
9.1. Introduction ------------------------------------------------------------------------------------ 217
9.2. Some computational problems of perception ------------------------------------------ 218
9.3. The importance of prior knowledge in perception ------------------------------------- 219
9.4. Interpretations -------------------------------------------------------------------------------- 223
9.5. Can physiology explain perception? ------------------------------------------------------ 224
9.6. Can a computer do what we do? --------------------------------------------------------- 226
9.7. The POPEYE program ----------------------------------------------------------------------- 228
9.8. The program's knowledge ----------------------------------------------------------------- 230
9.9. Learning --------------------------------------------------------------------------------------- 233
9.10. Style and other global features ---------------------------------------------------------- 234
9.11. Perception involves multiple co-operating processes ------------------------------- 235
9.12. The relevance to human perception ---------------------------------------------------- 237
9.13. Limitations of such models --------------------------------------------------------------- 239
10. CONCLUSION: AI AND PHILOSOPHICAL PROBLEMS ---------------------------------- 242
10.1. Introduction ---------------------------------------------------------------------------------- 242
10.2. Problems about the nature of experience and consciousness ------------------ 242
10.3. Problems about the relationships between experience and behaviour -------- 252
10.4. Problems about the nature of science and scientific theories ------------------- 254
10.5. Problems about the role of prior knowledge and perception --------------------- 255
10.6. Problems about the nature of mathematical knowledge -------------------------- 258
10.7. Problems about aesthetic experience ------------------------------------------------- 259
10.8. Problems about kinds of representational systems -------------------------------- 260
10.9. Problems about rationality --------------------------------------------------------------- 261
10.10. Problems about ontology, reductionism, and phenomenalism ---------------- 262
10.11. Problems about scepticism ----------------------------------------------------------- 263
10.12. The problems of universals ----------------------------------------------------------- 264
10.13. Problems about free will and determinism ----------------------------------------- 266
10.14. Problems about the analysis of emotions ------------------------------------------ 267
10.15. Conclusion --------------------------------------------------------------------------------- 268
Epilogue --------------------------------------------------------------------------------------------- 272
Bibliography ---------------------------------------------------------------------------------------- 274
Postscript Do we need a hierarchy of meta-languages? --------------------------------- 285
Footnotes and Endnotes are at the end of each chapter.
Original pages x--xiii
Another book on how computers are going to change our lives? Yes, but this is more about computing than about computers, and it is more about how our thoughts may be changed than about how housework and factory chores will be taken over by a new breed of slaves.
Thoughts can be changed in many ways. The invention of painting and drawing permitted new thoughts in the processes of creating and interpreting pictures. The invention of speaking and writing also permitted profound extensions of our abilities to think and communicate. Computing is a bit like the invention of paper (a new medium of expression) and the invention of writing (new symbolisms to be embedded in the medium) combined. But the writing is more important than the paper. And computing is more important than computers: programming languages, computational theories and concepts -- these are what computing is about, not transistors, logic gates or flashing lights. Computers are pieces of machinery which permit the development of computing as pencil and paper permit the development of writing. In both cases the physical form of the medium used is not very important, provided that it can perform the required functions.
Computing can change our ways of thinking about many things, mathematics, biology, engineering, administrative procedures, and many more. But my main concern is that it can change our thinking about ourselves: giving us new models, metaphors, and other thinking tools to aid our efforts to fathom the mysteries of the human mind and heart. The new discipline of Artificial Intelligence is the branch of computing most directly concerned with this revolution. By giving us new, deeper, insights into some of our inner processes, it changes our thinking about ourselves. It therefore changes some of our inner processes, and so changes what we are, like all social, technological and intellectual revolutions.
I cannot predict all these changes, and certainly shall not try. The book is mainly about philosophical thinking, and its transformation in the light of computing. But one of my themes is that philosophy is not as limited an activity as you might think. The boundaries between philosophy and other theoretical and practical activities, notably education, software engineering, therapy and the scientific study of man, cannot be drawn as neatly as academic syllabuses might suggest. This blurring of disciplinary boundaries helps to substantiate a claim that a revolution in philosophy is intimately bound up with a revolution in the scientific study of man and its practical applications. Methodological excursions into the nature of science and philosophy therefore take up rather more of this book than I would have liked. But the issues are generally misunderstood, and I felt something needed to be done about that.
I think the revolution is also relevant to several branches of science and engineering not directly concerned with the study of man. Biology, for example, seems to be ripe for a computational revolution. And I don't mean that biologists should use computers to juggle numbers -- number crunching is not what this book is about. Nor is it what computing is essentially about. Further, it may be useful to try to understand the relationship between chemistry and physics by thinking of physical structures as providing a computer on which chemical programs are executed. But I am not so sure about that one, and will not pursue it.
Though fascinated by the intellectual problems discussed in the book, I would find it hard to justify spending public money working on them if it were not for the possibility of important consequences, including applications to education. But perhaps I should not worry: so much public money is wasted on futile research and teaching, to say nothing of incompetent public administration, ridiculous defence preparations, profits for manufacturers and purveyors of shoddy, useless or harmful goods (like cigarettes), that a little innocent academic study is marginal.
Early drafts of this book included lots of nasty comments on the current state of philosophy, psychology, social science, and education. I have tried to remove them or tone them down, since many were based on my ignorance and prejudice. In particular, my knowledge of psychology at the time of writing was dominated by lectures, seminars, textbooks and journal articles from the 1960s. Nowadays many psychologists are as critical as I could be of such psychology (which does not mean they will agree with my criticisms and proposed remedies). And Andreski's Social Science as Sorcery makes many of my criticisms of social science redundant.
I expect I shall be treading on many toes in my bridge-building comments. The fact that I have not read everything relevant will no doubt lead me into howlers. Well, that's life. Criticisms and corrections, published or private will be welcomed. (Except for arguments about whether I am doing philosophy or psychology or some kind of engineering. Demarcation disputes are usually a waste of time. Instead ask: are the problems interesting or important, and is some real progress made towards dealing with them?)
Since the book is aimed at a wide variety of readers with different backgrounds, it will be found by each of them to vary in clarity and interest from section to section. One person's banal oversimplification is another's mind-stretching novelty. Partly for this reason, the different chapters vary in style and overlap in content. The importance of the topic, and the shortage of informed discussion seemed to justify offering the book for publication despite its many flaws.
One thing that will infuriate some readers is my refusal to pay close attention to published arguments in the literature about whether machines can think, or whether people are machines of some sort. People who argue about this sort of thing are usually ignorant of developments in artificial intelligence, and their grasp of the real problems and possibilities in designing intelligent machines is therefore inadequate. Alternatively, they know about machines, but are ignorant of many old philosophical problems for mechanist theories of mind.
Most of the discussions (on both sides) contain more prejudice and rhetoric than analysis or argument. I think this is because in the end there is not much scope for rational discussion on this issue. It is ultimately an ethical question whether you should treat robots like people, or at least like cats, dogs or chimpanzees; not a question of fact. And that ethical question is the real meat behind the question whether artefacts could ever think or feel, at any rate when the question is discussed without any attempt to actually design a thinking or feeling machine.
When intelligent robots are made (with the help of philosophers), in a few hundred or a few thousand years time, some people will respond by accepting them as communicants and friends, whereas others will use all the old racialist arguments for depriving them of the status of persons. Did you know that you were a racialist?
But perhaps when it comes to living and working with robots, some people will be surprised how hard it is to retain the old disbelief in their consciousness, just as people have been surprised to find that someone of a different colour may actually be good to relate to as a person. For an unusually informative and well-informed statement of the racialist position concerning machines see Weizenbaum 1976. I admire his book, despite profound disagreements with it.
So, this book is an attempt to publicise an important, but largely unnoticed, facet of the computer revolution: its potential for transforming our ways of thinking about ourselves. Perhaps it will lead someone else, knowledgeable about developments in computing and Artificial Intelligence, to do a better job, and substantiate my claim that within a few years philosophers, psychologists, educationalists, psychiatrists, and others will be professionally incompetent if they are not well-informed about these developments.
Last updated: 4 Jun 2007. Reformatted: 15 Jul 2015
Original pages xiv--xvi
I have not always attributed ideas or arguments derived from others. I tend to remember content, not sources. Equally I'll not mind if others use my ideas without acknowledgement. The property-ethic dominates too much academic writing. It will be obvious to some readers that besides recent work in artificial intelligence the central ideas of Kant's Critique of Pure Reason have had an enormous influence on this book. Writings of Frege, Wittgenstein, Ryle, Austin, Popper, Chomsky, and indirectly Piaget have also played an important role. Many colleagues and students have helped me in a variety of ways: by provoking me to disagreement, by discussing issues with me, or by reading and commenting on earlier drafts of one or more chapters. This has been going on for a long time, so I am not sure that the following list includes everyone who has refined or revised my ideas, or given me new ones:
Frank Birch, Margaret Boden, Mike Brady, Alan Bundy, Max Clowes, Steve Draper, Gerald Gazdar, Roger Goodwin, Steven Hardy, Pat Hayes, Geoffrey Hinton, Laurie Hollings, Nechama Inbar, Robert Kowalski, John Krige, Tony Leggett, Barbara Lloyd, Christopher Longuet-Higgins, Alan Mackworth, Frank O'Gorman, David Owen, Richard Power, Julie Rutkowska, Alison Sloman, Jim Stansfield, Robin Stanton, Sylvia Weir, Alan White, Peter Williams.
Pru Heron, Jane Blackett, Judith Dennison, Maryanne McGinn and Pat Norton helped with typing and editing. Jane Blackett also helped with the diagrams.
The U.K. Science Research Council helped, first of all by enabling me to visit the Department of Artificial Intelligence in Edinburgh University for a year in 1972-3, and secondly by providing me with equipment and research staff for a three year project on computer vision at Sussex.
Bernard Meltzer was a very helpful host for my visit to Edinburgh, and several members of the department kindly spent hours helping me learn programming, and discussing computing concepts, especially Bob Boyer, J. Moore, Julian Davies and Danny Bobrow. Steve Hardy and Frank O'Gorman continued my computing education when I returned from Edinburgh. Several of my main themes concerning the status of mind can be traced back to interactions with Stuart Sutherland (e.g. see his 1970) and Margaret Boden. Her book Artificial Intelligence and Natural Man, like other things she has written, adopts a standpoint very similar to mine, and we have been talking about these issues over many years. So I have probably cribbed more from her than I know.
She also helped by encouraging me to put together various privately circulated papers when I had despaired of being able to produce a coherent, readable book. By writing her book she removed the need for me to give a detailed survey of current work in the field of A.I. Instead I urge readers to study her survey to get a good overview.
I owe my conversion to Artificial Intelligence, towards the end of 1969, to Max Clowes. I learnt a great deal by attending his lectures for undergraduates. He first pointed out to me that things I was trying to do in philosophical papers I was writing were being done better in A.I., and urged me to take up programming. I resisted for some time, arguing that I should first finish various draft papers and a book. Fortunately, I eventually realised that the best plan was to scrap them.
(I have not been so successful at convincing others that their intellectual investments are not as valuable as the new ideas and techniques waiting to be learnt. I suspect, in some cases, this is partly because they were allowed by the British educational system to abandon scientific and mathematical subjects and rigorous thinking at a fairly early age to specialise in arts and humanities subjects. I believe that the knowledge-explosion, and the needs of our complex modern societies, make it essential that we completely re-think the structure of formal education, from primary schools upwards: indefinitely continued teaching and learning at all ages in sciences, arts, humanities, crafts (including programming) must be encouraged. Perhaps that will be the best way to cope with unemployment produced by automation, and the like. But I'm digressing!).
Alison, Benjamin and Jonathan tolerated (most of the time) my withdrawal from family life for the sake of this book and other work. I did not wish to have children, but as will appear frequently in this book (e.g., in the chapter on learning about numbers), observing them and interacting with them has taught me a great deal. In return, my excursions into artificial intelligence and the topics of the book have changed my way of relating to children. I think I now understand their problems better, and have acquired a deeper respect for their intellectual powers.
The University of Sussex provided a fertile environment for the development of the ideas reported here, by permitting a small group of almost fanatical enthusiasts to set up a 'Cognitive Studies Programme' for interdisciplinary teaching and research, and providing us with an excellent though miniscule computing laboratory. But for the willingness of the computer to sit up with me into the early hours helping me edit, format, and print out draft chapters (and keeping me warm when the heating was off), the book would not have been ready for a long time to come.
I hope that, one day, even better computing facilities will be commonplace in primary schools, for kids to play with. After all, primary schools are more important than universities, aren't they?
I should like to thank John Spiers of Harvester Press for tolerating my tardiness in completing the book, and Gillian Orton for patiently preparing the typescript for he printer.
Original pages 1-21
1.1. Computers as toys to stretch our minds
Developments in science and technology are responsible for some of the best and some of the worst features of our lives. The computer is no exception. There are plenty of reasons for being pessimistic about its effects in the short run, in a society where the lust for power, profit, status and material possessions are dominant motives, and where those with knowledge -- for instance scientists, doctors and programmers -- can so easily manipulate and mislead those without.
Nevertheless I am convinced that the ill effects of computers can eventually be outweighed by their benefits. I am not thinking of the obvious benefits, like liberation from drudgery and the development of new kinds of information services. Rather, I have in mind the role of the computer, and the processes which run on it, as a new medium of self-expression, perhaps comparable in importance to the invention of writing.
Think of it like this. From early childhood onwards we all need to play with toys, be they bricks, dolls, construction kits, paint and brushes, words, nursery rhymes, stories, pencil and paper, mathematical problems, crossword puzzles, games like chess, musical instruments, theatres, scientific laboratories, scientific theories, or other people. We need to interact with all these playthings and playmates in order to develop our understanding of ourselves and our environment that is, in order to develop our concepts, our thinking strategies, our means of expression and even our tastes, desires and aims in life. The fruitfulness of such play depends in part on how complex the toy and the processes it generates, and how rich the interaction between player and toy are.
A modern digital computer is perhaps the most complex toy ever created by man. It can also be as richly interactive as a musical instrument. And it is certainly the most flexible: the very same computer may simultaneously be helping an eight year old child to generate pictures on a screen and helping a professional programmer to understand the unexpected behaviour of a very complex program he has designed. Meanwhile other users may be attempting to create electronic music, designing a program to translate English into French, testing a program which analyses and describes pictures, or simply treating the computer as an interactive diary. A few old-fashioned scientists may even be doing some numerical computations.
Unlike pet animals and other people (also rich, flexible and interactive), computers are toys designed by people. So people can understand how they work. Moreover the designs of the programs which run on them can be and are being extended by people, and this can go on indefinitely. As we extend these designs, our ability to think and talk about complex structures and processes is extended. We develop new concepts, new languages, new ways of thinking. So we acquire powerful new tools with which to try to understand other complex systems which we have not designed, including systems which have so far largely resisted our attempts at comprehension: for instance human minds and social systems. Despite the existence of university departments of psychology, sociology, education, politics, anthropology, economics and international relations, it is clear that understanding of these domains is currently at a pathetically inadequate level: current theories don't yet provide a basis for designing satisfactory educational procedures, psychological therapies, or government policies.
But apart from the professionals, ordinary people need concepts, symbolisms, metaphors and models to help them understand the world, and in particular to help them understand themselves and other people. At present much of our informal thinking about people uses unsatisfactory mechanistic models and metaphors, which we are often not even aware of using. For instance even people who strongly oppose the application of computing metaphors to mental processes, on the grounds that computers are mere mechanisms, often unthinkingly use much cruder mechanistic metaphors, for instance 'He needed to let off steam', I was pulled in two directions at once, but the desire to help my family was stronger', 'His thinking is stuck in a rut', 'The atmosphere in the room was highly charged'. Opponents of the spread of computational metaphors are in effect unwittingly condemning people to go on living with hydraulic, clock-work, and electrical metaphors derived from previous advances in science and technology.
To summarise so far: it can be argued that computers, or, to be more precise, combinations of computers and programs, constitute profoundly important new toys which can give us new means of expression and communication and help us create an ever-increasing new stock of concepts and metaphors for thinking about all sorts of complex systems, including ourselves.
I believe that not only psychology and social sciences but also biology and even chemistry and physics can be transformed by attempting to view complex processes as computational processes, including rich information flow between sub-processes and the construction and manipulating of symbolic structures within processes. This should supersede older paradigms, such as the paradigm which represents processes in terms of equations or correlations between numerical variables.
This paradigm worked well for a while in physics but now seems to dominate, and perhaps to strangle, other disciplines for which it is irrelevant. Apart from computing science, linguistics and logic seem to be the only sciences which have sharply and successfully broken away from the paradigm of 'variables, equations and correlations'. But perhaps it is significant that the last two pretend not to be concerned with processes, only with structures. This is a serious limitation, as I shall try to show in later chapters.
1.2. The Revolution in Philosophy
Well, suppose it is true that developments in computing can lead to major advances in the scientific study of man and society: what have these scientific advances to do with philosophy?
The very question presupposes a view of philosophy as something separate from science, a view which I shall attempt to challenge and undermine later, since it is based both on a misconception of the aims and methods of science and on the arrogant assumption by many philosophers that they are the privileged guardians of a method of discovering important non-empirical truths.
But there is a more direct answer to the question, which is that very many of the problems and concepts discussed by philosophers over the centuries have been concerned with processes, whereas philosophers, like everybody else, have been crippled in their thinking about processes by too limited a collection of concepts and formalisms. Here are some age-old philosophical problems explicitly or implicitly concerned with processes. How can sensory experience provide a rational basis for beliefs about physical objects? How can concepts be acquired through experience, and what other methods of concept formation are there? Are there rational procedures for generating theories or hypotheses? What is the relation between mind and body? How can non-empirical knowledge, such as logical or mathematical knowledge, be acquired? How can the utterance of a sentence relate to the world in such a way as to say something true or false? How can a one-dimensional string of words be understood as describing a three-dimensional or multi-dimensional portion of the world? What forms of rational inference are there? How can motives generate decisions, intentions and actions? How do non-verbal representations work? Are there rational procedures for resolving social conflicts?
There are many more problems in all branches of philosophy concerned with processes, such as perceiving, inferring, remembering, recognising, understanding, learning, proving, explaining, communicating, referring, describing, interpreting, imagining, creating, deliberating, choosing, acting, testing, verifying, and so on. Philosophers, like most scientists, have an inadequate set of tools for theorising about such matters, being restricted to something like common sense plus the concepts of logic and physics. A few have clutched at more recent technical developments, such as concepts from control theory (e.g. feedback) and the mathematical theory of games (e.g. payoff matrix), but these are hopelessly deficient for the tasks of philosophy, just as they are for the task of psychology.
The new discipline of artificial intelligence explores ways of enabling computers to do things which previously could be done only by people and the higher mammals (like seeing things, solving problems, making and testing plans, forming hypotheses, proving theorems, and understanding English). It is rapidly extending our ability to think about processes of the kinds which are of interest to philosophy. So it is important for philosophers to investigate whether these new ideas can be used to clarify and perhaps helpfully reformulate old philosophical problems, re-evaluate old philosophical theories, and, above all, to construct important new answers to old questions. As in any healthy discipline, this is bound to generate a host of new problems, and maybe some of them can be solved too.
I am prepared to go so far as to say that within a few years, if there remain any philosophers who are not familiar with some of the main developments in artificial intelligence, it will be fair to accuse them of professional incompetence, and that to teach courses in philosophy of mind, epistemology, aesthetics, philosophy of science, philosophy of language, ethics, metaphysics, and other main areas of philosophy, without discussing the relevant aspects of artificial intelligence will be as irresponsible as giving a degree course in physics which includes no quantum theory. Later in this book I shall elucidate some of the connections. Chapter 4, for example, will show how concepts and techniques of philosophy are relevant to AI and cognitive science.
Philosophy can make progress, despite appearances. Perhaps in future the major advances will be made by people who do not call themselves philosophers.
After that build-up you might expect a report on some of the major achievements in artificial intelligence to follow. But that is not the purpose of this book: an excellent survey can be found in Margaret Boden's book. Artificial Intelligence and Natural Man, and other works mentioned in the bibliography will take the interested reader into the depths of particular problem areas. (Textbooks on AI will be especially useful for readers wishing to get involved in doing artificial intelligence.)
My main aim in this book is to re-interpret some age-old philosophical problems, in the light of developments in computing. These developments are also relevant to current issues in psychology and education. Most of the topics are closely related to frontier research in artificial intelligence, including my own research into giving a computer visual experiences, and analysing motivational and emotional processes in computational terms.
Some of the philosophical topics in Part One of the book are included not only because I think I have learnt important things by relating them to computational ideas, but also because I think misconceptions about them are among the obstacles preventing philosophers from accepting the relevance of computing. Similar misconceptions may confuse workers in AI and cognitive science about the nature of their discipline.
For instance, the chapters on the aims of science and the relations between science and philosophy attempt to undermine the wide-spread assumption that philosophers are doing something so different from scientists that they need not bother with scientific developments and vice versa. Those chapters are also based on the idea that developments in science and philosophy form a computational process not unlike the one we call human learning.
The remaining chapters, in Part Two, contain attempts to use computational ideas in discussing some problems in metaphysics, philosophy of mind, epistemology, philosophy of language and philosophy of mathematics. I believe that further analysis of the nature of number concepts and arithmetical knowledge in terms of symbol-manipulating processes could lead to profound developments in primary school teaching, as well as solving old problems in philosophy of mathematics.
In the remainder of this chapter I shall attempt to present, in bold outline, some of the main themes of the computer revolution, followed by a brief definition of ``Artificial Intelligence''. This will help to set the stage for what follows. Some of the themes will be developed in detail in later chapters. Others will simply have to be taken for granted as far as this book is concerned. Margaret Boden's book and more recent textbooks on AI fill most of the gaps.
1.3. Themes from the Computer Revolution
1. Computers are commonly viewed as elaborate numerical calculators or at best as devices for blindly storing and retrieving information or blindly following sequences of instructions programmed into them. However, they can be more accurately viewed as an extension of human means of expression and communication, comparable in importance to the invention of writing. Programs running on a computer provide us with a medium for thinking new thoughts, trying them out, and gradually extending, deepening and clarifying them. This is because, when suitably programmed, computers are devices for constructing, manipulating, analysing, interpreting and transforming symbolic structures of all kinds, including their own programs.
2. Concepts of 'cause', law', and 'mechanism', discussed by philosophers, and used by scientists, are seriously impoverished by comparison with the newly emerging concepts.
The old concepts suffice for relatively simple physical mechanisms, like clocks, typewriters, steam engines and unprogrammed computers, whose limitations can be illustrated by their inability to support a notion of purpose.
By contrast, a programmed computer may include representations of itself, its actions, possible futures, reasons for choosing, and methods of inference, and can therefore sometimes contain purposes which generate behaviour, as opposed to merely containing physical structures and processes which generate behaviour. So biologists and psychologists who aim to banish talk of purposes from science, thereby ignore some of the most important new developments in science. So do philosophers and psychologists who use the existence of purposive human behaviour to 'disprove' the possibility of a scientific study of man.
3. Learning that a computer contains a certain sub-program enables you to explain some of the things it can do, but provides no basis for predicting what it always or frequently does, since that will depend on a large number of other factors which determine when this sub-program is executed and the environment in which it is executed. So a scientific investigation of computational processes need not be primarily a search for laws so much as an attempt to describe and explain what sorts of things are and are not possible. A central form of question in science and philosophy is 'How is so and so possible?' Many scientists, especially those studying people and social systems, mislead themselves and their students into thinking that science is essentially a search for laws and correlations, so that they overlook the study of possibilities. Linguists (especially since Chomsky) have grasped this point, however. (This topic is developed at length in chapter 2.)
4. Similarly there is a wide-spread myth that the scientific study of complex systems requires the use of numerical measurements, equations, calculus, and the other mathematical paraphernalia of physics. These things are useless for describing or explaining the important aspects of the behaviour of complex programs (e.g. a computer, operating system, or Winograd's program described in his book Understanding Natural Language).
Instead of equations and the like, quite new non-numerical formalisms have evolved in the form of programming languages, along with a host of informal concepts relating the languages, the programs expressed therein, and the processes they generate. Many of these concepts (e.g. parsing, compiling, interpreting, pointer, mutual recursion, side-effect, pattern matching) are very general, and it is quite likely that they could be of much more use to students of biology, psychology and social science than the kinds of numerical mathematics they are normally taught, which are of limited use for theorising about complex interacting structures. Unfortunately although many scientists dimly grasp this point (e.g. when they compare the DNA molecule with a computer program) they are often unable to use the relationship: their conception of a computer program is limited to the sorts of data-processing programs written in low-level languages like Fortran or Basic.
5. It is important to distinguish cybernetics and so-called 'systems theory' from this broader science of computation, for the former are mostly concerned with processes involving relatively fixed structures in which something quantifiable (e.g. money, energy, electric current, the total population of a species) flows between or characterises substructures. Their formalisms and theories are too simple to say anything precise about the communication of a sentence, plan or problem, or to represent the process of construction or modification of a symbolic structure which stores information or abilities.
Similarly, the mathematical theory of information, of Shannon and Weaver, is mostly irrelevant, although computer programs are often said to be information-processing mechanisms. The use of the word 'information' in the mathematical theory has proved to be utterly misleading. It is not concerned with meaning or content or sense or connotation or denotation, but with probability and redundancy in signals. If more suitable terminology had been chosen, then perhaps a horde of artists, composers, linguists, anthropologists, and even philosophers would not have been misled.
I am not denying the importance of the theory to electronic engineering and physics. In some contexts it is useful to think of communication as sending a signal down a noisy line, and understanding as involving some process of decoding signals. But human communication is quite different: we do not decode, we interpret, using enormous amounts of background knowledge and problem-solving abilities. That is, we map one class of structures (e.g. 2-D images), into another class (e.g. 3-D scenes). Chapter 9 elaborates on this, in describing work in computer vision. The same is true of artificial intelligence programs which understand language. Information theory is not concerned with such mappings.
6. One of the major new insights is that computational processes may be markedly decoupled from the physical processes of the underlying computer. Computers with quite different basic components and architecture may be equivalent in an important sense: a program which runs on one of them can be made to run on any other either by means of a second program which simulates the first computer on the second, or by means of a suitable compiler or interpreter program which translates the first program into a formalism which the second computer can execute. So a program may run on a virtual machine.
Differences in size can be got round by attaching peripheral storage devices such as magnetic discs or tapes, leaving only differences in speed.
So all modern digital computers are theoretically equivalent, and the detailed physical structure and properties of a computer need not constrain or determine the symbol-manipulating and problem-solving processes which can run on it: any constraints, except for speed, can be overcome by providing more storage and feeding in new programs. Similarly, the programs do not determine the computers on which they can run.
7. Thus reductionism is refuted. For instance, if biological processes are computational processes running on a physico-chemical computer, then essentially the same processes could, with suitable re-programming, run on a different sort of computer. Equally, the same computer could permit quite different computations: so the nature of the physical world need not determine biological processes. Just as the electronic engineers who build and maintain a computer may be quite unable to describe or understand some of the programs which run on it, so may physicists and chemists lack the resources to describe, explain or predict biological processes. Similarly psychology need not be reducible to physiology, nor social processes to psychological ones. To say that wholes may be more than the sum of their parts, and that qualitatively new processes may 'emerge' from old ones, now becomes an acceptable part of the science of computation, rather than old-fashioned mysticism. Many anti-reductionists have had this thought prior to the development of computing, but have been unable to give it a clear and indisputable foundation.
8. There need not be only two layers: programs and physical machine. A suitably programmed computer (e.g. a computer with a compiler program in it), is itself a new computer a new 'virtual machine' which in turn may be programmed so as to support new kinds of processes. Thus a single process may involve many layers of computations, each using the next lower layer as its underlying machine. But that is not all. The relations may sometimes not even be hierarchically organised, for instance if process A forms part of the underlying machine for process B and process B forms part of the underlying machine for process A. Social and psychological, psychological and physiological processes, seem to be related in this mutually supportive way. Chapters 6 and 9 present some examples. The development of good tools for thinking about a system composed of multiple interlocking processes is only just beginning. Systems of differential equations and the other tools of mathematical physics are worse than useless, for the attempt to use them can yield quite distorted descriptions of processes involving intelligent systems, and encourage us to ask unfruitful questions.
9. Philosophers sometimes claim that it is the business of philosophy only to analyse concepts, not to criticise them. But constructive criticism is often needed and in many cases the task will not be performed if philosophers shirk it. An important new task for philosophers is constructively critical analysis of the concepts and underlying presuppositions emerging from computer science and especially artificial intelligence. Further, by carefully analysing the mismatch between some of our very complicated ordinary concepts like goal, decide, infer, perceive, emotion, believe, understand, and the models being developed in artificial intelligence, philosophers may help to counteract unproductive exaggerated claims and pave the way for further developments. They will be rewarded by being helped with some of their philosophical problems.
10. For example, the computational metaphor, paradoxically, provides support for a claim that human decisions are not physically or physiologically determined, since, as explained above, if the mind is a computational process using the brain as a computer then it follows that the brain does not constrain the range of mental processes, any more than a computer constrains the set of algorithms that can run on it. It can be more illuminating to think of the program (or mind) as constraining the physical processes than vice versa.
Moreover, since the state of a computation can be frozen, and stored in some non-material medium such as a radio signal transmitted to a distant planet, and then restarted on a different computer, we see that the hitherto non-scientific hypothesis that people can survive bodily death, and be resurrected later on, acquires a new lease of life. Not that this version is likely to please theologians, since it no longer requires a god.
11. Recent attempts to give computers perceptual abilities seem to have settled the empiricist/rationalist debate by supporting Immanuel Kant's claim that no experiencing is possible without information-processing (analysis, comparison, interpretation of data) and that no information-processing is possible without pre-existing knowledge in the form of symbol-manipulating procedures, data-structures, and quite specific descriptive abilities. (This topic is elaborated in chapter 9.)
Shallow philosophical, linguistic and psychological disputes about innate or non-empirical knowledge are being replaced by much harder and deeper explorations of exactly what pre-existing knowledge is required, or sufficient, for particular types of empirical and non-empirical learning. What knowledge of two- and three-dimensional geometry and of physics does a robot need in order to be able to interpret its visual images in terms of tables, chairs and dishes to be carried to the sink? What kind of knowledge about its own symbolisms and symbol-manipulating procedures will a baby robot need in order to stumble upon and understand the discovery that counting a row of buttons from left to right necessarily produces the same result as counting from right to left, if no mistakes occur? (More on this sort of thing in the chapter on learning about numbers.)
Similarly, philosophical debates about the possibility of
'synthetic apriori' knowledge dissolve in the light of new insights
into the enormous variety of ways in which a computational system
(including a human society?) may make
inferences, and perhaps discover necessary truths about the
capabilities and limitations of its current stock of programs.
For an example see the book by Sussman about a program
which learns to build better programs for stacking blocks
by analysing why initial versions go wrong.
(G.J. Sussman, A Computational Model of Skill Acquisition, American Elsevier, 1975.)
Epistemology, developmental psychology, and the history of ideas (including science and art) may be integrated in a single computational framework. The chapters on the aims of science and on number concepts are intended as a small step in this direction.
12. One of the bigger obstacles to progress in science and philosophy is often our inability to tell when we lack an explanation of something. Before Newton, people thought they understood why unsupported objects fell. Similarly, we think practice explains learning, familiarity explains recognition, desire explains action. Philosophers often assume that if you have experienced instances and non-instances of some concept, then this 'ostensive definition' suffices to explain how you could have learnt this concept. So our experience of seeing blue things and straight lines is supposed to explain how we acquire the concepts blue and straight. As for how the relevant aspects of instances and non-instances are noticed, related to one another and to previous experiences, and how the irrelevant aspects are left out of consideration the question isn't even asked. (Winston asked it, and gave some answers to it in the form of a primitive learning program: see his 1975.) Psychologists don't normally ask these questions either: having been indoctrinated with the paradigm of dependent and independent variables, they fail to distinguish a study of the circumstances in which some behaviour does and does not occur, from a search for an explanation of that behaviour.
People assume that if a person or animal wants something, then this, together with relevant beliefs, suffices to explain the resulting actions. But no decent theory is offered to explain how desires and beliefs are capable of generating action, and in particular no theory of how an individual finds relevant beliefs in his huge store of information, or how conflicting motives enter into the process, or how beliefs, purposes, skills, etc. are combined in the design of an action (e.g. an utterance) suited to the current situation. The closest thing to a theory in the minds of most people is the model of desires as physical forces pushing us in different directions, with the strongest force winning. The mathematical theory of games and decisions is a first crude attempt to improve on this, but is based on the false assumptions that people start with a well-defined set of alternative actions when they take decisions.
Work in artificial intelligence on programs which formulate and execute plans is beginning to unravel some of the intricacies of such processes. My chapter on aspects of the mechanism of mind will discuss some of the problems. (Chapter 6).
By trying to turn our explanations and theories into designs for working systems, we soon discover their poverty. The computer, unlike academic colleagues, is not convinced by fine prose, impressive looking diagrams or jargon, or even mathematical equations. If your theory doesn't work then the behaviour of the system you have designed will soon reveal the need for improvement. often errors in your design will prevent it behaving at all.
Books don't behave. We have long needed a medium for expressing theories about behaving systems. Now we have one, and a few years of programming explorations can resolve or clarify some issues which have survived centuries of disputation.
Progress in philosophy (and psychology) will now come from those who take seriously the attempt to design a person. I propose a new criterion for evaluating philosophical writings: could they help someone designing a mind, a language, a society or a world?
The same criterion is relevant to theorising in psychology. The difference is that philosophy is not so much concerned with finding the correct explanation of actual human behaviour. Its aims are more general. For more on the difference see chapters 2 and 3.
13. A frequently repeated discovery, using the new methodology, is that what seemed simple and easy to explain turns out to be very complex, requiring sophisticated computational resources, for instance: seeing a dot, remembering a word, learning from an example, improving through practice, recognising a familiar shape, associating two ideas, picking up a pencil. Of course, it may be that for all these achievements there are simple explanations, of kinds hitherto quite unknown. But at least we have learnt that we don't know them, and that is real progress. This also teaches a new respect for the intellects of infants and other animals. How does a bee manage to alight on a flower without crashing into it?
14. There are some interesting implications of the points made in 7 and 8 above. I mentioned that two computational processes may be mutually supportive. Similarly, two procedures may contain each other as parts, two information structures may contain each other as parts. More generally, a whole system may be built up from large numbers of mutually recursive procedures and data-structures, which interlock so tightly that no element can be properly defined except in terms of the whole system. (Recursive rules in formal grammars illustrate the same idea.) Since the system cannot be broken down hierarchically into parts, then parts of those parts, until relatively simple concepts and facts are reached, it follows that anyone learning about the system has to learn many different interrelated things in parallel, tolerating confusion, oversimplifications, inaccuracies, and constantly altering what has previously been learnt in the light of what comes later.
So the process of learning a complex interlocking network of circular concepts, theories and procedures may have much in common with the task of designing one.
If all this is correct it not only undermines philosophical attempts to perform a logical analysis of our concepts in terms of ever more primitive ones (as Wittgenstein, for example, assumed possible in his Tractatus Logico Philosophicus), it also has profound implications for the psychology of learning and for educational practice. It seems to imply that learning may be a highly creative process, that cumulative educational programmes may be misguided, and that teachers should not expect pupils to get things right while they are in the midst of learning a collection of mutually recursive concepts. This theme will be illustrated in more detail in the chapter on learning about numbers.
(One implication is that this book cannot be written in such a way as to introduce readers to the main ideas one at a time in a clear and accurate way. Readers who are new to the system of concepts will have to revisit different portions of the book frequently. No author has the right to expect this. The book is therefore quite likely to fail to communicate.)
15. Much of what is said in this book simply reports common sense. That is, it attempts to articulate much of the sound intuitive knowledge we have picked up over years of interacting with the physical world and with other people.
Making common sense explicit is the goal of much philosophising. Common sense should not be confused with common opinions, namely the beliefs we can readily formulate when asked: these are often false over-generalisations or merely the result of prejudice. Common sense is a rich and profound store of information, not about laws, but about what people are capable of doing, thinking or experiencing.
But common sense, like our knowledge of the grammar of our native language, is hard to get at and articulate, which is one reason why so much of philosophy, psychology and social science is vapid, or simply false.
Philosophers have been struggling for centuries to develop techniques for articulating common sense and unacknowledged presuppositions, such as the techniques of conceptual analysis and the exploration of paradoxes. Artificial intelligence provides an important new tool for doing this. It helps us find our mistakes quickly. One reason for this is that attempts to make computers understand what we say soon break down if we haven't learnt to articulate in the programs the presuppositions and rich conceptual structures which we use in understanding such things. (See Abelson, 'The structure of belief systems', and Schank & Abelson, 1977.)
Further, when you've designed a program whose behaviour is meant to exemplify some familiar concept, such as learning, perceiving, conversing, or achieving a goal, then in trying to interact with the program and in experiencing its behaviour it often happens that you come to realise that it does not really exemplify your concept after all, and this may help you to pin down features of the concept, essential to its use, which you had not previously noticed. So artificial intelligence contributes to conceptual analysis. (The interaction is two-way.)
16. Of course, merely imagining the program's behaviour would often suffice: running the program isn't necessary in principle. But one of the sad and yet exhilarating facts most programmers soon learn is that it is hard to be sufficiently imaginative to anticipate the kinds of behaviour one's program can produce, especially when it is a complex system capable of generating millions of different kinds of processes depending on what you do with it. It is a myth that programs do just what the programmer intended them to do, especially when they are interacting with compilers, operating systems and hardware designed by someone else. The result is often behaviour that nobody planned and nobody can understand.
Thus new possibilities are discovered. Such discoveries may serve the same role as thought-experiments have often done in physics. So computational experiments may help to extend common sense as well as helping us to analyse it.
17. One of the things I have been trying to do is undermine the conflict between those who claim that a scientific study of man is possible and those who claim it isn't. Both sides are usually adopting a quite mistaken view of the essence of science. Bad philosophical ideas seem to have a habit of pervading a whole culture (like the supposed dichotomy between the emotional, intuitive aspects of people and the cognitive, intellectual, or rational aspects -- a dichotomy I have tried to undermine elsewhere).
The chapter on the aims of science attempts to correct widespread but mistaken views about the nature of science. I first became aware of the mistakes under the influence of linguistics and artificial intelligence.
18. One of the main themes of the revolution is that the pure scientist needs to behave like an engineer: designing and testing working theories. The more complex the processes studied, the closer the two must become. Pure and applied science merge. And philosophers need to join in.
19. I'll end with one more wildly speculative remark. Social systems are among the most complex computational processes created by man (whether intentionally or not). Most of the people currently charged with designing, maintaining, improving or even studying such processes are almost completely ignorant of the concepts, and untrained in the skills, required for thinking about very complex interacting processes. Instead they mess about with variables (on ordinal, interval or ratio scales), looking for correlations between them, convinced that measurement and laws are the stuff of science, without recognizing that such techniques are merely useful stop-gaps for dealing with phenomena you don't yet understand. In years to come, our willingness to trust these politicians, civil servants, economists, educationalists and the like with the task of managing our social system will look rather laughable. I am not suggesting that programmers should govern us. Rather, I venture to suggest that if everyone were allowed to play with computers from childhood, not only would education become much more fun and stretch our minds much further, but people might be a lot better equipped to face many of the tasks which currently defeat us because we don't know how to think about them. Computer 'experts' would find it harder to exploit us.
The best way to answer this question is to look at the aims of A.I., and some of the methods for achieving those aims, and to show how the subject is decomposable into sub-domains and related to other disciplines. This would require a whole book, which is not my current purpose. So I'll give an incomplete answer by describing and commenting on some of the aims. AI is not just the attempt to make machines do things which when done by people are called ``intelligent''. It is much broader and deeper than this. For it includes the scientific and philosophical aims of understanding as well as the engineering aim of making.
The aims of Artificial Intelligence
1. Theoretical analysis of possible
effective explanations of
2. Explaining human abilities.
3. Construction of intelligent artefacts.
Comments on the aims:
I.e. a great deal of A.I. research is highly 'domain-specific', and amounts to an attempt to explicitly formulate knowledge people already use unconsciously in ordinary life or specialised activities. This is closely related to conceptual analysis as practised by linguists and philosophers. (See Chapter 4.)
The primary aim of my research is to understand aspects of the human mind. Different people will be interested in different aspects, and many will not be interested in the aspects I have chosen: scientific creativity, decision making, visual perception, the use of verbal and non-verbal symbolisms, and learning of elementary mathematics. At present I can only report fragmentary progress. Whether it is called philosophy, psychology, computing science, or anything else doesn't really interest me. The methods of all these disciplines are needed if progress is to be made. It may be that the human mind is too complex to be understood by the human mind. But the desire to attempt the impossible seems to be one of its persistent features.
The remaining chapters, apart from chapter 10 should be readable in any order. On the whole, people knowledgeable about philosophy and ignorant of computing will probably find chapters 2 to 5 easier than the following chapters. People interested in trying to understand how people work, and not so concerned with abstract methodological issues, may find chapters 2 to 5 tedious (or difficult?), and should start with Part Two, though they'll not be able to follow all the methodological asides, which refer back to earlier chapters.
(1) I write 'program' not 'programme' since the former is a technical term referring to a collection of definitions, instructions and information expressed in a precise language capable of being interpreted by a computer. For more details see J. Weizenbaum, Computer Power and Human Reason. There is much in this book that I disagree with, but it is well worth reading, and may be a useful antidote to some of my excesses.
(2) A compiler is a program which translates programs from one programming language into another. E.g. an ALGOL compiler may translate ALGOL programs into the 'machine code' of a particular computer.
(3) Apparently Hegel anticipated some of these ideas. His admirers might advance their understanding of his problems by turning to the study of computation.
Original pages 22-62
If we ask scientists what science is and what its aims are, we get a confusing variety of answers.
Whom should we believe? Do scientists really know what they are doing, or are they perhaps as confused about their aims and methods as the rest of us? I suggest that it is as hard for a scientist to characterise the aims and methods of science in general as it is for normal persons to characterise the grammatical rules governing their own use of language. But I am going to stick my neck out and try.
If we are to understand the nature of science, we must see it as an activity and achievement of the human mind alongside others, such as the achievements of children in learning to talk and to cope with people and other objects in their environment, and the achievements of non-scientists living in a rich and complex world which constantly poses problems to be solved. Looking at scientific knowledge as one form of human knowledge, scientific understanding as one form of human understanding, scientific investigation as one form of human problem-solving activity, we can begin to see more clearly what science is, and also what kind of mechanism the human mind is.
I suggest that no simple slogan or definition, such as can be found in textbooks of science or philosophy can capture its aims. For instance, I shall try to show that it is grossly misleading to characterise science as a search for laws. Science is a complex network of different interlocking activities with multiple practical and theoretical aims and a great variety of methods. I shall try to describe some of the aims and their relationships. Oversimple characterisations, by both scientists and philosophers, have led to unnecessary and crippling restrictions on the activities of some would-be scientists, especially in the social and behavioural sciences, and to harmfully rigid barriers between science and philosophy.
By undermining the slogan that science is the search for laws, and subsidiary slogans such as that quantification is essential, that scientific theories must be empirically refutable, and that the methods of philosophers cannot serve the aims of scientists, I shall try to liberate some scientists from the dogmas indoctrinated in universities and colleges. I shall also try in later chapters to show philosophers how they can contribute to the scientific study of man, thereby escaping from the barrenness and triviality complained of so often by non-philosophers and philosophy students.
An important reason for studying the aims and methods of science is that it may give us insights into the learning processes of children, and help us design machines which can learn. Equally, the latter project should help us understand science. A side-effect of my argument is to undermine some old philosophical distinctions and pour cold water on battles which rage around them like the distinction between subjectivity and objectivity, the distinction between science and philosophy and the battles between empiricists and rationalists.
My views have been powerfully influenced by the writings of Karl Popper. However, several major points of disagreement with him will emerge.
Whether the third aim makes sense (and many scientists and philosophers would dispute this) depends on whether it is possible to derive values and norms from facts. I shall not discuss it as it is not relevant to the main purposes of this book. The second kind of aim will not be given much attention either, except when relevant to discussions of the first kind of aim, on which I shall concentrate.
These aims are not restricted to science. We all, including infants and children, aim to extend our knowledge and understanding: science is unique only in the degree of rigour, system and co-operation between individuals involved in its methods. For the present, however, I shall not explore the peculiarities of science, since what it has in common with other forms of acquisition of knowledge has been too long neglected, and it is the common features I want to describe.
In particular, notice that one cannot have the aim of extending one's knowledge unless one presupposes that one's knowledge is incomplete, or perhaps even includes mistakes. This means that pursuing science requires systematic self-criticism in order to find the gaps and errors. This distinguishes both science and perhaps the curiosity of young children from some other belief systems, such as dogmatic theological systems and political ideologies. (See chapter 6 for the role of self-criticism in intelligence.) But it does not distinguish science from philosophy. Let us now examine the factual aims of science more.closely.
NOTE: I would now (since about 2002) express the aim of 'extending knowledge of what sorts of things are possible' in terms of 'extending the ontology' we use. This is also part of the process of child development, e.g. as illustrated in this presentation:
'Ontology extension' in evolution and in development, in animals and machines.
And in: http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#glang
Evolution of minds and languages.
What evolved first and develops first in children:
Languages for communicating, or languages for thinking (Generalised Languages: GLs)?
A similar distinction pervades the writings of Karl Popper, though he would disagree with some of the things I say below about (1.a). Different branches of science tend to stress one or other of these aims, though both aims are usually present to some extent. For instance, physics is more concerned with aim (1.a), studying the form of the world, whereas astronomy is perhaps more concerned with (1.b), studying the contents.
Geology, geography, biology, anthropology, human history, sociology, and some kinds of linguistics tend to be more concerned with (1.b), i.e. with learning about the particular contents of particular parts of the universe. Chemistry, some branches of biology, economics and psychology attempt to investigate truths not so restricted in scope. In the jargon of philosophers, (1.a) is concerned with universals, (l.b) with particulars.
However, the two scientific aims are very closely linked. One cannot discover what sorts of things are possible, nor test explanatory theories, except by discovering particular facts about what actually exists or occurs. Conversely, one cannot really understand particular objects, events, processes, etc., except insofar as one classifies and explains them in the light of more general knowledge about what kinds of things there can be and how or why. These two aims are closely linked in all forms of learning about the world, not only in science. The study of form and the study of content go hand in hand. (This must be an important factor in the design of intelligent machines.)
I have characterised these aims in a dynamic form: the aim is to extend knowledge, to go on learning. Some might say that the aim is to arrive at some terminal state when everything is known about the form and content of the world, or at least the form. There are serious problems about whether this suggestion makes sense: for example how could one tell that this goal had been reached? But I do not wish to pursue the matter. For the present, it is sufficient to note that it makes sense to talk of extending knowledge, that is removing errors and filling gaps, whether or not any final state of complete knowledge is possible. Some of the criteria for deciding what is an extension or improvement will be mentioned later.
Many philosophers of science have found it hard to explain the sense in which science makes progress, or is cumulative. (E.g. Kuhn (1962), last chapter.) This is because they tend to think of science as being mainly concerned with laws; and supposed laws are constantly being refuted or replaced by others. Very little seems to survive. But if we see science as being also concerned with knowledge of what is possible, then it is obviously cumulative. For a single instance demonstrates a new possibility and, unlike a law, this cannot be refuted by new occurrences, even if the possibility is re-described from time to time as the language of scientists evolves.
Hypotheses about the limits of possibilities (laws) lack this security, for they are constantly subject to revision as the boundaries are pushed further out, by newly discovered (or created) possibilities. Explanations of possibilities and their limits frequently need to be refined or replaced, for the same reason. But this is all a necessary part of the process of learning and understanding more about what is possible in the world. (This is true of child development too.) It is an organic, principled growth. Let us now look more closely at aim (1.a), the aim of extending knowledge of the form of the world.
All the following types of learning will ultimately have to be catered for in intelligent machines.
If forced to summarise all this in a single slogan, one could say: A major aim of science is to find out what sorts of things are and are not possible in the world, and to explain how and why.
A similar aim must motivate intelligent learning machines.
Though too short to be clear, this may be a useful antidote to more common slogans stressing the discovery and explanation of laws and regularities. Such slogans lead to an excessive concern with prediction, control and testing, topics mainly relevant to subgoals (d) to (g), while insufficient attention is paid to the more fundamental aims (a) to (c), especially in psychology and social science. The result is often misguided research, theorising and teaching.
I shall say more about these three fundamental aims later. The next two sections contain further general discussion of the relations between these seven interpretative aims, and the previously mentioned historical and technological aims of science.
This possibility refutes the theory that water is a chemical element and corroborates the alternative hypothesis that all water is composed of hydrogen and oxygen, and also more general theories about possible kinds of transformations of matter. Similarly, although an 'historical' biologist may be interested in recording, for a fascinated public, the flora and fauna of a foreign isle, or the antics of a particularly intelligent chimpanzee, the 'interpretative' biologist is interested only insofar as they illustrate something, such as what kinds of plants and animals can exist (or can exist in certain conditions), or what kinds of behaviour are possible for a chimpanzee, or for some other class containing the animal in question.
In short, the interpretative scientist studies the form of the world, using the contents only as evidence, whereas the historical scientist simply studies the contents. There is no reason why any one science, or scientist, should be classified entirely as interpretative, or entirely as historical. Different elements may intermingle in one branch of science. For instance, a linguist studying a particular dialect is an interpretative scientist insofar as he is not concerned merely to record the actual set of sentences uttered by certain speakers of that dialect, but to characterise the full range of sentences that would or could be intelligible to an ordinary speaker of that dialect, namely, a range of possibilities.
However, insofar as he is interested merely in finding out exactly what dialect is intelligible to a certain spatio-temporally restricted group of persons, he is an historical linguist, as contrasted with a linguist who is interested in this dialect primarily as a sample of the kinds of language which human societies can develop: the attempt to characterise this set of possible languages is often called the search for linguistic universals.
Thus a richer terminology would be required for a precise description of hybrid historical and interpretative aims. This is not relevant to our present concerns and will not be pursued further.
Like the interpretative aim, the "historical" aim of finding out about the contents of particular bits of the world must also be built into intelligent machines. Moreover, the pursuit of these two aims by a machine will interact, as in science.
For example, knowing that rain is possible and wanting to stay dry, one can take a waterproof covering whenever one goes out. More generally, one can take precautions to prevent the effects of an unwanted possibility, even if one cannot predict or prevent it.
Similarly, one can take steps to get the best out of possibilities one knows about but cannot predict or produce, like building tanks to catch water in case it rains, which might be worth doing even if one had no idea how often rain fell, provided one needed the water enough and had time and materials to spare.
The discovery of possibilities may have technological significance in less direct ways. Knowing that something is possible can provide a boost to research into an understanding of how and why, so that its occurrence may be predicted or brought about, or new variants produced. Knowledge that it was possible for things heavier than air to fly, namely birds, provoked research into ways of enabling men and machines to do so. That was a case of a possibility demonstrated by actual instances, then extended to a wider range of instances.
Sometimes a possibility is explained by a theory before instances are known, and this again can have great technological importance, as in the case of Einstein's discovery of the possibility of converting mass into kinetic energy, or the theoretical discovery of the possibility of lasers before they were made. Much of engineering design consists of demonstrating that some new phenomenon is possible and showing how, or that some possibility can be produced in new ways or in new conditions. An intelligent planning system may also need to be able to generate types of possibilities before instances are known actually to exist. This is commonplace in engineering design.
Formally this technological activity has much in common with the supposedly purer or more theoretical activity of inventing a new theory to explain some previously known possibility, or using the ideas of one science to explain possibilities observed in another, for instance using physics to explain chemical possibilities, and using chemistry to explain the very complicated possibility of sexual reproduction. (See J. Watson, 1968.) 'Pure' science first discovers instances of possibilities then creates explanations of those possibilities whereas 'applied' science uses explanations of possibilities to create instances. The kinds of creativity and modes of reasoning involved are often similar. More generally, any form of intelligent action requires an understanding of possibilities. One cannot change the world sensibly without first interpreting it, even though attempting to change things is often indispensable for correcting mistaken interpretations and deepening one's understanding. Acting intelligently in a situation requires a survey of possibilities, which requires an understanding of the potential for change in the situation. For example, opening a window requires a grasp of the possibilities for movement in the window and its catch. But this requires interpreting what is actual, i.e. relating it to general knowledge of what sorts of things are possible in what circumstances: so action requires knowledge of the form of the world. Grasping new possibilities often involves inventing new concepts, new languages in which to represent them, a topic discussed later.
Much more could be said about relations between the interpretative aims of science, and the historical and technological aims. Instead, let's take a closer look at some of the interpretative aims of science, the aims concerned with learning about and understanding possibilities. We shall attempt to clarify the similarities and differences between these aims, and then proceed to formulate criteria for assessing some of the achievements of scientists.
The three aims are very tightly interconnected. It is very hard to describe the distinctions between them accurately, and I am sure I do not yet understand these matters aright. Moreover, each of them could be further subdivided. Detailed historical analysis is required here, so that similarities and differences between cases can be described accurately and a more satisfactory typology developed: a contribution to the scientific study of science. Alas, this will require the help of persons more scholarly than I. Let's take a closer look at (a).
As new concepts and symbolisms are developed, and the language extended, new questions become askable. For instance, people who grasp the concepts 'hotter' and longer' can understand the question whether metal rods get longer when they are made hotter. And they may even be able to grasp crude distinctions between metals according to which grows longer faster when heated. But in order to learn to think about whether the change in length is proportional to the change in temperature, so that they can then use the constant of proportionality (divided by the length of the rod) to define a numerical 'coefficient of expansion' for each metal, they need to grasp numerical representation of differences in temperature and length ('hotter by how much?', longer by how much?').
Similarly, although people may have a crude grasp of distinctions between velocity and acceleration, and be able to detect gross changes in either, on the basis of their own experiences of moving things, being moved, and perceiving moving objects, nevertheless, until they have learnt how to relate concepts of distance and time to numerical interval scales, they cannot easily make precise distinctions between different velocities, or between acceleration and rate of change of acceleration, nor think of precise relations between these concepts. These familiar examples show the power of extending scientific language by introducing numerical concepts and notations corresponding to old non-numerical concepts. This sort of thing has been so important in physics that many have been deluded into thinking it part of the definition of a scientist that he uses numbers!
The replacement of Roman numerals with the Arabic system is an example of a powerful notational advance. Another was the Cartesian method of using arithmetic to represent geometry and vice versa. Both involved numbers.
All sorts of notations besides numerical and algebraic ones have played an important role in extending the abilities of scientists to express what they know and want to find out.
Pictures, diagrams, maps, models, graphs, flow charts, and computer programs, have all been used. Examples include: the diagrams used in the study of levers, pulleys, bending beams, and other mechanical systems; the 'pictures' of molecules used by chemists, for instance, in the following representation of the formation of water from hydrogen and oxygen
(H-H, H-H, 0=0) -----> (H-O-H, H-O-H)
circuit diagrams used in electronics; optical drawings showing the paths of light rays; plates showing tracks of subatomic particles; and the 'trees' used by linguists to represent structures or sentences. I shall argue later that these non-verbal forms of representation play a part in valid reasoning, scientific and non-scientific, conscious and unconscious.
The same is true of an adult who cannot describe the features of musical compositions which enable him to recognise styles of composers and appreciate their music, or the cues which enable him to judge another's mood. Non-logicians can often distinguish valid from invalid arguments without being able to say how. They have not learnt the overt language of logicians.
No doubt this is true also of many scientists, especially when they are in the early phases of some kind of conceptual development. They may then, like children and chimpanzees, be unable to articulate fully the reasons they have for some of the decisions they take about interpreting evidence and assessing hypotheses.
Even after going a stage further and learning how to articulate their reasons, scientists may not yet have learned how to teach their new concepts to colleagues and rival theorists. So attempts at rational persuasion break down. This has misled some philosophers and historians of science (e.g. Kuhn) into thinking that there are no reasons, and inferring that the decisions of scientists are irrational or non-rational. This is as silly as assuming that a mathematician is irrational simply because he cannot explain a theorem to a four year old child. The child may have much to learn before he can understand the problem, let alone the reasoning, and the mathematician may be a poor teacher.
Concepts are not simple things which you either grasp or don't grasp, or which can be completely conveyed by an explicit definition or axiomatic characterisation. For instance, as work of Piaget has shown so clearly, and Wittgenstein less clearly, very many of our familiar concepts, like 'number', 'more', 'cause', 'moral' and language', are very complex structures of which different fragments may be grasped at different times. In a later chapter I shall illustrate this by analysing some of the complexities children master when they learn to count.
In this way one can learn to think about new sorts of possibilities without waiting to be confronted with them. (This kind of thing may also happen below the level of consciousness, in children and scientists, as part of the process of learning and discovery.) Of course, one may also extrapolate too far, and construct representations of things which are not really possible in the world, so empirical investigation of some sort is required to discover whether things which are conceivable or representable can also exist. For instance, merely analysing the concept of an element with atomic number 325 will not decide whether such a thing can occur. This is the reason for distinguishing the first aim of interpretative science, namely extending concepts and symbolisms, from the second aim, namely extending knowledge of what is really possible.
Usually philosophers plunge into discussions of such questions as whether we can know anything about the future, or rationally believe anything about the future, without first asking how a rational being can even think about the future or think about alternative possible future states of affairs. (Work in artificial intelligence is beginning to explore these problems.)
Philosophers are therefore attempting to assess the rationality of certain decisions on the basis of a drastically incomplete account of the resources that might enter into the decision-making process. The reason why a study of our ability to think of things has been shirked is partly because it is so hard to do, partly because of an unwarranted restriction of rationality to relations between evidence and belief-contents, and partly because many philosophers think that the investigation of conceptual mechanisms is a task for psychologists not philosophers. However, most psychologists never even think of the important questions, and those who do usually lack the techniques of conceptual analysis required for tackling them: so the job does not get done. (Piaget seems to be an exception.)
There is a need for a tremendous amount of research into what it is to understand various sorts of concepts, and what makes it possible. There is also a need for some kind of taxonomy of types of conceptual change, whether in individuals or in cultures.
Until these and other conceptual changes are better understood, discussion of 'incommensurability' of scientific theories and of the role of rationality in science is premature. Meanwhile education will continue to be largely a hit and miss affair, with teachers not knowing what they are doing or how it works, When we really can model conceptual development, things will be very different.
To sum up so far. We have been discussing subgoal (a), namely developing new concepts and symbolisms making it possible to conceive of, think about and ask questions about new types of possibilities A system of concepts and symbols with procedures for using them constitutes a language. A language which is used to formulate one theory, will usually also contain resources for formulating alternatives, including the negation of the theory and versions of the theory in which some predicate, relational expression or numerical constant is replaced by another.
So concepts and symbols are tools for generating possibilities or questions for investigation. They have greater generative power than theories. The scientist who usefully extends the language of science, unlike one who simply proposes a new theory using existing concepts and symbols, extends the hypothesis-forming powers of the scientists who understand him. In this sense conceptual advances are more profound.
So the important differences between modern scientists and those of the distant past include not merely the statements and theories thought to be true or false, but also which statements and theories could be thought of at all. Not only are more answers known now, but more questions are intelligible. The same applies to development of an individual.
So science is served not only by extending and differentiating existing concepts: rejection of a concept or typology or mode of representation may also serve the aims of science by reducing the variety of dead-end questions and theories. Concepts, typologies, taxonomies, and symbolisms can, like theories, be rationally criticised, and rejected or modified. Any intelligent learning system will need to have procedures for rationally criticising its current conceptual and symbolic resources. (See Winston (1975) for a simple example of a computer program that modifies its own concepts.)
There are several ways in which a typology and associated notation can be rationally criticised. For instance one may be able to make one or more of these criticism:
(a) That there are some possibilities it doesn't allow for,
(b) That it represents as possible some cases which are not really possible,
(c) That some of the subdivisions it makes are of no theoretical importance,
(d) That some category within it should be subdivided into two or more categories, because their instances have different relations to the other categories,
(e) That a principle of subdivision fails to decide all known cases, e.g. because of inapplicable tests,
(f) That the classification procedure generates inconsistent classifications for some instances,
(g) That the notation used does not adequately reflect the structural properties of the typology, or of the instances, e.g. when people use diagrams with bogus detail,
(h) That the concepts used generate questions which apparently cannot be answered by empirical investigation (like the question 'How fast is the Earth moving through the aether?'),
(i) That more powerful explanatory theories can be developed using other tools for representing possibilities.
I suspect that some or all of these criteria are used, unconsciously of course, not only by scientists, but also by young children in developing their conceptual systems. They could also play an important role in an intelligent learning machine.
Several of these criteria will remain rather obscure until later. In particular, the first two can only be understood on the basis of a distinction between what is conceivable or representable and what is really possible in the world. We now examine this, in order to explain the difference between the first two interpretative subgoals of science, namely (a) extending what is conceivable or representable and (b) extending knowledge of what is really possible.
This aim is indefinitely extensible: having found out that X's can exist or occur, one can then try to find out whether X's can exist or occur in specified conditions C1, C2, C3, .... Similarly, having found that objects can have one range of properties which can change (e.g. length) and can also have another range of properties which can change (e.g. temperature) one can then try to find out whether these properties can change independently of each other in the same object, such as a bar of metal, or a particular object in specified circumstances, such as a bar of metal under constant pressure or tension. Such further exploration of the limits of combinations of known possibilities merges into the search for laws and regularities, as explained previously.
We can conceive of, or describe, a lump of wood turning spontaneously into gold, or a human living unclothed in a vacuum, but it does not follow that these things really can exist. What is the difference? First we look at what it is for something to be conceivable, representable, or describable.
A sentence, phrase, picture, diagram, or other complex symbol will, if intelligible, be part of a language which includes syntactic and semantic rules in accordance with which the symbol is to be interpreted. The mere fact that the symbol is syntactically well-formed does not guarantee that it can be interpreted, though it may mislead us into thinking it can. More precisely, it may have a sense but necessarily fail to have any denotation. Thus the question 'Does the table exist more slowly than the chair?' is syntactically perfect but we can show that so long as the words are used according to normal semantic rules there can be no answer to the question. For, 'more slowly' when qualifying a verb requires that verb to denote a process or sequence involving changes other than the change of time, so that the rate of change or succession can be measured against time. Existence is not such a process, so rates of existence cannot be compared. (For more on the connection between sense and failure of reference see Sloman (1971b).)
We can use the notion of what is or is not coherently describable or representable in some well defined language or representational system, as an objective semantic notion. What is conceivable to a person, will be what is coherently representable in some symbolic system which he uses, not necessarily fully consciously. It may be very hard, even for him, to articulate the system he uses, but that does not disprove its existence. These notions are as objective as the notion of logical consistency, which is a special case.
However the mere fact that something is, in this sense, representable or conceivable does not mean that it really can exist. Conversely, what can exist need not be representable or conceivable using the symbolic resources available to scientists (or others) at any particular time: their language may need to be extended. Scientists (like children) may be confronted with an instance of some possibility, like inertial motion, diffraction, or curvature of space-time, without seeing it as such because they lack the concepts. (Kuhn, 1962, chapter X, has over-dramatised this by saying they inhabit a different world.)
The word 'possible' as I have used it, and as others use it, tends to slide between the two cases (a) used as a synonym for 'consistently representable or describable using some representational system', as in logically possible', and (b) used to refer to what can occur or exist in the world. This is why the first two interpretative aims of science are not always clearly distinguished. But what is the difference between (a) and (b)?
This is not an easy question to answer. The main difference is that conceivability or representability can be established simply by analysing the sentence or other symbol used and checking that the syntactic and semantic rules of the language in question do not rule out a consistent interpretation (which is not always easy), whereas checking whether something really is or is not possible requires empirical investigation of some sort. The former involves conceptual analysis (see chapter 4), the latter perception, experiments or surveys.
However, possibility is not the same as actual existence. To say that it is possible for ten drugged alligators to be painted with red and yellow stripes and then piled into my bath is not to say that this ever has happened or will happen. Similarly, to say that several courses of action are possible for me, is not to say that I shall actually follow all of them. So, in saying that one of the aims of interpretative science is to find out which kinds of things are possible in the world, I do not mean that the aim is to find out which kinds actually exist, as in historical science. The latter is just a means to the former.
What other means are there of deciding that something is really possible, besides finding an instance? Alas, the only answer I can give to this is that we can reasonably, though only tentatively, infer that something is possible if we have an explanation of its possibility. What this amounts to is roughly the following: (a) we can consistently represent it using symbolic resources which have already been shown to be useful in representing what is actual, and (b) it is not ruled out by any well established law or theory specifying limitations on possibilities.
It is clear that these conditions do not conclusively prove something to be possible, for they rest on current theories of the limitations of what is possible and such theories, being empirical, are bound to include errors and omissions, at any stage in the advance of science. Further, these conditions do not yield clear decisions in all cases. For instance, is it reasonable to believe that it is possible for a normal human being to be trained (perhaps starting from birth) to run a mile in three minutes? It may not be clear whether we already know enough to settle such a question.
But I still have not given anything approximating to a complete analysis: this would require very much more than describing the criteria for deciding whether something is possible or not. It would also require analysis of the role of the concept of possibility in our thinking, problem-solving, deliberating, regretting, blaming, praising, etc., and its relations to a whole family of modal words, such as 'may', 'can', 'might', 'could', 'would', etc. A mammoth task. (For some useful beginnings see Gibbs, 1970 and White, 1975.) A good analysis would be part of a design for a mind.
At any rate, we cannot analyse 'Things of type X are possible' as synonymous with 'Either things of type X already exist, or else they are consistently representable in our symbolic system without being ruled out by known laws', since this would define real possibility in terms of the current system of concepts and beliefs. We could try a formula like 'Things of type X are possible if and only if they either exist or are consistently representable in some useful representational system and are not ruled out by any true laws'. But this has the disadvantage of presupposing that there exists some complete set of true laws formulated in some unspecified language which correctly defines all the limitations on what is possible in the world. It is by no means clear that such a presupposition is intelligible. Moreover as a definition it introduces a circularity, since it is notoriously hard to define the concept of a law without presupposing the concept of possibility or some related concept.
Despite the remaining obscurities, I hope I have done enough to indicate both that the first two aims of interpretative science are different, and also that they are very closely related. Now for a closer look at the third aim the aim of explaining possibilities.
Roughly, an explanation of a possibility or range of possibilities can be defined to be some theory or system of representation which generates the possibility or set of possibilities, or representations or descriptions thereof. An explanation of a range of possibilities may be/a grammar for those possibilities. A computer program is a good illustration: it explains the possibility of the behaviours it can generate (which may depend on the environment in which it is executed). In this way Artificial Intelligence provides explanations of intelligent behaviour. There is much to be clarified in these formulations, but first some examples from the history of science.
Some of the theories listed so far not only explained possibilities, but also contained enough detail to make prediction, and in some cases control, possible. This is fairly common in physics, though more difficult in biology. In the case of the human sciences (and philosophy) the ability to predict and control is rare.
Another unexplained possibility is the evolution of animals with specific intelligent abilities (like the ability to learn to use tools, or to learn to use language) from species lacking these abilities, and in particular the evolution of human beings.
In the case of human psychology, there are very many possibilities taken for granted as part of common sense, yet still without even fragmentary explanations, for instance the possibility of a newborn infant learning whatever human producing a work of art, the possibility of extending an art form or language, the possibility of using knowledge acquired in one context to solve a problem of a quite different sort, the possibility of relating one's actions to tastes, preferences, principles, hopes, fears, knowledge, abilities, and social commitments, and the possibility of changing one's moral attitudes through personal experience.
There are missing explanations of possibilities in physics and chemistry also. As far as I know, the possibility of mechanical utilisation of fuel energy at levels of efficiency achieved in animals is still not explained.
Such philosophers normally assume that both the theory and what it explains are expressed in the form of sentences, using natural language supplemented by the technical language of the science concerned. It is also assumed that the deduction is logical, that is the inference from theory to what it explains can be shown to be valid according to the rules of inference codified by logicians. (This is sometimes generalised to permit cases where the inference is only probabilistic.)
This concept of deduction and the related notion of explanation needs to be generalised in two ways. First of all, other means of representation besides sentences may be used, such as maps, diagrams, three-dimensional models or computer programs. Secondly, the forms of inference include not only the logical forms (like 'All A's are B's, All B's are C's. Therefore All A's are C's'), but also the manipulation of other representations. An example is the manipulation of diagrams representing molecular structures, in order to explain the possibility of chemical reactions, like the production of water from hydrogen and oxygen.
I shall explain in chapter 7 exactly what 'valid' means and why this generalisation to non-verbal forms of valid inference should be permitted. Just as the semantic rules of verbal languages guarantee that certain transformations of sentences preserve truth, so can semantic rules of non-verbal representations guarantee that certain manipulations preserve denotation. (This generalisation of the concept of a valid inference is central to the analysis of the elusive concepts of 'cause' and 'mechanistic explanation' but that is another story.)
Typical examples of such non-verbal inference methods are: the use of Venn diagrams in set theory, the 'parallelogram' representation of addition of forces, velocities and other vectors, the use of circuit-diagrams in electronics, the use of a map to select a route, the use of a diagram to show how a machine works. On this view the use of models and so-called 'analogies' in science is simply a change of language: one configuration is used to represent another. All the usual talk about isomorphism of models in this context is as misconceived as the theory that sentences in natural language must be isomorphic with things they describe: there are many more kinds of non-verbal representations than isomorphic models. (See Goodman, 1968, Clowes, 1971, and Toulmin, 1953). I was helped to see all this by an unpublished paper by Max Clowes, called 'Paradigms and syntactic models'.)
We now have a minimal requirement for a theory T
formulated in sentences or other symbolic apparatus to be an
explanation of some range of possibilities, namely:
An illustration of this is the use of the theory of bonds between atoms (the theory of valencies) to explain the possibility of a very large number of chemical compounds and transformations. Knowing the kinds of bonds into which the various atoms can enter, one can generate representations of large numbers of chemical compounds, and chemical reactions, using diagrams or models of molecular structures. Here one range of (relatively primitive) possibilities is used to explain another range.
This simple chemical theory had to be revised and refined of course, but that does not affect the point that at least part of its scientific function while it survived was to explain a range of possibilities according to criterion (1). (In AI research, a program can explain a range of possible behaviours. A derivation consists of running the program, or, preferably, reasoning about the program's capabilities.)
So a good explanation of a range of possibilities should be definite, general (but not too general), able to explain fine structure, non-circular, rigorous, plausible, economical, rich in heuristic power, and extendable.
A theory explaining a range of possibilities may be criticised by showing that it explains too much, including things which so far appear to be impossible. The theory may not explain enough of the known fine structure of the possibilities (like theories of speech understanding which do not explain how hearers can cope with complex syntactic ambiguities, or developmental theories in biology which don't explain how a chicken's egg can grow into something like its mother or father in so many detailed ways).
The explanation may be circular, like theories which attempt to explain human mental functioning by assuming the existence of a spirit or soul with essentially all the abilities it is intended to explain.
The theory may be so indefinite that it is not clear what it does and what it does not explain.
A theory may also be criticised less directly by criticising the specification of the range of possibilities which it is meant to explain (e.g. criticising the typology on.which it is based). For instance the specification may describe a set of structures in ways which are not related to their functions, like describing sentences in terms of transition probabilities between successive words.
Or the set of possibilities explained may be shown to be only a sub-range of some wider set of possibilities which the theory cannot cope with. For instance, a theory which explains how statements are constructed and understood can be criticised if it cannot be extended to account for questions, commands, threats, requests, promises, bets, contracts, and other types of verbal communication which are clearly functionally related to statements in that they use related syntactic structures and almost the same vocabulary.
If it turns out that a physical theory of the interactions of atoms and their components can only explain the possibility of chemical reactions involving relatively simple molecules, then that will show an inadequacy in the theory.
Similarly, if an economic theory can explain only the possibility of economic processes occurring when there is a very restricted amount of information flow in a community, then that theory is not good enough.
Finally, if a philosophical theory of the function of moral language accounts only for abusive and exhortative uses of that kind of language, then it is clearly inadequate since moral language can be used in a much wider range of ways.
In some cases, whether a theory explaining some specified range of possibilities satisfies these criteria or not, or whether it satisfies them better than a rival theory, is not an empirical question. It is a question to be settled by conceptual, logical and mathematical investigations of the structure of the theory and of what can be derived from it.
Sometimes the theory is too complex for its properties to be exhaustively surveyed. If so, one can only try out various derivations or manipulations in test cases. This is partly analogous to an empirical investigation in that the results are always partial and cannot be worked out in advance by normal human reasoning. Similarly testing a complex computer program may feel like conducting some kind of experiment. Nevertheless, as already remarked, the connections so discovered are not empirical, but logical or mathematical in nature. (Compare Pylyshyn 1978, Sloman 1978.)
These criteria for assessing explanations of possibilities could be justified by showing how their use contributes to the interpretative and practical aims of science. They would also have to play a role in the design of an intelligent learning machine, along with the previously listed criteria for assessing concepts and symbolisms. So these criteria are relevant to developmental psychology and AI, as well as to the methodology of the physical sciences.
Although it explains how certain sorts of phenomena are possible, the underlying mechanism or structure postulated may, at the time the theory is proposed, be unobservable, so that observation of its state cannot be used to predict actual occurrences of those phenomena. Similarly, no techniques may be available for manipulating the mechanisms, so that the theory provides no basis for controlling the phenomena.
For instance, the theory of evolution explains the possibility of a wide range of biological developments without providing a basis for predicting or controlling most of them.
Similarly, a theory explaining the possibility of my uttering sentences of particular forms need not provide any basis for predicting when I will utter any one sentence, or for making me utter it, or even for explaining exactly why I uttered the particular sentence I did utter at a particular time. This is because the theory may simply postulate a certain kind of sentence-generating mechanism, available in my mind as a resource to be used along with other resources. How any particular resource is used on any particular occasion, may be the result of myriad complex interactions between such factors as my purposes, preferences, hopes, fears and moral principles, what I believe to be the case at the time, what I know about the likely effects of various actions, how much I am distracted and so on. The theory which explains the possibility of generating and understanding sentences need not specify all the interactions between the postulated mechanism and other aspects of the mind. So it need not provide a basis for prediction and control.
This is true of any explanation of an ability, skill, talent, or power, in terms of a mechanism (e.g. a computer program) making it possible. The explanation need not specify the rest of the system of which that resource is a part, nor specify the conditions under which the resource is activated. And even if it does, the specification need not refer to either observable conditions or manipulable conditions. So such explanations of possibilities, though they contribute to scientific understanding, need not contribute to predictions of actual events.
I believe that the stress on predictive content derives from a misunderstanding of criteria 2 and 4, namely the requirement that the theory be definite and capable of explaining
Lack of predictive power, practical utility, or refutability need not rule out rational discussion of the scientific merits of an explanation of a range of possibilities. Neither should it rule out rational comparison with rival explanations, in accordance with the criteria listed above. Nor does it prevent such a theory from giving deep insight, of a kind which provides a firm basis for building more elaborate theories which do permit predictions and explanations of particular events, and which are empirically refutable.
I therefore see no reason for calling such theories nonsensical, as some of the logical positivists would, nor for banishing them from the realm of science into metaphysics or pseudo-science, as Popper does, (though he admits that metaphysical theories may be rationally discussable and may be a useful stimulus to the development of what he calls scientific theories).
I am not here arguing over questions of meaning: I am not arguing about the definition of 'science'. My point is that among the major merits of the generally agreed most profound scientific theories is the fact that they satisfy the criteria for being good explanations of possibilities, and therefore give us good insights into the nature of the kinds of objects, events or processes that can exist or occur in the universe.
If unrefutable theories are to be dubbed 'metaphysical', then what I am saying is that even important scientific theories have a metaphysical component, and that the precision, generality, fine structure, non-circularity, rigour, plausibility, economy and heuristic power are among the objective criteria by which scientific and metaphysical theories are in fact often assessed (and should be assessed).
The development of such 'metaphysical' theories is so intimately bound up with the development of science that to insist on a demarcation is to make a trivial semantic point, of limited theoretical interest. Moreover, it has bad effects on the training of scientists. Since Artificial Intelligence produces unfalsifiable, but rationally criticisable, theories, it should undermine this harmful trend.
Observing an actual instance of a possibility explained by some theory provides support for that theory at least to the extent of showing that there is something for it to explain: it shows that the theory performs a scientific function. However, the support adds to previous knowledge only if it is a new kind of possibility. Mere repetition of observations or experiments does not increase support for a theory: it merely checks that no errors were made in previous instances.
In these contexts all the normal stress on repeatability of scientific experiments is unnecessary and has misled some psychologists and social scientists into making impossible demands of empirical studies of man and society. Repetition may be a useful check on whether the phenomenon really is possible (since it permits more independent witnesses to observe it), and it provides opportunities for more detailed examination of exactly what occurred, but is not logically necessary.
Beethoven's compositions are unique. Yet it is a fact that it was possible for a human being to create them. That possibility requires explanation.
If a phenomenon occurs only once, then it is possible; and its possibility needs explaining. Any explanation of that possibility is therefore not gratuitous, and the only question that should then arise is not whether the explanation is science or pseudo-science, or metaphysics, but whether it is the correct explanation. In practice, this becomes the question whether a better explanation can be found for the same possibility, that is, an explanation meeting more of the criteria (2) to (9) above; or perhaps serving additional scientific aims besides explaining possibilities.
The frantic pursuit of repeatability and statistically significant correlations is based on a belief that science is a search for laws. This can blind scientists to the need for careful description and analysis of what can occur, and for the explanation of its possibility.
Instead they try to find what always occurs -- a much harder task -- and usually fail. Even if something is actually done by very few persons, or only by one, that still shows that it is possible for a human being, and this possibility needs explanation as much as any other established fact. This justifies elaborate and detailed investigation and analysis of particular cases: a task often shirked because only laws and significant correlations are thought fit to be published. Social scientists have much to learn from historians and students of literature despite all the faults of the latter.
I have gone on at such great length about describing and explaining possibilities because the matter is not generally discussed in books on philosophy of science, or in courses for budding scientists. But I do not wish to deny the importance of trying to construct theories which can be used to explain and predict what actually occurs, or which explain impossibilities (laws) and observed regularities. Of two theories explaining the same range of possibilities, one which also explains more impossibilities and permits a wider variety of predictions and explanations of actual events to be made on the basis of observation, is to be preferred, since it serves to a greater degree the aims of science listed previously.
This discussion is still very sketchy and unsatisfactory. Much finer description and classification of different sorts of explanations is required. But enough for now!
I suggest that anyone who tries this will discover, possibly to his surprise, that the scientific advances which he regards as most important include not only discoveries of new laws or regularities, or explanations thereof, but also discoveries of new types of phenomena, new explanations of ranges of possibilities, new concepts, new notations, and therein new means of asking questions about the world. For example, Boyle's discovery of his law relating pressure and volume of a gas, was not so profound as the prior invention of the concepts of pressure and volume. The search for laws presupposes the search for possibilities and their explanations, and this requires concepts and notations for representing possibilities.
For reasons which I do not fully understand, Popper is apparently strongly opposed to all this talk of concepts and possibilities. (See, for instance, pp. 123-4 of his (1972) where he describes it as an error to think that concepts and conceptual systems or problems about meaning are comparable in importance to theories and theoretical systems, or to problems of truth.) As far as I can tell, his argument rests on the curious assumption that concepts or meanings are purely subjective things, and that only complete statements containing them can be assessed or criticised according to objective criteria. I hope I have said enough to refute this.
Roughly, our disagreement seems to hinge on Popper's view that the only place for rationality in science is in the selection from among hypotheses expressible in a given language, whereas I have tried to show that there are rational ways of deciding how to extend a language, and therefore how to extend the set of expressible hypotheses. I admit that there are still serious gaps in my discussion: a theory of concept-formation is still lacking. (See 2008 note on "Ontology Extension" below.)
Finally, even if it is agreed that science uses rational means to pursue the aims described here, the question arises: are these aims rational? Is it rational to pursue them? I believe there is no answer to this. If someone genuinely prefers the life of a mystic or hermit or 'primitive' tribesman to the pursuit of knowledge and understanding of the universe, then that preference must be respected. However, I believe that the aims and criteria described here are part of the mental mechanism with which every human child is born but for which it would not be possible to learn all that human children do learn. So one can reject science only after one has used it, however unconsciously, for some years.
Similarly, rational processes of concept formation and theory construction will have to be built into an intelligent robot if it is to be capable of matching the learning ability of young children. The development of science, the learning of a child, and the mechanisms necessary for an intelligent robot all involve computational processes, which build up and deploy knowledge of the form and contents of the world. This is one of several points at which bridges can be built between philosophy of science, developmental psychology, and artificial intelligence.
The attempt to build these bridges will provide good tests for the philosophical theories outlined here. It is certain that my theories will prove inadequate. But I hope they may provide a useful basis for further research.
In the first kind, a new concept, predicate, function, or logical operator is defined explicitly in terms of previously used concepts, etc. Thus nothing new can be thought or expressed as a result of the extension, though some things may be expressed or thought more concisely. Definitional ontology extension introduces only abbreviations for concepts and forms of expression that existed previously.
In substantive ontology extension, something new is introduced that is not definable in terms of what was previously understood. According to concept empiricism (defended by the British Empiricist Philosophers, and criticised by Immanuel Kant in his Critique of Pure Reason (1781)), the only way to acquire such a novel concept is to derive it (somehow) from experience of instances, e.g. experiencing a new colour, or taste, or smell. Approximately the same claim has been central to "Symbol Grounding" theory, introduced by Stevan Harnad (over a decade after this book had been published, apparently in ignorance of Kant's refutation of the theory, and its rejection by modern philosophers of science).
The Symbol Grounding Problem, Physica D, 42, 1990, pp. 335--346,However, the advance of science shows that it is possible to introduce new theoretical concepts that are neither abstracted from experience of instances (e.g. because instances cannot be experienced) nor defined in terms of previous concepts. E.g. this happened with concepts like proton, electron, charge, valence, chemical bond, magnetic field, gene, and others. The failure of logical empiricists to explain such conceptual innovations in terms compatible with concept empiricism led to new ideas about concepts that are implicitly defined by the theories that use them. For more on this see these presentations
 This is because the definition of the set entails that it contains itself if and only if it does not contain itself. (Note added: 2001. See also A. Botterell 'Conceiving what is not there', Journal of Consciousness Studies vol 8, no 8, pp 21--42, 2001.)
Original pages 63-83
So Kant's work illustrates the overlap between science and philosophy. There are many more examples. Einstein's approach to the analysis of concepts of space and time was influenced by his reading of empiricist philosophy. Frege's attempts to answer some of Kant's questions about the nature of arithmetical knowledge led him into logical and semantic theories and formalisms which have deeply influenced work in linguistics and computer science. Marx's sociological theories were partly based on Hegel's philosophy. More recently, work by philosophers of language, such as Austin and Grice, has been taken up and developed by linguists, and the psychologist Heider has acknowledged the influence of Ryle's The Concept of Mind.
Philosophers' analyses of some of our most general concepts, such as cause, individual, action, purpose, event, process, good, and true, are relevant to biology, to anthropology and developmental psychology, whether or not practitioners of these subjects are aware of this.
For instance, biologists studying the evolution of intelligence need to grasp what intelligence is, and how it includes the use of some or all of these concepts. A comprehensive anthropology would include cross-cultural studies of the most general and basic systems of concepts used by different peoples. And if developmental psychologists were to do their job properly they would spend a lot of time exploring such concepts in order to be able to ask deep questions about what children learn and how. (Piaget did this, to some extent. But I am not aware of university courses in developmental psychology which include training in conceptual analysis.)
Within artificial intelligence it is not possible to avoid philosophical analysis of such concepts, for the discipline of trying to design machines which actually behave intelligently and can communicate with us forces one into analysis of the preconditions of intelligent behaviour and our shared presuppositions. For otherwise the machines don't work!
These illustrations of the connections between philosophy and the scientific study of the world are not isolated exceptions. Rather, they are consequences of the fact that the aims and methods of philosophy overlap with those of science. In this chapter I shall try to analyse the extent of that overlap.
A fourth major aim that they appear to have in common is the aim of discovering limits to what is possible, and explaining such limits. However, in relation to this aim, the methods of scientists and philosophers tend to be rather different, insofar as philosophers often try to set up non-empirical demonstrations. And they usually fail.
By exploring the relationship between the aims and methods of science and philosophy we shall explain how it is possible for philosophy to be the mother of science, thereby perhaps making a philosophical contribution to the science of intellectual history.
Let us start with some reminders of the kinds of questions which have exercised philosophers. I shall ignore the many pseudo-questions posed by incompetent philosophers who cannot tell the difference between profundity and obscurity.
Many of the questions in the list have controversial presuppositions: it is often disputable whether the X in 'How is X possible?' is possible at all! Many attempts have been made to prove the impossibility of some X, for instance where X = meaningful talk about God or infinite sets, or rational discussion of moral issues, or even such obviously possible things as: change, a man over-taking a tortoise in a race, knowledge about the past, knowledge about material objects, or deliberation and choice.
Lunatic though it may at first appear, serious thinkers have put forward demonstrations that these are impossible. Equally serious thinkers have put great intellectual effort into attempts to refute such demonstrations. The process may appear a waste of time, but has in fact been very important. The discovery, analysis and, in some cases, refutation of such paradoxical proofs of impossibility has been a major, though haphazard, stimulus to philosophical progress and the growth of human self consciousness. It leads to a deeper understanding of the phenomenon whose possibility is in dispute. In some cases (e.g. Zeno's paradoxes) it has even led to advances in mathematics.
Often, a philosopher asks 'How is X possible?' only in the context of asking 'What is the flaw in so and so's alleged proof that X is impossible?' But there is also a more constructive philosophical tradition, first consciously acknowledged by Immanuel Kant, of granting that X is possible and attempting to explain how it is, in the light of careful analysis of what X is. This is the philosophical activity which merges into scientific theorising.
In what follows I'll try to analyse the similarities and differences in aims and methods: a step towards a scientific theory explaining the possibility of the growth of scientific and philosophical knowledge.
I do not mean that all the possibilities discussed by philosophers are obvious: they may be known to all of us without our realising that we know them (like the possibility of truly unselfish action). Some of the things we know are not evident to us until we have engaged in the philosophical activity of digging up unacknowledged presuppositions. For instance, most people if simply asked how many different kinds of uses of language there are, are likely to come up with only three or four, such as the text-book favourites: exclamations, questions, commands and assertions (statements). But even though they do not think of more without prodding, they do in fact know of many possible uses of language not covered by this list, such as betting, congratulating, pleading, exhorting, warning, threatening, promising, consoling, reciting, calling someone, naming someone or something, welcoming, vowing, counting, challenging, apologising, teasing, declaring a meeting open or closed, and several more. (See J.L. Austin, How to Do Things With Words.)
Similarly, there are many psychological possibilities which we all know about, but do not find it easy to recall and characterise accurately when theorising about the mind. I shall draw attention to many examples in later chapters. So, both philosophy and science use specialised techniques to find out what sorts of things are possible, but their techniques and consequently the ranges of possibilities unearthed, are different. Philosophers dig up what we all know, whereas scientists mainly to extend what we know, about possibilities.
In both cases a preliminary characterisation of a kind of possibility is subject to correction, in the light of an explanatory theory.
One of the faults of philosophers is that they tend to ask questions which are not nearly specific enough. If one simply asks 'How is knowledge possible?' or 'How is knowledge of other minds possible?', these questions do not explicitly specify the requirements to be met by explanatory answers, since they do not describe in sufficient detail what is to be explained. They specify many requirements implicitly, because we all know a great deal about the possibilities referred to, but until they have been described explicitly, people can unwittingly select different subsets for consideration, and so philosophical debates often go on endlessly and fruitlessly.
The criteria listed in Chapter 2 for assessing explanations of possibilities, presuppose that there are detailed specifications of the range of possibilities to be explained. Otherwise there is no agreed basis for assessing and comparing rival theories. This preliminary analysis of the range of possibilities to be explained is often shirked by philosophers.
Even when philosophers do a fairly deep analysis, it is not presented in a systematic and organised form but rather in the form used for literary essays. The result is that philosophers often simply talk past each other. (This also happens in psychology for similar reasons, as may be confirmed by looking at the cursory 'definitions' of mental concepts such as emotion, memory .perception, learning, etc., which precede lengthy chapters on empirical results and proposed theories.)
In both philosophy and science, if progress is to be made, and seen to be made, the task of constructing an explanation of the possibility of X must be preceded by at least a preliminary characterisation of the range of possible kinds of X. This preliminary characterisation may be based on close examination of a wide variety of examples of X, taken from common experience, in the case of philosophy, or from specialised experiment and observation. The specification may include such things as the types of components, the types of organisation of those components, the types of behaviour, the types of functions, and the types of relations to other things, found in specimens of X, i.e. internal and external structures, functions and relations. In both philosophy and science, the construction of an explanatory theory will suggest ways of improving or correcting such 'observations'.
Having got a preliminary characterisation, that is, a preliminary answer to the question: What sort of things are X's? or What sort of X's are possible?, the scientist or philosopher can then begin to construct a theory describing or representing conditions sufficient to generate the possibility of instances of X (knowledge, perception, truth, scientific progress, change, falling objects, chemical processes, or whatever it is whose possibility is to be explained). Whether one is a philosopher or a scientist, the conditions for adequacy of an explanatory theory, and the criteria for comparing the merits of rival explanations of a range of possibilities are the same, namely the sorts of criteria listed in chapter 2.
Despite the overlap, there is an important difference. Often philosophers are content to find some theoretically adequate explanation of a set of possibilities without bothering too much whether it is the correct explanation. So they ask 'How might X be possible?' rather than 'How is X possible?', or 'What could explain the possibility of X?' rather than 'What does explain the possibility of X?' However, every answer to the latter necessarily includes an answer to the former, and in that way science subsumes philosophy, which is very like the relationship between A.I. and psychology (see chapter I ). In practice, the difference between the two approaches becomes significant only when alternative answers to the first question have been formulated, so that something can be done to find out which is a better answer to the second.
However, some philosophers have not been satisfied with this, and have tried to show that no other theory besides their own could possibly give the correct explanation. An argument purporting to show that T is not merely sufficient to explain the possibility of X, but also necessary, is called a ' transcendental argument'. (As far as I know, this notion was invented first by Kant.)
No attempts to construct valid transcendental arguments have so far been successful. For instance, Kant tried to show (in Critique of Pure Reason) that explaining the possibility of distinguishing the objective time order of events from the order in which they are experienced must necessarily involve assuming that every event has a cause; but quantum physics shows that one can get along without assuming that every event has a cause. Strawson tried to show (in Individuals) that our ability to identify and re-identify material objects and persons was a necessary part of any explanation of the possibility of identifying other things such as events, processes, states of affairs, pains, decisions, and other mental phenomena.
But he made no attempt to survey all the possible theories which might one day be formulated, including the varieties of ways in which computers or robots (and therefore people) might be programmed to use language, and his arguments seem to be irrelevant to the detailed problems of designing mechanisms with the ability to refer to and talk about things. (This criticism requires further elaboration.)
Such attempts at transcendental deductions are over-ambitious, for to prove that some theory T is a necessary part of any explanation of the possibility of X would require some kind of survey of all possible relevant theories, including those using concepts, notations and inference procedures not yet developed. It is hard to imagine how anyone could achieve this, in science or in philosophy. Scientists rarely try: They are not as rash as philosophers.
One reason why philosophers feel they must bolster up their explanations with 'transcendental arguments' is that they dare not admit that philosophy can be concerned with empirically testable theories, so they try to show that their theories are immune from empirical criticism. However, I shall show below that this is inconsistent with the practice of philosophers.
We now look a little more closely at similarities and differences between methods of science and philosophy.
The relevant philosophical procedures concern the following:
A first step is collecting information about the range of possibilities to be explained. For instance, before attempting to explain the possibility of knowledge one must ask 'What is knowledge?'. This involves collecting examples of familiar kinds of knowledge, and classifying them in some way. (Knowledge of particular facts, knowledge of generalisations, knowledge of individuals, knowing how to do things, etc.) Closely related possibilities should also be surveyed, e.g. believing, learning, inferring, proving, forgetting, remembering, understanding, doubting, wondering whether, guessing, etc. Functions of knowledge can then be listed and classified.
All this gives a preliminary specification of some of the fine structure of the range of possibilities to be explained, an answer to the question 'What is X?' (or, 'What are X's?). One can go on indefinitely attempting to improve on the preliminary specification, by covering a wider range of cases, giving more detailed specifications of each, and revising the classification.
This process may at first rely only on what Wittgenstein (in Philosophical Investigations, Part I, section 127) called 'assembling reminders'. These are examples of possibilities which when stated are obvious to common-sense, since we have all experienced similar cases, though we may not find them easy to think of on demand, like the examples of possible uses of language noted above. Much analytical philosophy, and most of Wittgenstein's later philosophy, consists of this kind of common-sense natural history.
An obvious extension of this activity is the use of experiments, instruments, measurement, fieldwork, and other tools of science to find and describe new examples of X, or new facts about old examples. Chapter 1 explained how artificial intelligence can contribute to this fact-gathering process in philosophy by providing examples of new forms of behaviour.
So the fact-collecting of philosophers merges into the fact-collecting of scientists. However, new empirical research may be premature if common sense knowledge about possible sorts of X's has not yet been made explicit and systematised. (Hence the futility of much psychological research, e.g. on decisions, learning and emotions.) So philosophical methods of analysis should come first in cases where relevant information is part of common sense for instance in the study of mind and society. (Some linguists have appreciated this, but few psychologists or social scientists. Fritz Heider was a notable exception: see his Psychology of Interpersonal Relations.)
In philosophy, as in science, fact collection is rarely useful unless guided by a problem or explanatory theory. The mere collection of possibilities is of little interest except insofar as a theory can be found to explain and organise them. And theories are important only if they help us solve problems or puzzles. How theories are generated is still largely an unsolved problem. No doubt chance plays a role, but individuals like Kant, Einstein and Newton would not have made so many theoretical advances if they had not employed (albeit unconsciously) rational procedures for making the best of chances which came their way.
Artificial Intelligence in its attempts to design intelligent (i.e.rational?) learning planning and problem-solving systems necessarily overlaps with philosophical attempts to explain the nature of theories and theory formation (as outlined in Chapter 2).
Emotivism is a philosophical theory purporting to explain how it is possible to use moral language meaningfully. However, fact-collecting of the sort described above showed the theory to be insufficiently general, for it was unable to account for facts about moral language which were not at first obvious to proponents of the theory, but are part of common sense. For instance, the theory interpreted moral language as performing functions like expressing the speaker's emotions, evoking similar emotions in hearers and causing hearers to act in certain ways. This fails to account for the empirically established possibility of unemotional hypothetical discussion among rational people of what, morally, ought to be done in certain situations. So the theory must either be rejected or modified to deal with this use of moral language. (I have listed a range of facts which theories like emotivism cannot account for, and proposed an alternative theory, in my two papers on 'better': see bibliography.)
This example refutes the widespread assumption that philosophical theories are not empirically testable. The assumption is probably based on a misconstrual of what philosophers actually do when they use empirical facts to test or support their theories: they use widely known common sense possibilities, rather than facts based on specialised empirical investigation. So the work can be done in an armchair no laboratory is needed, nor fieldwork. (The situation is similar when a linguist investigates his own language.) Because the information is so readily available its empirical nature is not recognised. (R.M. Hare made related points in his 'Philosophical Discoveries').
However, when the stock of relevant possibilities available to common sense is exhausted and has to be extended by more specialised empirical investigations, then philosophy merges into science. For instance philosophical investigations of the function of moral language and attempts to explain its possibility should, if properly conducted, overlap with linguistics and the psychology and sociology of morals. (Equally, the psychology and sociology, if done properly, would start with philosophical analysis of known possibilities.) For another example of philosophical use of empirical facts, this time from cognitive anthropology, see Bernard Harrison, Form and Content.
Similarly, philosophers have often tried to explain the possibility of knowledge on the assumption that it is possible for things to be learnt from experience, and in particular that it is possible for ideas to become 'associated' with one another. But these assumed possibilities also need explaining, and this leads directly into scientific studies (in artificial intelligence and psychology) of ways in which information can be acquired and stored so as to be available for future use, and so as to enable one piece of information to 'evoke' another (which involves tricky problems of indexing and retrieval).
In this way philosophy sometimes becomes more mathematical, as can be seen especially in the case of logic but also in philosophical studies of probability, in philosophy of science, and even in some branches of moral philosophy. Increasingly the formalisms of Artificial Intelligence will be used, as philosophical theories become more complex and precise, and too intricate to be evaluated without the aid of a computer. This parallels the ways in which scientific theories become more and more mathematical.
For instance, if, instead of the usual vague and general philosophical discussions of how perception can yield knowledge, an explanation is required of how specific sorts of perceptual experiences can yield knowledge of specific sorts of spatial structures, for instance an explanation of how certain views of a cube enable one to see that it is a cube with an interior and with faces on the far side, etc., then a mathematical formulation is inevitable. (N.B. 'Mathematical' does not mean 'numerical' or 'quantitative'.)
University courses in philosophy will need substantial revision if the appropriate theory-building and theory-testing skills are to be taught.
My own attempt (in chapter 7) to replace crude distinctions between verbal and nonverbal symbolisms and reasoning processes with more precise distinctions is another example. My use of the concept of 'explaining how Xis possible' is another. Further examples will be found in the chapter on numbers (chapter 8).
New concepts can change our view of what it is that we are trying to explain, so that a new specification is given of the old possibilities. Similar processes in the history of science have been described by Kuhn (1962, pp. 129-134), such as the change in the boundary between the concepts 'chemical compound' and 'physical mixture' resulting from the atomic theory of chemical composition.
In philosophy and in science, conceptual changes generate new specifications of what needs to be explained, and so can lead to new theories. The process of growth of human knowledge seems to be full of 'feed back' loops.
Certainly some philosophers have tried to show not merely how things are but also how they must be or cannot be. Empiricists try to show that all significant knowledge must be based on sensory experience. Rationalists try to show that certain important kinds of knowledge cannot be empirical. Dualists try to show that there must be more than a material world if consciousness as we know it is possible. Logicists try to argue that mathematical concepts must be definable in terms of logic, if they are to have their normal uses. Moral or political philosophers often try to argue that their own moral or political principles must be accepted if morality or society is to be possible at all. Such theses are often based on attempts at 'transcendental arguments', which I have already criticised as over-ambitious, in the discussion of Kant, above.
Kant claimed to have unearthed various laws and principles which were part of the fundamental constitution of the human mind, so that all human thought and experience necessarily had to conform to them. However, such claims are very rash, in view of the fact that both biological and cultural evolution are known to be possible. We have already seen that thoughts that were impossible for ancient scientists are possible for modern scientists. The same contrast can be made between children and adults. This suggests that insofar as human minds have a 'form' limiting the nature of the world they experience, this form can vary from culture to culture and from time to time in one culture or even in one person, or robot.
The same is probably true of forms of language, society, morals, religion and science. If there are limits to this variation, they will have to be found by scientific investigations, not introspection or philosophical argument. The limits can hardly be studied before the mechanisms of individual and social development are understood, however. We must not try to fly before we can walk, even if we are philosophers.
However, there are many more mundane kinds of limits of possibility which philosophers characteristically attend to in their attempts to analyse familiar concepts. For instance, it is impossible for someone to be a spinster and married; it is impossible to admire someone for his honesty and breadth of knowledge yet never believe a word he says; it is impossible to be interested in botany yet never wish to look at or learn anything about plants; it is impossible to be intensely angry with someone yet not believe that person has done anything you dislike or disapprove of; it is impossible to drive a car with care and recklessly at the same time (though it is possible carefully to drive over a cliff, to commit suicide). These are not laws' limiting what is possible in the world. Rather, they express defining conditions, or logical consequences of defining conditions, for the use of our concepts. Kant called such propositions 'analytic'.
Making such 'definitional' necessities and impossibilities explicit is part of the task of analysing how our concepts work. This in turn is a useful means of drawing attention to the presuppositions we all make about what sorts of things are possible in the world, and about useful ways of sub-dividing these possibilities. Looking at such subtle differences as the difference between 'with care' and 'carefully' (which are different since they have different boundaries) we learn to articulate our implicit common-sense knowledge about possible configurations of human beliefs, motives, decisions and actions. This is a contribution of philosophy to psychology and AI. (See chapter 4 for more on this.)
The role of necessities and impossibilities in philosophy is a large topic, and I have by no means exhausted it. All I wanted to show here is that the scientific aim of discovering limits of what is possible in the world is not an aim philosophers can or should share unless they are prepared to go beyond philosophical argument.
However, it is important for philosophers to expose present limits of our conceptual and representational apparatus often as a first step towards overcoming those limits. I am trying to expose, and remove, limits of our normal ways of thinking about philosophy and science.
For instance, philosophers have tried to use theories of language to show the possibility of logical languages which in one respect or another (e.g. precision, clarity, economy of rules) improve on natural language, or social theories to demonstrate the possibility of improving on existing social structures, or epistemological theories to demonstrate the possibility of improving on prevailing standards of rigour in science or mathematics. Similarly there is a technological theme to this book, insofar as it uses a theory of the relation between philosophy and science in an attempt to show the possibility of new types of collaboration between philosophers and scientists who study man, or engineers who try to design intelligent machines.
It seems very unlikely that they will discover new laws with predictive content and explanations of those laws, apart from such trivial laws as are based on common sense, such as the law' that no normal person in our culture calmly invites a total stranger to chop his leg off! Some alleged laws are very likely to be culture-bound regularities, modifiable by training, propaganda, or economic pressures. Other apparent laws 'discovered' by empirical research are in fact just disguised tautologies, true by definition, for instance: 'Other things being equal, people tend to choose alternatives which they believe will bring about what they desire most'; or 'Persons are more likely to believe a statement if it is made by someone they respect, other things being equal'.
But the lack of substantial laws does not leave the human sciences without content, for there are many kinds of social and psychological phenomena whose possibility is well known and needs to be explained, even though the prediction and explanation of particular instances is out of the question, since it depends enormously on highly complex individual histories, decision-strategies, beliefs, interests, hopes, fears, ways of looking at things, and so on.
To turn to the search for probabilistic or statistical laws, when the hope of universal laws has been abandoned, as so often happens, is to reject the opportunity to study and interpret the rich structure of particular cases as a way of finding out what possibilities they exemplify.
Insofar as there are laws and regularities to be discerned among all the idiosyncracies of human behaviour, they can hardly be understood and explained before the possibilities they limit have been described and explained. Outside novels, there are so far few, if any, rich and systematic descriptions or explanations of human possibilities, so the human sciences will need to join forces with philosophy in the study of possibilities for some time yet.
When such theories are embedded in computers and shown by the behaviour of the computer actually to work, then this establishes that they do not rest on presuppositions of the type they are trying to explain. (However, at present, A.I. models explain only a very tiny fragment of what needs to be explained.)
It may turn out that the combination of skills and knowledge required to construct non-circular and rigorous explanations of any significant range of human possibilities cannot exist in any one scientist nor in any team of scientists, philosophers, and engineers, small enough to co-operate. Human possibilities may be too complex to be understood and explained by humans. But the time is not yet ripe for drawing this pessimistic conclusion, and even if it is true, that is no reason for not trying.
Computers cannot yet do these things in a way which compares with humans, and perhaps they never will. But computer programs provide the only currently available language for formulating rigorous and testable theories about such processes. And only with the aid of computers can we explore the power of really complex and intricate theories. (Part two of this book elaborates on the kind of complexity involved.) So I conclude that in order to make real advances in problem areas mentioned above, philosophers, like psychologists and linguists, will need to learn about developments in the design of computing systems, programming languages and artificial intelligence models, even if they do not write programs themselves.
The ('meta-level') concepts used for describing computing systems, programming languages, hardware and software architectures, etc. are as important as, or perhaps even more important than the concepts used in programming languages.
The attempt to design a mind is a very long term research enterprise. I expect that it will provide the best illustration of the overlap between science and philosophy.
Original pages 84-192
I have tried to suggest that, besides these uses, conceptual analysis has another important purpose, namely to find out things about people and the world. However, this requires a far more disciplined and systematic approach to the analysis of concepts than is to be found in the work of most philosophers. (This is partly because their goals are different.)
We have a very rich and subtle collection of concepts for talking about mental states and processes and social interactions, including: abdicate, abhor, acquiesce, adultery, adore, admire, angry, astonish, attend and avid, to mention a few.
These have evolved over thousands of years, and they are learnt and tested by individuals in the course of putting them to practical use, in interacting with other people, understanding gossip, making sense of behaviour, and even in organising their own thoughts and actions.
All concepts are theory-laden, and the same is true of these concepts. In using them we are unwittingly making use of elaborate theories about language, mind and society. The concepts could not be used so successfully in intricate inter-personal processes if they were not based on substantially true theories. So by analysing the concepts, we may hope to learn a great deal about the human mind and about our own society. This point does not seem to be widely understood: this is why so many people (including many philosophy students) dismiss conceptual analysis as being 'merely concerned with meanings of words'.
Most of the theoretical presuppositions of our ordinary concepts are not concerned with laws or regularities, but with possibilities. For example, the use of a concept like careful is based on our knowledge that people can act in certain ways, not on any laws about how they always or usually act. The chapter on the mechanism of mind outlines some results of my own attempts to analyse familiar concepts concerned with actions and related mental processes. These analyses revealed a host of human possibilities, and the mechanism sketched in that chapter is intended to provide the beginnings of an explanation of those possibilities, showing how conceptual analysis can contribute to psychology and artificial intelligence.
Similarly, by analysing concepts related to space and physical motion, e.g. bigger, longer, inside, push, pull, carry, fetch, throw, impede, collide, and so on, we may expose some unarticulated theories about our physical environment which govern much of our thought and behaviour. This task is not so urgent because physics and geometry have already made a great deal of progress, often going beyond our common-sense theories. To some extent this has been a result of conceptual analysis: the most striking example being Einstein's analysis of concepts of space and time. However, further conceptual analysis is required for improving our understanding not of the physical world itself, but of how people of various ages and cultures think about the world (consciously and unconsciously). Intelligent machines may need to think of the world as ordinary people do, rather than as quantum physicists do. [Note added: 2001. The recent growth of interest in the study of ontologies in AI and software engineering illustrates this point.]
It has been easier to make substantial progress in the physical sciences partly because the physical world is much simpler than the world of mental and social processes. Moreover, our interactions with the physical world are not as rich as our interactions with people so there is more scope for commonsense to have evolved mistaken theories about matter.
In the rest of this chapter, I shall try to list some of the methods which are useful in analysing concepts. Most of this will be familiar to analytic philosophers, especially those who have studied the work of Austin and Wittgenstein. However,
I have found that the techniques are very hard to teach, and hope that by formulating these procedures, I may help both to clarify how the method works and to provide beginners with a basis for developing the skills involved.
I can only list some techniques for collecting 'reminders' about how our concepts work. The task of organising and explaining the phenomena by means of some kind of generative theory is very difficult. It is similar to the construction of scientific theories. I do not claim to be able to teach people how to be good scientists. (That will have to wait until we have computer programs which behave like good creative scientists, when we shall be in a better position to think about what it is to teach someone to be a scientist!) What follows is merely a sketch, with a few hints. The topic deserves a whole book, and should be susceptible of a better organised presentation than I can manage.
For example, if analysing the concept imagine, look also at image, imagination, suppose, consider, think, think about, think of, visualise, remember, invent, refer to, have in mind, . . . Similarly, in analysing the concept know, we would need to look at notice, discover, learn, believe, accept, understand, remember, forget, infer, evidence, reason, test, proof, and many more. Having found some related but different concepts, try to find examples which illustrate one concept but not the other, and vice versa.
Try to work out why each example fits one concept but not the other(s). For example, search for examples of knowing X without believing X, or examples of believing X without knowing X. (See Austin's use of examples to analyse the difference between 'by mistake' and 'by accident' in 'A Plea for Excuses'. My chapter on the mechanism of mind was based on an attempt to extend his work.)
This calls for a collection of examples of each kind of use to be thought about carefully, with a view to postulating some underlying mechanism. Another list might include a range of different kinds of things we can imagine (a visual scene, hearing a tune, doing something, a war starting, a mathematical theorem being false, etc.). (One of the things people find hard to learn is the technique of generating examples of things they already know about, including words and phrases. Wittgenstein was a master at this, though he was not very good at analysing the similarities and differences between the examples.)
Try fitting these categories to the lists of related concepts, to help bring out differences between them. For example, learning something is a process, knowing or believing something a state one is in (perhaps resulting from such a process). Believing something is a state involving a property of oneself, whereas knowing something involves an extra relation to the world (e.g. getting something right). Since knowing something is a state not an event (contrast learning, or discovering, or noticing), those philosophers and psychologists who refer to 'the act of knowing' are either revealing their inability to analyse their own concepts, or else using technical jargon which is bound to cause confusion because of superficial resemblances to concepts from our ordinary language. (I do not wish to deny that ordinary language is itself sometimes muddled.)
Some mental states, for example, believing that there is a tiger in the next room, can explain behaviour, such as running away, but do not involve an ability or any disposition to behave. However, the combination of the belief and another state, such as fear of tigers, may generate a disposition to lock doors, run away, or call for help, depending on circumstances. Some states, for example, knowing how to count, involve an ability which may or may not ever be manifested in behaviour, whereas others, for example, being an enthusiast (e.g. about golf, gardening, or Greek sculpture), involve a tendency or even a regularity in behaviour. 'He smokes' reports a habit which is manifested (much to the annoyance of many non-smokers), whereas 'he would like to smoke' reports an inclination which may be successfully suppressed forever, so that there need not be any behavioural manifestations.
Desiring and wanting are states, whereas deliberating is a process, and deciding an event which terminates such a process and initiates a state of being decided.
Very often noun phrases look as if they denote objects, whereas analysis shows that they do not. Having an image is being in a certain mental state. The state may explain various abilities or actions. Some people think of an image as an object which is somehow involved in the state of having an image much as a nose is involved in the state of having a nose. However, it may be that this is not how the concept works, and that to talk of the image is merely a short-hand and indirect way of talking about a very complex mental state: when we say that a house has a shape we are not saying that besides the house there is some other object, its shape; rather we are alluding to an aspect of the state of the house, namely how all its parts are related to one another.
Similarly if someone has a visual image: this is a matter of being in a state in which one is able to do a variety of things which one can normally do only when there is something one can see. It does not follow that the image is some kind of object like a picture though no doubt, as with all mental states and processes, there is some kind of symbolism used (probably unconsciously) to represent the thing imagined. (For more on this see Pylyshyn, 'What the mind's eye tells the mind's brain").
For example, motives have in common the fact that (when combined with beliefs) they can explain decisions, intentions, and behaviour. But this shifts the burden to the concept explain, or explanation', why are there so many different sorts of things we call explanations, and do they have anything significant in common? (An important and still open research question.) Careful is another example of a polymorphous concept: different sorts of things are involved in careful driving, careful teaching, careful selection of words in an essay, careful breaking of sad news, careful cleaning of a precious vase, etc. Here it is relatively easy to see what is in common to all these cases, namely reference to goals, possible undesirable occurrences, a collection of risks or dangers, paying attention to the risks, and doing whatever is required to minimise them.
Examples of concepts which seem to depend on more or less complex social institutions are: courage, dignity, disapproval, honour, shame, embarrassment, owning, owing, impertinence and gallantry. Wittgenstein (in his Philosophical Investigations) and his followers have argued that very many mental concepts, including 'following a rule', are essentially social. I think that they exaggerate because of their ignorance of possible computational models of mental processes.
For example, knowledge explains (or is able to explain) success; fatigue and confusion explain failure; desire explains attempts.
Does the explanation function as a cause, an enabling condition, a purpose, a justification, an excuse, a mechanism, a law, or what?
1. What sorts of things can bring them about?
2. What sorts of things can prevent them?
3. What sorts of things can facilitate their occurrence?
4. What can cause variations in the instances?
5. What sorts of effects can they have?
Sometimes it is possible to distinguish 'standard' from 'non-standard' causes, effects, etc. For example, there is something irrational about beliefs which are caused by desires ('wishful thinking') but not about actions caused by desires. (Why?)
Sometimes it is useful to distinguish events and processes a person can bring about from those which merely happen. You can decide to stop walking or trying to find something out, but you cannot decide to stop knowing or believing something. You can decide to try to get something, but you cannot decide to want it. Why not? (Answering this question would extend the theory of chapter 6.)
Linguists are increasingly trying to do this though it is not clear how far they appreciate the intimate connection between the study of our language and the study of our world.
For example, the verbs of motion mentioned earlier all seem to involve a subset of the following ideas:
Different combinations of these (and other) ideas can be used to generate whole families of related concepts, often including concepts for which we do not (yet?) have labels. For example, I do not think English contains a word which refers to a process in which an agent A carries an agent B to some location, and then A picks up some object and is carried, by B, back to the starting point.
Perhaps this is an important part of some social activity in some other culture. Some sort of obstacle race?
The 'primitive' ideas used as the basis for generating such a family of related concepts may themselves be susceptible of further analysis. Moreover, some concepts require mutually recursive definitions: for example, believe and desire cannot be defined independently of each other.
The sort of analysis suggested here for concepts of motion is now familiar to linguists and people working in artificial intelligence (for example Schank and Abelson, who also explore analogies between such physical processes and mental processes like communicating information. See Bibliography.)
Similarly, in analysing a concept like know, or knowledge, it will be necessary to distinguish a variety of elements and relations which can enter into scenarios involving knowledge. A person (or other knower) will be involved, as will things in the world about which something is known. There will be a state of mind of the person, in which some aspects of the things and their relationships will be represented, that is, a belief is involved, though not necessarily consciously. There will be something which gives rise to the belief, either at the time the person knows or at some earlier time, for example, a process of perceiving something, doing an experiment or test, or perhaps acquiring the information indirectly from other knowers, or inferring it from some other knowledge.
There will be a relation between the source of the belief and the belief which certifies or justifies the belief (e.g. the evidence is good evidence). There may be sentences, spoken, uttered, or merely thought, which state whatever it is that is known, and in that case the sentences can be decomposed (usually) into fragments with different relations both to items in the world and aspects of the knower's mind. There may or may not be uses to which the knowledge is put, including answering questions, interpreting one's experiences, making plans, acting in the world, understanding other people's sentences, formulating new questions, etc. (Again, study of a system of concepts from ordinary language can contribute to psychology, and to the attempt to design artificial minds.)
In two papers on ought, better and related concepts (1969 and 1970), I have tried to show how a variety of uses can be generated in a fairly systematic fashion. Similarly, much important work in the development of mathematics, for instance Euclid's, and later Hilbert's, work on the foundations of geometry can be seen as a form of conceptual analysis, though usually of a very reductive sort (that is many concepts and theorems are reduced to a very small number).
When is it adequate and when not? What patterns of behaviour are adequate tests? Are they decisive, or are they merely indicative? Why? Are there some situations in which no decisive test is possible, so that doubts cannot be removed? For example, a racialist who has excellent motives for concealing his attitude, and who is an excellent actor. (As we shall see later on, there is no reason to suppose that there should be behavioural tests for all internal computational states and processes, either in a computer or in a person or animal.)
The widespread belief within our culture that intellectual and emotional phenomena are quite disparate can be refuted by detailed conceptual analysis.
Suppose that in such a society it is commonplace for incurable cancer sufferers to agree to have their bodies copied by this machine, while under total anaesthetic, followed by cremation of the cancer-ridden body. The new one is allowed to take its place so people come home from hospital saying I'm glad to be back, and I feel much better now that I've got my new body'. In such a society is our concept 'murder' applicable to their treatment? Is the concept 'same person' applicable to the person who goes into the hospital and the person who comes out? (For more on this see my 'New bodies for sick persons'.)
Another example: people disagree over whether it is essential to the concept 'emotion' that emotions involve felt bodily changes. One way of convincing yourself that such physiological processes are not essential is to imagine a society of Martians who are very much like us with very similar sorts of social institutions and similar ways of seeing, thinking, and acting, but who do not have the bodily reactions which we (or some of us) feel in certain emotional states. So they have hopes, disappointments, pleasant and unpleasant surprises, they feel pity, loneliness, dismay when their plans go wrong, they are anxious when there is a high probability of things going wrong, they are proud of their achievements, envious of others who are more successful, greedy for wealth, and so on. By describing the behaviour and social interactions of such beings in great detail, and imagining what it would be like to communicate with them, you should be able to convince yourself that you would find it perfectly natural to use our emotion concepts in talking about their mental states.
You would say 'He's terribly embarrassed about the attention he's getting', even though he feels no hot flush in the cheeks or any other physiological change characteristic of embarrassment in humans.
Of course, this sort of investigation does not produce knock-down arguments, because people can differ in how their concepts work. For example, mathematicians use a concept of ellipse which includes circles, whereas for non-mathematicians a necessary condition for something being an ellipse is that it has major and minor axes of differing lengths. Similarly, there may be some people for whom the accompanying physiological changes are necessary conditions for the applicability of concepts like envy, embarrassment, loneliness, etc. However, what one can demonstrate to such people is that by insisting on these necessary conditions they are making it impossible for themselves to describe situations which might one day arise, without inventing a whole lot of new terminology which may prove very hard to teach. Whereas I would claim that my use of the non-physiological concept of emotion in no way interferes with my communication with other people, and allows me the power to read science fiction without any feeling of linguistic distortion.
Of course, sometimes a little thought makes this elaborate kind of test unnecessary. Nevertheless, the methods of A.I. provide a useful extension to previous techniques of conceptual analysis, by exposing unnoticed gaps in a theory and by permitting thorough and rapid testing of very complex analyses.
This account of conceptual analysis is by no means complete. For more detailed examples, refer to the writings of philosophers mentioned and also A.R. White's Attention, and his contribution to Owl of Minerva, (ed. Bontempo and Odell), and Margaret Boden's Purposive Explanation in Psychology. Philosophers usually do not pay enough attention to problems of describing mental processes. Neither do they normally attempt the kind of system-building involved in designing a 'grammar' for a collection of concepts in the manner hinted at above. For instance, is there some sort of grammar for concepts related to attention? In other words, is there a relatively small subset of concepts in terms of which all the others can be defined? I believe the answer is 'Yes' but to establish this will require designing a fairly detailed model of a person, capable of generating a large number of processes involving perception, deliberation, reasoning, planning, problem-solving, and execution of plans and intentions. Some small steps in this direction are taken in Chapter 6, which proposes some minimal architectural requirements for a human-like system.
Despite my disparaging remarks about philosophers, there have been some profoundly important systematic analyses, mostly produced by philosophers of logic and mathematics, such as Frege, Russell, Tarski and Prior. For example, Frege's analyses of concepts like all, some, nobody, and related quantifiers, led to a revolution in logic and has profoundly influenced the development of computer programming languages used in artificial intelligence (via the work of Alonzo Church). Austin's How to do Things with Words is another example of a philosopher's attempt at detailed and systematic analysis, which has made a great impact on linguistics and more recently on AI.
If only Wittgenstein, in his later writings and teaching, had not made such a virtue of his inability to construct systematic theories integrating the results of his analyses, a whole generation of philosophers might have been far more disciplined and productive.
Of course, there are dangers in insisting on everything being formalised and systematic. Much shallow theorising is a result of trying to fit very complex and messy structures into a neat and simple formal system. A well known example of the distorting effect of formalisation is the claim that the logical connectives of propositional calculus adequately represent the words 'and', 'not', 'or', 'if, etc. of ordinary language. However, even if this claim is false, it remains true that the formalisation provided a basis for deeper exploration than was previously possible. For example, by describing exactly how the use of the ordinary words deviates from the truth-functional symbols, we obtain useful descriptions of how they work. (See Gazdar and Pullum 1977.) The same can be said of some other systematic but inaccurate analyses.
The two extremes to be avoided are demanding formalisation of everything at all costs, and rejecting formalisation because some of our concepts are too complex and unsystematic in their behaviour for us to be able to represent them in elegant formal systems. One of the great advantages of using programming languages for formulating analyses of concepts (as Winograd did see his 1973), is that programming languages are well suited to include many tests for special cases and exceptions to general rules. It is much harder to use formal grammars, or axiomatic systems, for this purpose.
Conceptual analysis can play a role in science and mathematics too. I have already mentioned Einstein's work involving analyses of concepts like simultaneous, and other spatial and temporal relations. Another example is the struggle by mathematicians of previous centuries to clarify the concepts infinite and infinitesimal, leading to the discovery of the concept of a limit, and to formal set theory.
Every science will have at its frontiers concepts which are to some extent in need of analysis and possibly improvement. Not all the problems of science are to be solved simply by collecting new facts, or by using existing terminology to build new theories. In the mature sciences, the concepts most in need of analysis will usually be highly technical, remote from the concepts of ordinary language.
However, in the social sciences and psychology, and increasingly in artificial intelligence, concepts from ordinary language play a central role in the construction of new theories and in the description of phenomena to be explained. Thus it is important for practitioners of these disciplines to be sensitive to the need for analysis, and to be skilful at doing it.
The dangers of failing to analyse concepts properly can be illustrated by a few rather extreme examples. Someone who had not seen how the concept bachelor worked might think it interesting to do a survey to find our what proportion of the bachelors in some social group were unmarried. He would probably get no support from research councils. However, less obvious mistakes of the same sort could pass unnoticed, like attempts to test the hypothesis that other things being equal people tend to believe things which are asserted by those they respect, or the hypothesis that other things being equal people tend to try to achieve goals they think they can achieve, or the hypothesis that being embarrassed involves believing that other people are paying attention to you. Of course, such research goals would usually be disguised in obfuscating jargon, but that does not reduce the need for conceptual analysis. I once read a research proposal which looked very impressive until the English equivalent to the jargon emerged. The aim was to find out whether people tend, on the whole, to co-operate more successfully if they get on well together. (For some similar criticisms of Social Science, see Andreski, Social Science as Sorcery.)
An example of an important piece of biological theorising whose concepts cry out for detailed analysis can be found in Dawkins' The Selfish Gene.
Besides the role of conceptual analysis in preventing muddled thinking and silly research, there is another important role in relation to science, namely making explicit some of what we already know, clearly a useful preliminary to attempts to add to what we know. I believe this is especially useful in fields like developmental psychology and anthropology, concerned with the study of ways of thinking and learning. Previously I listed some concepts concerned with spatial movement and indicated how one might begin to analyse some of the more complex ideas in terms of combinations of relatively primitive ones. Very young children somehow acquire both the relatively 'primitive' concepts and also a variety of complex combinations of these. It is not thought to be beyond them to grasp the difference between 'fetch' and 'send for' expressions which occur in familiar nursery rhymes. By studying these concepts we can define some of the tasks of psychology. An adequate theory of learning must account for a child's ability to master these ideas. Even very young children are capable of grasping quite abstract rules, including rules which they cannot formulate in words. For example, a three-year-old reacted to his older brother's use of 'nope' for 'no', by starting to say not only 'nope' but also 'yesp', 'okayp' and 'thankyoup'. Try formulating the rule he had invented! (Do developmental psychologists, or brain scientists, have any convincing explanation of the ability to learn these things?)
By improving our understanding of what it is that our children have to learn we may perhaps come to understand better not only how they learn, but also what sorts of things can go wrong with the learning process, and perhaps even what can be done about it. How many teachers in schools, colleges and universities have sufficient skill in conceptual analysis to be able to discern subtle differences between the concepts they are trying to teach and the concepts so far grasped by their pupils?
Other social sciences can also benefit from conceptual analysis. By doing this sort of analysis for concepts used in several different cultures, anthropologists and sociologists could enhance their studies of what is common and what varies among different modes of thinking and reasoning.
I have already alluded several times to the role of conceptual analysis in the work reported in this book. Several chapters are based in part on attempts at analysing familiar concepts. But most of the work is still sketchy and makes use of concepts which themselves require further study.
The chapter on the aims of science, for example, makes liberal use of a very complex concept which still requires further analysis, namely the concept of what is possible. Several other concepts used in that chapter are equally in need of further investigation.
The chapter on analogical representations attempts to analyse a familiar distinction between different sorts of symbolisms, or representations, showing that the verbal/pictorial distinction is usually misdescribed and that there are actually several different distinctions where at first there seems to be only one.
The chapter on learning about numbers begins to analyse some of our simplest number concepts, drawing attention to complexities in what a child has to learn which are not normally noticed.
The chapter on computer vision, and the ensuing discussion includes some small steps towards clarifying a collection of familiar concepts like conscious, interest, experience.
Nearly all of this work is incomplete, and will remain incomplete for many years. But, as I have suggested in this chapter and will try to substantiate later, the methodology of artificial intelligence will be a major spur to progress.
[[Note Added November 2001
Since this chapter was first published, the problem of `knowledge elicitation' in designing expert systems has received much attention. It is not widely appreciated that the techniques of conceptual analysis as described here (and practised by many philosophers) are often crucial to such knowledge elicitation. There is also considerable overlap between these ideas and the Naive Physics project proposed by Pat Hayes: See P.J. Hayes, The second naive physics manifesto, in Formal Theories of the Commonsense World Eds. J.R. Hobbs & R.C. Moore, Norwood, NJ, Ablex, 1985, pp. 1-36
Note added February 2007
Additional discussion of the nature of conceptual analysis, its relationship with what Gilbert Ryle called 'logical geography', and a possibly deeper notion of 'logical topography' can be found here http://www.cs.bham.ac.uk/research/projects/cogaff/misc/logical-geography.html ]]
Original pages 103-112
Experience has shown that many readers will have been made very uncomfortable, if not positively antagonistic, by my remarks about the role of computing and computer programs in philosophy and the scientific study of human possibilities. There are several reasons for this, including (a) ignorance of the nature of computers and computer programs, (b) misunderstandings about the way computers are used in this sort of enterprise, (c) invalid inferences from the premises that computer simulations of human minds are possible, and (d) confused objections to specific theories expressed as computer simulations.
Since the symbols stored in the computer may include instructions for it to obey, and since it can be instructed to change some or all of the symbols within it, it follows that as a computer executes instructions within itself, the instructions may change and thus the processes occurring may evolve in complex ways. In the end, the original program may have completely disappeared. Exactly how this happens may depend not only on the original program but also on the history of interactions with the environment. So no programmer, or anybody else, is responsible for the eventual state of such a mechanism or for its behaviour.
In any modern digital computer the basic symbolic processes which occur will all be very simple, such as putting a zero or a 1 in some location, or comparing two symbol-strings, or copying the contents of one location into another, or performing logical or arithmetical operations. But it is not helpful to think of a computer as 'simply' performing such simple operations, any more than it is helpful to think of a Shakespeare play as 'simply' composed of letters, punctuation marks, and spaces.
Computers can perform millions of their basic operations each second. Many different kinds of books can be written using the same small set of printed characters, and similarly an enormous variety of processes can be represented by complex combinations of the simple processes in a computer.
In particular, the processes need not be fully controlled by all the symbols in the store at any time. For among the instructions executed may be some to the effect that new symbolic information should be accepted from various devices attached to the computer, such as a television camera or a microphone, or a teletype at which a person sits communicating with the computer. Some of the new symbols coming into the computer in this way may lead to changes in the stored instructions, just as much as execution of stored instructions can. (This, incidentally, is why all the philosophical debates about Godel's incompleteness theorem and related theorems proving that there are limits to what any particular computing system can do, are irrelevant to the problem of what sorts of intelligent mechanisms can be designed: for all these theorems are relevant only to 'closed' systems, i.e. systems without means of communicating with teachers, etc.)
Computing science is still in a very early phase. Only a tiny fragment of the possible range of computer programs has so far been investigated, and many of these are still only partly understood. Complex programs sometimes work for reasons which their designers only half understand, and often they fail in ways which their designers cannot understand. It follows that nobody is in a position to make pronouncements about the limits of what can be done by computer programs, especially programs which interact with some complex environment, as people do.
Attempting such pronouncements is about as silly as attempting to use an analysis of the printing process to delimit the kinds of theories that will be expounded in text-books of physics in a hundred years time. Nevertheless, people with theological or other motives for believing that computers cannot match human beings will continue to be overconfident about such matters (e.g. H. Dreyfus, What Computers Can't Do).
The last general remark I wish to make about computers is that the definition given above does not assume anything about what the mechanism is made of. It could be transistors, it could be more old-fashioned electronic components, it could be made of physical components not yet designed, it could somehow be made out of some non-physical spiritual stuff, if there is any such thing. The medium or material used is immaterial! All that matters is that enough structures are available to represent the required range of symbols, and that appropriate structural changes can occur in the computer. As Margaret Boden once remarked, angels jumping on and off pin-heads would do.
This is not the place to enlarge further on what computers are. Interested readers should consult Electronic Computers, by Hollingdale and Toothill, or Weizenbaum's Computer Power and Human Reason. See also chapter 8 of this book.
However, computers are not natural objects to be studied. They are artefacts to be improved and used. If people had been content to study computers instead of programming them, very little would have been learnt, for a computer does nothing unless it is programmed. But what it does depends on how it is programmed. So approaching a computer with a view to finding out what it can do is as silly as it would be for a physicist to study pencil and paper with a view to finding out what they can do. One approaches a computer in order to try to make it do something. The physicist writes things down, calculates, tries out formulae and diagrams, etc. He constructs, explores and modifies a theory. That is how to use a computer in order to study intelligence: by designing a program which will make it behave intelligently one constructs a theory, expressed in that program, about the possibility of intelligence. The failure of the theory is your own failure, not the computer's.
The short answer is that just because an electronic computer is a physical system, it does not follow that everything it successfully simulates is a physical system: there could be computer programs simulating the structures and functions of mechanisms composed of some spiritual substance!
So even if the human mind is not merely a function of the physical brain, but has some non-material or non-physical basis (whatever that may mean), then the behaviour or function of that stuff is what computer programs can simulate. In fact a program does not specify what kind of computer it runs on. The computer may use transistors, valves or spiritual mechanisms, so long as a rich enough variety of structural changes is available, as I have already pointed out.
A longer, and more important, answer is that the ontological status of mind has little relevance to the problems of this book. Both Dualism, which postulates some kind of spiritual entity distinct from physical bodies, and Materialism, according to which minds are just aspects of complex physical systems, lack explanatory power. That is, both of them fail on the criteria proposed in chapter 2 for adequate explanations in philosophy or science. They fail either to describe or to explain any of the fine structure of such aspects of mind as perception, memory, reasoning, understanding, deciding, desiring, enjoying, creativity, etc., or the relations between them.
In order to explain how all these things are possible, we need a theory describing or representing the structures and functions of a mechanism which can be shown to have the right sorts of abilities, that is a mechanism able to generate within itself structures and processes with the kinds of mutual relationships which we know hold between mental phenomena. For instance, we know that a certain experience, such as seeing a tool being used, can produce a change in what a person knows, and thereby can change what he is able to do and the decisions he can take in order to deal intelligently with problems. To explain how this sort of thing is possible, e.g. to explain how one can learn to operate a tool by watching its use, it will not do simply to say what kind of stuff the underlying human mental mechanism is made of.
Being told that a computer is made of physical components, for instance, tells you nothing about the kind of internal organisation that made it possible for the PDP-IO computer used by Winograd (1973) to hold conversations in ordinary English. Similarly, being told that the mind is spiritual or non-physical explains nothing.
For similar reasons, neurophysiology cannot help in the early stages of the search for explanations of the possibility of mental phenomena and we shall remain in the early stages for some time. Studies of neurophysiology, or the electronic basis of a computer, may explain such things as how fast the system performs, and why it sometimes goes more slowly, or why it sometimes breaks down altogether; but cannot at present explain how it is possible for the system to perform a particular type of task at all. Such an explanation requires study of the brain's programs, not its low level (physical) architecture and neurophysiology currently lacks conceptual and other tools needed for studying programs. (Study of a computer's architecture tells one practically nothing about the programs currently running on it. The programs may change drastically while the physical architecture remains the same, and different computer architectures may support the same programs. Computers are not like clocks.)
[Note added 2001: I would now put this by saying that the virtual
machine architecture is more important than the physical machine
(For more on this see recent papers in
The study of physical architectures would be relevant if it could be used to demonstrate that certain sorts of virtual machines could and others could not run on brains. But right now we still do not know enough about ways of mapping virtual machines onto physical machines for useful constraints to be derived.]
The only kinds of explanatory mechanisms that have some hope of being relevant to explaining mental possibilities like perception, learning and decision making, are mechanisms for manipulating complex symbols, for example, computer programs.
People whose sole experience of computing is with programs for doing highly repetitive algorithmic numerical calculations, or programs for simulating feedback systems, may find it hard to understand how programs can be relevant to our problems. An essential antidote to this prejudice is a study of the literature of artificial intelligence to learn how, besides doing numerical calculations in an order determined by the programmer, computer programs can also construct, analyse, interpret, manipulate, and use complex symbolic structures, like lists, pictures, sentences or even sub-programs, in a flexible way determined by analysis of developments during the computation rather than following an order worked out in advance by the programmer.
All this can be summarised by saying that the known important mechanisms are not computers (those ugly boxes with mysterious noises and flashing lights), but programs or virtual machines. Computers are an old type of mechanism: they are physical machines. Programs are a new type. A simulation program could drive not only a physical computer, but, if ever one were made, a computer composed entirely of spiritual stuff (The program, not the medium, is the message.)
The objector may add that it is clear that existing computers do not do things the way we do, since, at the physical level they use transistors and bits of wire, etc., whereas our brains do not, and even at the level of programs they have to employ interpreters or compilers which translate the high level intelligent and flexible symbol-manipulating programs into sequences of very simple and very mechanical instructions which have to be followed blindly, whereas there is no evidence that humans do this.
This objection (which seems to pervade the book by H.L. Dreyfus, What Computers Can't Do), is based on the concept 'doing things in the same way', which requires some analysis.
The notion of doing something in the same way is systematically ambiguous. Two persons may calculate the answer to an arithmetical question in the same way insofar as they both use logarithms but in different ways insofar as they use logarithms with different bases. It is all a matter of how much and what sort of detail of a process is described in answer to the question In what way did he do it?' That some very detailed description would be different in the case of a computer does not imply that there is no important level at which it does something the same way as we do. We don't say a Chinaman plays chess in a different way from an Englishman, simply because he learns and applies the rules using a different language, so that his thinking goes through different symbolic processes. He may nevertheless use the same strategies.
The same problem arises about whether two computer programs producing equivalent results do so in the same way. Two programs using essentially the same algorithm may look very different, because they are written in different languages or in different programming styles. Any program is a mixture of 'main ideas' and implementation details. The same may be true of human abilities.
The problem of knowing the way in which a computer does something is no different in principle from the problem of knowing the way in which a person does. it. In both cases there are questions that can be asked, and tests that can be given, which provide useful clues. (Compare Wertheimer's tests for whether children understand and apply a technique for finding areas of a parallelogram in the same way as he does, in Productive Thinking, chapter I. He sees whether they can solve a very varied range of problems.)
Insofar as anything clear and precise can be said about 'the way' in which a human being does something (e.g. plays chess, interprets a poem, or solves a problem) the appropriate procedure can in principle be built into a suitable simulation, so that we ensure that the machine does it in the same way. For instance, programs can be written to do multiplications using ordinary decimal arithmetic, or binary arithmetic, or alternatively using natural language.
Finally it should be noted that it is very unlikely that there is only one way in which something or other is done by all human beings, whether it be perceiving faces, remembering names, playing chess, solving problems, or understanding a particular bit of English: we all have our own quirks and foibles, so it-is unreasonable to deny this right to a complex computer simulation.
I do not wish to argue that every aspect of the human mind can be simulated on digital electronic computers, any more than an astronomer's explanation of an eclipse explains or predicts every aspect of the motion of the earth, moon and sun. For instance, certain types of human experience seem to be possible only for beings with human bodies, or bodies with very similar structures. Thus, feeling thirst, nausea, muscular exhaustion, sexual desire, the urge to dance while listening to music, or the complex combination or bodily sensations when one is about to lose one's balance whilst walking on ice, may be forever inaccessible to computer programs within immobile rectangular boxes, or even to humanoid mobile robots who are made mainly of plastic and metal. (For more on these general issues, see the contributions by H.L. Dreyfus, N.S. Sutherland, and myself to Philosophy of Psychology, ed. S.C. Brown.)
These abstract debates about what can and cannot be done with computer programs are not too important. Usually there is more prejudice and rhetoric than analysis or argument on both sides. What is important is to get on with the job of specifying what sorts of things are possible for human minds, and trying to construct, test, and improve explanations of those possibilities. Anyone who objects to a particular explanation expressed in the form of a program, should try to construct another better explanation of the same range of possibilities, that is, better according to the criteria by which explanations are assessed (see chapter 2). The preferred explanation should account for at least the same range of possibilities with at least as much fine structure.
The rest of this book will be concerned mainly with the description of some important possibilities known to common sense, together with some rather sketchy accounts of what good explanations might look like. I shall frequently point out ways in which the attempt to design computer simulations can subserve the endeavour to understand the human mind.
[[Note added 2001:
After this book was published there was a revival of interest among many AI researchers in "connectionist" architectures. Some went so far as to claim that previous approaches to AI had failed, and that connectionism was the only hope for AI. Since then there have been other swings of fashion. It should be clear to people whose primary objective is to understand the problems rather than to win media debates or do well in competitions for funding that there is much that we do not understand about what sorts of architectures are possible and what their scope and limitations are. It seems very likely that very different sorts of mechanisms need to be combined in order to achieve the full range of human capabilities, including controlling digestion, maintaining balance while walking, recognising faces, gossiping at the garden gate, composing poems and symphonies, solving differential equations, and developing computer programs such as operating systems and compilers. I don't know of an any example of an AI system, whether implemented using neural nets, logical mechanisms, dynamical systems, evolutionary mechanisms, or anything else, that is capable of most of the things humans can do including those items listed above. This does not mean it is impossible. It only means that AI researchers need some humility when they propose mechanisms. ]]
[[Note added 20 Jan 2002:
A number of arguments against computational theories of mind have been advanced since this book was written. Many of them use arguments that were already rebutted in this chapter, or put forward views that were expressed in this chapter. For example, the argument that brains work in different ways from computers therefore computational theories of mind must be incorrect is rebutted above by pointing out that systems may be different at one level of description and the same at a more abstract level of description. Abstraction is often very useful, as demonstrated by the history of science in general and physics in particular. The argument that intelligence or mentality requires embodiment is rebutted by pointing out that some aspects of mind may depend on details of the body whilst others do not. Of course, that leaves unanswered the important research question: which forms of embodiment can support which forms of mentality?
Many critics of AI and some defenders of AI have based their argument on the assumption that AI in some sense presupposes that all computation is Turing Machine computation. I have tried to argue in recent years that the notion of "computation" is not sufficiently well defined to support such criticisms. In particular I have argued that the notion of "computation" employed by most users of computers, designers of computers, programmers, and AI researchers, has nothing to do with Turing machines but is an extension of two notions which go back to long before Turing, namely
Both ideas were well advanced before the beginning of the twentieth century, for instance in automated looms, mechanical calculators and Hollerith machines for sorting and collating information. In the middle of that century advances in science and technology made it possible to combine those ideas in new ways, providing far greater speed, power, flexibility (e.g. self programming), and cheapness. These points are elaborated in a paper on the irrelevance of Turing machines to be published during 2002, and other papers available here: http://www.cs.bham.ac.uk/research/cogaff/
- The notion of a machine that can control something, possibly itself
- The notion of a machine that operates on abstract entities, such as numbers, or census information.
Despite all the progress of the last half century, it is clear that we still have much to learn about the nature of information and varieties of machines, including virtual machines, that can process information -- themes developed in these talks: http://www.cs.bham.ac.uk/research/projects/cogaff/talks/ and these discussion papers http://www.cs.bham.ac.uk/research/projects/cogaff/misc/ ]]
Original pages 112-143
In particular, I want to undermine a common misconception about computers, namely that however complex the programs that run in them they are always essentially unintelligent, uncreative mechanisms, blindly following simple rules one at a time. Such a description may well be true of the underlying electronic components, just as it may well be true to say that a human brain is always essentially an unintelligent uncreative bundle of nerve-cells (or an assemblage of atoms) blindly reacting to one another in accordance with chemical and physical laws of nature. But just as the latter description may omit some important features of what a brain can do, so also the former description omits important 'high-level' features of complex computer programs. What is true of a computer need not be true of a program, just as what is true of a brain need not be true of a mind. In both cases the whole is far more than the sum of its parts.
I am not trying to explain phenomena which are unusual, hard to observe, and known only to experimental psychologists. The facts about people that I take for granted and attempt to account for are facts which we all know, though we may not all reflect on them, they are part of our shared common-sense.
Many notable examples of creativity are discussed in A. Koestler's The Act of Creation. However, we can also observe frequent examples of what seems to be essentially the same kind of flexibility and creativity in the daily life of ordinary persons, in our efforts to cope with spilt milk, ungrammatical sentences, unfamiliar typewriters, blind alleys, broken suspenders, lost keys, illegible handwriting, mixed metaphors, puzzle pictures and veiled insults. The child who uses his old counting ability as a basis for answering new questions (like 'what number comes before five?') is as creative as any artist or scientist. How can we explain this flexibility and creativity?
What is required is a design for a computing system which is able to cope with types of possibility not covered by the programmer's analysis. More precisely, it is necessary to combine into a single system, competence in a variety of domains, in such a way that expertise in two or more domains can be combined creatively and flexibly in dealing with novel situations or problems. Instead of the programmer doing the analysis of all types of possibilities in advance, the program should be able, in at least some cases, to do the analysis when it is appropriate to do so, and to record the results for future use.
These abilities require the system to contain mechanisms which facilitate communication of information between different sub-processes in an unplanned way: the programmer need not have anticipated each possibility inherent in the system. I shall now give a sketchy description of how this might be achieved. Several steps in the construction of such a system have already been taken by people designing artificial intelligence programs (e.g. Sussman, 1975).
However, it will be seen to be useful to blur the distinction between the mind of the mechanism and the environment. (This blurring in one form or another has a long philosophical history. See, for example, Popper's 'Epistemology without a knowing subject', reprinted in his Objective Knowledge. As he points out, Plato, Hegel and Frege had similar ideas.)
We shall discuss interactions between the following structures:
Many more or less temporary internal and external processes (actions) will be generated by these structures. There will also be the following more permanent processes ensuring that the actions which occur are relevant to the current motives and that intelligent use is made of previous knowledge and new information:
The system must have several kinds of processes running simultaneously, so that implementing it on a computer will require multi-processing time-sharing facilities already available on many computers, and used in the POPEYE vision program described later in chapter 9. This global parallelism is an important requirement for our mechanisms, though concurrent processes can be implemented in a very fast serial machine.
The main parts of the mechanism will be described separately in terms of their functions. However, computing models, unlike previous kinds of mechanisms, should not be thought of as composed of several interlocking parts which could exist separately, like parts of an engine or human body. Normal concepts of part and whole do not apply to computing structures and programs.
For instance, two data-structures stored in the memory of a computer, containing pointers to their elements, may contain pointers to each other, so that each is an element of the other. This can be illustrated by so-called 'list-structures'.
Thus, a list A may contain, among other things, the list B, while list B contains the list A. A is then an element of B and B an element of A (which is not possible for physical collections). A list may even be an element, or part, of itself. Examples of circular structures will be found below in chapter 8. (For further details consult a manual on some list-processing programming language, e.g. Burstall, et al. 1973, or Foster, 1967, or a manual on Lisp, Prolog, Scheme, or Pop-11).
Similarly, computer programs may be given names, in such a way that at a certain point in the set of instructions defining one program. A, there is an instruction of the form If condition X is satisfied then run program B', while program B contains a similar call of program A. Program A may even contain an instruction to run itself. (These are examples of 'recursion'.) Such circular programs are able to work provided certain conditions are satisfied which can be roughly summed up by saying that during execution the series of nested, or embedded, calls to programs (sub-routines) must eventually produce a case where a particular program can run without meeting the conditions which make it call another: it can then do its job and feed the result back to the program which called it, which can then get on with its job, and so on. This is commonplace in programming languages which permit recursion, such as ALGOL, ALGOL68, LISP, or POP-2.)
In such cases, we can say that program A is part of program B, but B is also part of A. More complex chains or networks of such circular relationships between programs are possible. Similarly, human abilities, as we shall see, combine to form complex systems in which normal hierarchic part-whole relationships are violated: each of two abilities may be a part of the other. For instance, the ability to read someone's hand-writing may be a part of the ability to understand his written sentences, and vice versa.
Because these ideas have been made precise and implemented in the design of computing systems, we can now, without being guilty of woolly and unpackable metaphors, say things like: the environment is part of the mechanism (or its mind), and the mechanism is simultaneously part of (i.e. 'in') the environment!
We turn now to a sketch of structures, programs and processes in a mechanism to simulate purposiveness, flexibility and creativity. I cannot give more than a bird's eye view of the system at present. My description is deficient, in that it does not provide a basis for a team of experienced programmers to construct such a system. At best, it provides a framework for further research.
The different structures about to be mentioned are listed separately in terms of their different functions in the system. But they need not exist separately. As already remarked, one structure may be part of another which is part of it. Some of the structures are in the mind (or computer), some not.
But for human beings it may also include, at the same time, a more abstract culturally determined domain, such as a kinship system, or a system of socio-economic relationships, within which the individual has a location. Some of the 'innards' of the mechanism or person may also be thought of as part of the environment, since the system can examine and act on them! (See chapter 10 for more on this.) Similarly, parts of the environment, like internal structures, may be used as an information store (blazing a trail, writing a diary, 'reading' the weather-signs, putting up signposts), so that the environment is part of the store of knowledge, that is, part of the mind.
Several different kinds of language or symbolism may be used, for instance sentences, networks representing sets of relationships, maps, diagrams, and templates. Some of the information may be procedural, for example, in the form of routes and recipes. The information will necessarily be incomplete, and may contain errors and inaccuracies. There may even be undetected inconsistencies, and mistakes about the system's own states and processes. We do not necessarily know our own minds.
What gets into the store will depend not only on what stimuli reach the sense organs, but also on what languages and symbolisms are available for expressing information, and on what kinds of perceptual analysis and recognition procedures (i.e. the monitors mentioned below) are available and active. (What is already in the store will also make a difference. Where things are stored will depend on indexing procedures used.)
In order that its contents be readily accessible, this store of beliefs will have to have an index or catalogue associated with it, possibly including general specifications of the kinds of information so far available or unavailable. For instance, it should be possible to tell that certain types of information are not present without exhaustive searches. (How long does it take you to decide whether you know what Hitler ate at his last meal?) The index may be implicit in the organisation of the store itself, like the bibliographies in books in a library, and unlike a library catalogue which is kept separate from the books. If books contained bibliographies which referred directly to locations in the library (e.g. using some internationally agreed system for shelf-numbers) the analogy would be even stronger.
If they are not explicit, but are implicit in decision-making procedures, then it will be much harder for the system to become aware of the reasons for what it does, and to revise its decision-making strategies. (Compare the discussion of consciousness in chapter l0, below. A more detailed analysis would distinguish first-order motivators, such as goals or desires, from second-order motive-generators or motive comparators, e.g. attitudes, policies and preferences.)
Some of the contents of the store will have been generated on the basis of others, for instance as means, plans, or strategies for achieving some end. (This store must not be thought of as a 'goal-stack' with only the last added goal accessible at any one time as in some over-simple computer models.)
The representational devices may be varied: for instance some motivational information might be stored in an apparently factual form within the previously mentioned store of beliefs, for example, in sentences like 'Jumping off objects is dangerous', or 'Nasty people live in town T'. This can work only so long as adequate procedures are available in at least some contexts for finding and using this sort of information when it is relevant to deciding what to do.
The processes produced by the mechanism, that is its actions, whether internal or external, will be generated, modified, controlled, interrupted, or terminated, by reference to the contents of the motivational store, in ways to be explained briefly below. Such purposive actions may include planning processes, the construction of new motives, problem-solving processes, external movements, manipulations of objects in the environment, processes of modifying plans or actions generated by other processes, and also perceptual or monitoring processes.
One of the constraints on the design of a human-like intelligent system is the need to act with speed in many situations. This has some profound design implications. In order that rapid decisions may be taken in a complex world there will have to be a very large set of 'rules of thumb', including rules for deciding which rule to use, and rules for resolving conflicts. This is almost certainly incompatible with assumptions made by economists and some moral philosophers about how (rational) people take decisions. For instance, there need not be any overall tendency for the rules to optimize some abstraction called 'utility'.
At any time, some of the purposes or other motivational factors may not yet have generated any process of planning or action: for instance, a purpose may have been very recently generated as a new sub-purpose of some other purpose, or it may have a low priority, or there may not yet have been any opportunity to do anything about it, or it may be a conditional purpose (do X if Y occurs) whose condition has not been realised, or some other purpose or principle (for example, a moral principle) may override it. Thus many existing motivational factors may generate no decisions.
Similarly, plans and decisions that have been formulated on the basis of motives may still not have generated any action for analogous reasons.
Among the resources should be linguistic or symbolic abilities. Those are needed for formulating problems, purposes, procedures and factual information. Chapter 2 indicates some of the reasons why notational resources can be very important. Chapter 7, below, explains why different kinds of symbolisms may be required for different sorts of tasks or sub-tasks. Other resources would include procedures for constructing plans or routes (for example, with the aid of maps), procedures for getting information and solving problems, such as problems about why an action went wrong, and procedures for constructing, testing and modifying other procedures. (See Sussman 1975 for a simple working example.)
That is to say, the resources store will include collections of 'intelligent' programs of the sorts currently being produced by workers in artificial intelligence. The concept of a resources store, like the concept of an environment, expands to swallow almost everything! This is why a catalogue is necessary.
It must be possible for new information about typical causes and effects, or requirements, of old resources to be added to the catalogue. The system may have to use pointers in the catalogue in two directions, namely, starting with some purpose or need, it should be able to use the catalogue to get at available resources which might meet that need. So pointers are needed from purpose-specifications to resources. However pointers are also needed the other way, since in selecting one resource it may often be important to know what sorts of uses, and effects, it can have besides the one which led to its selection: if some other typical use of the resource matches another current motive or need, then the resource may be 'a stone that kills two birds'. Alternatively, if a resource selected as a possible means to one end has a typical effect which will frustrate some other current purpose (or principle, or preference, etc.), then an alternative resource should be sought, or the other purpose abandoned. Those are some of the design implications that follow from the need to cope with multiple motives.
Sometimes information about typical uses and side effects of a procedure (or other resource) can be got by inspecting its structure. But often such things are learnt only by experience of using the resource and in the latter case we need explicit additional entries.
For a very large store of resources, as in the human mind, the catalogue will have to be highly structured, for instance in the form of a tree, with lower levels giving more details than higher levels. The organisation of the catalogue may be partly implicit in the searching and matching procedures. As indexing can never be perfect, the system will have typically human failings, no matter how fast and large a computer is available. (This is contrary to some optimistic pronouncements about the way bigger and faster computing systems will enable super intelligences to be made.)
In order to be able intelligently to modify ongoing processes, terminate them, interrupt or suspend them, change their order, and so on, in the light of new information, the system will have to have information about which processes and sub-processes are generated by any given motive, and which motives lay behind the initiation of any one process.
The function of a process-purpose index is to store this information about the reasons for various actions. It may need to be modified whenever a new process is initiated or an old one terminated, or when any of the reasons for doing something change, for example, if one of the birds a stone was intended to kill turns out to be already dead. The system will thus have access to the reasons why it is doing things. Faults in the procedures for keeping the process-purpose index up to date may account for some pathological states.
So if a process generated by purpose PI accidentally achieves a purpose P2, and this is detected by the monitors, then the index shows which other processes were generated by P2, and can therefore be terminated, unless the index still contains a pointer from one of them to some other as yet unfulfilled purpose P3. Other uses of the process-purpose index will be mentioned below.
Perhaps one of the most important reasons why it is necessary to be able to be in the midst of several different processes at once, is that this provides opportunities to learn from accidental interactions between processes. The process-purpose index, which relates current activities to the reasons for doing them makes it easier to achieve such learning. For example, one might learn that a certain purpose can be achieved in a new way, because of an unexpected interaction between the old strategy for achieving it, and some other activity.
The process-purpose index should not be confused with the relatively more static, less changeable, resources catalogue. Their functions are different. For instance, a particular procedure may be selected, using the resources catalogue, in order to achieve purpose PI, and then executed. While the process of execution is going on, the same procedure may be selected again in order to achieve another purpose P2. We thus have two processes (actions) running in parallel in order to achieve different purposes, yet the same procedure (or program), a relatively permanent resource, controls them.
A clear example of this is a person playing two games of chess simultaneously, and using the same strategy in the two games for at least part of the time. If one of the opponents makes a move requiring that strategy to be abandoned, the process of executing it has to be terminated in one game but not in the other.
The resources catalogue contains the relatively permanent information (modifiable in the light of experience) that this strategy is normally useful in such and such circumstances for achieving certain types of advantages. The process-purpose index, however, relates not the strategy itself, but, for example, two current executions or uses or activations of the strategy, which may have reached different stages of advancement, to two current purposes. Similarly the ability to multiply may be used twice over in evaluating the following expression:
The process-purpose index would also have an important place in planning activities, when instead of real executions of strategies the index would contain pointers to representations of possible executions of strategies.
Once the importance and ubiquity of such structures in a complex goal-directed information processing system has been understood, the distinction sometimes made between two kinds of memory -- short-term and long-term -- evaporates, for instance, in connection with a plan carried out over a period of several years.
Note added April 2004
That point was very badly expressed, or just wrong. What I think I was
trying to say in 1978 is that there are different memories with
different time-scales and different functions, and assuming there are
only two kinds, short-term and long-term memory, as some people appeared
to claim in those days, was a mistake.
The full details of these temporary structures need not be globally accessible in the same way as some of the previous a structures. That is to say, they are private to the particular processes which use them. It may be, however, that certain local computations, are automatically reported up to a global level whenever they occur, such as estimates of time or computing space, or other resources needed for a process to be completed. This might be done by monitors, described below.
Some of the temporary workspace may be outside the system, for instance a shopping list, an artist's rough sketches, an engineer's calculations. Even a half-completed object is an extension of short term memory for the constructor.
Note added April 2004
Examples would be a partially completed painting, a partial mathematical
proof, a half-built shelter. A similar point was made long ago by
Herbert Simon in connection with insects that produce the next step in a
complex task by reacting to the current state of the environment in
which they are building something. This notion is often referred to as
'stigmergy' and the phenomenon was known to entomologists in the 1950s.
Good ideas are often re-discovered.
These more or less temporary structures are of no use to an intelligent system unless mechanisms are available which can bring about the sorts of processes already hinted at and elaborated below. Typical mechanisms would be procedures for accessing, using, and modifying the resources, catalogues, plans, etc. The whole system needs some kind of overall control. This is the business of a central administrative process, for which computer operating systems provide a very first (very rough) approximation.
[[Paragraph added in 1986:
To some extent, parts of the system may (and will) work autonomously in parallel, e.g. posture control, control of breathing, and control of saccadic eye movements. However, since two or more needs may require incompatible actions, and since coordinating two actions rather than performing them separately may improve overall performance, it may be useful for some 'central' system to resolve conflicts and co-ordinate decisions. A 'central administrative process' may have this role. ]]
The central administrative process will at various times survey the motivational base and purpose-process index and select from the unfulfilled purposes a subset for generating further planning and action. This selection may be driven partly by previously selected purposes or principles, and may use current information, such as estimates of likelihood of success or failure, knowledge about opportunities and resources available now and in the future, the current state of other actions, and so on.
Sometimes no selection can be made until a change has been made in the set of purposes for instance by inventing a compromise between two conflicting purposes. In at least some cases, the selection must be automatic to avoid an infinite regress of decision making.
Similarly, after certain motives or purposes have been selected for action, then in at least some cases they must invoke suitable action-generating procedures automatically, since if everything required prior deliberation or planning, nothing could ever get started. This automatic activation can happen when a current purpose closely matches a catalogued specification of a typical use of an available procedure. Monitors would be employed to reduce the risks inherent in some automatic activation.
When no matching procedure is found for a certain purpose P, in the resources catalogue, it may be possible instead to find a match for the new purpose of making a plan for achieving P. For instance, if the purpose 'Go to Liverpool' fails to match any current plan, then 'Make a plan for going to Liverpool' may match a typical use (that is, making plans for going places) of a procedure for constructing routes (for example, find an atlas containing both your current position and the destination, then . . . etc.). Again, even if one does not yet know a procedure for making objects of a certain type, one may have a procedure for constructing a suitable procedure for making those objects, by analysing specifications of a required object and available tools and materials.
In short, when first-order matching fails, second-order matching may succeed. Perhaps in some cases even higher-order matching (make a plan for making a plan for achieving P) may succeed.
Similarly, if several procedures are found to match the purpose, then a new purpose may have to be set up, namely the purpose of choosing between the available alternatives. If a choice cannot be made using the information in the resources catalogue, it may be necessary to try out some of the alternatives. (See chapter 8 for more on the difference between examining and executing procedures.) This kind of comparison of alternatives may occur at various stages in the construction of one plan, contrary to the games-theoretic analysis of human decision-making which assumes that we always choose between complete alternatives, without saying anything about how we construct those alternatives.
When the administrator has failed to find or produce a plan for a certain purpose, a second-order task may have to be added to the motivational base as a new unachieved purpose (i.e. finding a plan), to be attended to later if anything relevant crops up, in ways described below. (This can produce accidental learning.) Alternatively, if the original purpose was very urgent, or there is nothing else to do at the time, then trial and error with back-tracking may be used.
Note added April 2004
Some of the above ideas later turned up in SOAR, a problem solving and
learning system developed in the early 1980s. SOAR detects when
an impasse occurs and switches to a new task, resolving the
impasse. However, as far as I know, SOAR did not include the
option in the previous paragraph, namely deferring the process of
dealing with an impasse until some unspecified future date. I believe
SOAR also did not include the point in the next paragraph about checking
unexpected benefits and side-effects of proposed new solutions, which
became an important feature of many planning systems apparently inspired
by Sussman's HACKER (Sussman 1975) referred to above.
For more on
the ideas in SOAR see
Newell, A. (1980b). Reasoning, problem solving and decision processes: The problem space as a fundamental category. In R. Nickerson (Ed.), Attention and Performance VIII. Hillsdale, NJ: Erlbaum.
Should a suitable procedure for achieving P be found or constructed, in any of the above ways, then analysis of its typical uses and effects (recorded in the resources catalogue), or analysis of its structure, may show that it, or a modified version, will enable more than one current purpose or motive to be fulfilled. If several suitable alternatives are available, this analysis may provide a reason for choosing between them. Or it may show that the procedure would interfere with some other current purpose. A process (action, procedure-execution) is then generated by executing the selected procedure with suitable arguments or parameters bound to its variables (for example, 'destination' might be bound to 'Liverpool' in the previous example).
The central administrator (and perhaps also some of the other currently running programs) must be able to interrupt, terminate, modify, or restart current processes (though some may be less controllable than others, for instance if they are so well-tried that possible interrupt points have been kept to a minimum). These control decisions will be taken on the basis of new information from monitors (described below), using the purpose-process index as described above. So the index must be changed every time a process is begun, modified, halted, or found to be capable of serving an unexpected purpose as a side effect, as well as when ongoing processes set up new sub-goals and generate corresponding sub-processes.
Some processes which include complex sets of sub-processes, may have to have their own private purpose-process indexes in their private work spaces (see p. 124), as well as being more briefly represented in the main index. They may also have their own central administrators!
Such perceptual procedures may involve computations of arbitrary complexity, using a great deal of background knowledge, like the perceptual procedures involved in a medical diagnosis or the tuning of a car engine. Even ordinary perception of simple shapes and familiar physical objects can be shown to presuppose considerable factual and procedural knowledge. This is why perception cannot be separated from cognition. See chapter 9 for more details.
So the system needs a collection of perceptual procedures, for analysing and interpreting various kinds of structures in various kinds of contexts. The limits of these procedures together with the limits of the sense-organs and the current store of information about the environment will define what the system is capable of perceiving. Systems with the same physical sense organs may therefore have quite different perceptual abilities as we know from variations in human perception. Thus there cannot be any such thing as perceiving things 'directly' or 'as they are in themselves'. As Max Clowes once put it: ''We inhabit our data-structures''. The same must be true of intelligent machines. So the objective/subjective distinction evaporates. (Compare Boden, 1977.)
The range of types of objects, properties and relationships that human perceptual procedures are capable of coping with is enormous. So in a sensible system they will not all be applied to every possible chunk of sensory input or meaningful structure. For instance, when you last read a page of typescript you probably did not use your ability to notice that the letters on the page were in vertical columns as well as horizontal rows; and while listening to someone talking one language you know, you do not apply the analysis procedures which would enable you to recognise in his syllable-stream the sounds of words of another language you know. Did you notice the let' in letters' or 'horizon' in 'horizontal' above? If every available analytical and interpretive procedure were applied, their outputs would form an enormous information store, and the system would then have the problem of perceiving its contents in order to make use of the information.
It seems not only sensible, but also to correspond to human experience, to have only a small selection of available perceptual programs running at any time in relation to any one piece of 'perceivable' structure, such as the structures mentioned in the previous sections or those produced by sense-organs. There are serious problems in explaining how appropriate programs are selected.
The active analysis programs may be called 'monitors'[note 6.2] and it seems to be necessary to have two main kinds of monitoring general purpose and special purpose. The former involves frequent and large-scale application of relatively simple analyses and tests which have a good chance of being relevant to a wide range of purposes and circumstances. (Is anyone calling out my name? Is something on my retina moving?) The special purpose monitors may be more complex, and will be set up only when there is a specific reason to expect that they will find something or that if they find something it will be very useful in relation to current motives.
In either case the monitor need not itself complete the analysis and interpretation of new information. Instead, what it finds may act as a cue (or reminder, or stimulus) which will invoke (e.g. via a catalogue or index of resources) more complex object-specific or problem-specific procedures.
For instance, if the environment is a spatial domain, then a visual retina might be designed with very many relatively simple general purpose monitoring procedures 'wired into the hardware', for efficiency, instead of being expressed as programs. So the retina might be divided into many small regions, each being constantly monitored to see whether any change has occurred in some physically detectable property (brightness, colour, graininess of texture). If a change is noted, the monitor sends an interrupt signal to inform processes which may need the information. Other general purpose monitors might be constantly monitoring these monitors to see whether something which has consistently been reporting changes stops doing so. There may be general purpose monitors not only at the interface with the physical environment, but also at several other interfaces. Perhaps every time one of the globally accessible structures (such as the motivational base or process-purpose index) is accessed or modified by any current process, a general purpose monitor will note this and send an appropriate signal or take appropriate action (such as recording the fact for future reference). In recently developed programming languages this is achieved by 'pattern-directed procedure activations'. It is also a common feature of computer operating systems, for example, to prevent unauthorised access to information.
A very useful general purpose monitor would be one on the lookout for 'I've been here before' situations: this might enable loops, infinite regresses, and unnecessarily circuitous procedures to be detected. However, the concept of 'the same state as before' admits such varied instantiations that it cannot be tested for in general by any one procedure. General tests might therefore have to be restricted to a few possibilities, like a return to the same geographical location, much more specialised monitors being required if other kinds of repetition are to be detected another source of fallibility in complex systems.
This will not work if records of previous states are not retained. Alas, people do not remember everything, not even their own actions. Repetitions often go undetected, like recounting their exploits or telling you a joke for the nth time. However, we shall see below that remembering apparently useless things may be an essential pre-requisite for certain kinds of intelligent behaviour and learning.
A 'found something' signal from a general purpose monitor may function simply as an invitation to some other program or monitor to look more closely, applying special purpose perceptual procedures to see if the occurrence is important to current motives or processes. Depending on what else is going on, the invitation may be ignored, or the new information may simply be stored, without further analysis, in case it will be useful later on. (Note that this presupposes some indexing procedure.)
Special purpose monitors may be much more complex, may have a much more transient existence, and may be set up at all levels of complexity in the system. For instance, in dealing with someone we know to be 'difficult' we need to be on the look-out for danger-signals in their behaviour. And while searching for a proof of some mathematical formula, one may have good reason to suppose that if certain sorts of intermediate results turn up in one's calculations they will enable an easy proof to be found, whereas if others turn up they will show that the formula was not provable after all. In that case one could set up monitors to be constantly on the lockout for the 'accidental' production of such results. (For examples, see Wertheimer Productive Thinking.) The tests for the occurrence of such special cases need not be at all trivial, and it may be necessary to make inferences from obscure cues, learnt in the course of considerable previous experience.
Watching out for multiplication or division by zero when simplifying equations illustrates this: zero may be heavily disguised in an expression like:
Normally the 'something found' signal from a special purpose monitor would be less likely to be ignored than signals from general purpose monitors, partly because the latter will always be crying 'wolf' and partly because the setting up of a specialised monitor will reflect the importance of its results, for current purposes.
Discoveries of the analytical and interpretative programs constituting monitors may be added (perhaps after some filtering by intermediate monitors) to the belief system (see section 6.6.(b)), forming a record of events and discoveries. At this stage a particularly important general purpose monitor should be available to try matching each addition to the belief system against currently unfulfilled purposes, or at least a 'high priority' subset of current motives, to see whether the new information satisfies or obstructs any of them. For example, the newly discovered fact or technique may be a solution to a problem you were thinking about yesterday. If it is a general purpose monitor it will have to use crude matching techniques, so some relevant relationships will be missed unless specialised monitors are set up. (Again, we see how fallibility is a necessary consequence of complexity.)
Not every piece of new information can be stored permanently. The problems of indexing, shortage of space, searching for what is relevant etc., would make this unworkable. But it may be possible to store information for a short time in case it turns out to be relevant to some process or purpose other than that which generated it. This will be most useful in the case of 'raw' data acquired for one purpose but potentially useful for others. If only the interpretation of such data is stored, then useful information may be lost. So besides the interpretation made for one purpose it may be useful also to store, at least temporarily, the original uninterpreted information in case it turns out to be relevant to other purposes. It must therefore be stored in a globally accessible structure.
In order to be really flexible and creative, the system will have to be able to activate specialised monitors, from time to time, which ask the following questions about new items of
information as they turn up:
Although such questions may occasionally be answered by a simple match between a current purpose and new information, at other times the full problem-solving power of the system may be needed in order to detect the relevance of a new fact, another example of the recursive, or non-hierarchic, nature of computational systems. For instance, a stored resource may not be found by a straightforward search in the resources catalogue. However, some further analysis of what is needed may solve the problem of where to search. Alternatively, it may later be found to be related to a current problem only when, by chance, it is turned up as a result of a search generated by some other need, and a monitor, or the central administrator, causes its relevance to the earlier purpose to be investigated. The person who is looking for both a screwdriver and eating utensils may be more likely to recognise the knife on the table as a potential screwdriver than the person who is simply looking for a screwdriver. But he must also be able to relate the structure of the knife to the function of a screwdriver.
For instance, examination of a series of failures over a long period of time may suggest a generalisation about what caused them, leading to a modification of some old procedures. (Of course, some people never learn from their failures, especially their failures in dealing with other people. Why not?)
Similarly, if successes are sometimes achieved unexpectedly, the system should go back and try to find out whether enough information was previously available for the bonus to have been predicted and planned for, in which case some existing planning procedures will again need to be modified.
Records of events must also be searched for refutations of previously accepted generalisations, and for new patterns suggesting deeper explanations of previously known phenomena. In some cases, retrospective analysis of difficulties in getting at relevant stored resources may show the need for reorganisation of the catalogue of resources or the index to information. Thus, all sorts of comparisons need to be constantly going on, relating new information, old information, current motives, and possible future motives.
Once again, retrospective analysis cannot be done simply by a general purpose program, if it is to be at all deep. There must be a preliminary general analysis of unsolved problems to suggest that certain particular types of questions need to be investigated, and appropriate special purpose investigation procedures invoked or constructed.
Normally many questions like 'Was my failure due to bad luck or was there something wrong with the procedure by which I worked out a strategy?' will remain unanswered. Unanswered questions can be added to the store of unfulfilled purposes, thereby enlarging the motivational base and possibly influencing the course of events later on, if for instance, one of these problems turns out accidentally to match some information generated by another purpose.
Moreover, these unsolved problems may themselves generate new processes of experimentation or exploration, for instance in order to test some tentative hypothesis about the scope of a regularity or the explanation of a surprise. Without a major driving force provided by the need to answer questions and solve problems, it is hard to see how human infants could possibly learn as much as they do in the first years of life. It is paradoxical that the words 'play' and 'toy' are often used to denote this most important of all human activities and its instruments. It is also worth noting that unless the system in some way consciously or unconsciously distinguishes errors in its own procedures from failures due to the environment, it cannot modify its procedures and learn. Thus even new-born infants and any organism that learns, must have a rudimentary concept of 'self', contrary to popular opinion.
No simple and uniform notation can be expected to work for all cases: sometimes a desired object may have to be represented in terms of a function that it can fulfil, sometimes in terms of a verbal description of its structure, sometimes in terms of a procedure for constructing it, and sometimes in terms of a template or model, similar in structure to it. Usually a combination of representations will be needed.
A language which is suitable for formulating a procedure (or program) so that it can be executed efficiently need not be equally good for constructing the procedure in the first place nor for describing how that procedure works so that its uses and limitations can be understood. The system may have to use one language while a procedure is constructed and debugged, after which it is translated (that is, compiled) into some less accessible, less intelligible, less easily modified but more efficiently executed form.
Programming such a system would be an enormous task, yet it seems that existing expertise makes it all possible in principle. For instance there are complex operating systems which permit several different processes to run on a single computer, as if in parallel (because small chunks of each are run in turn), interacting with each other as they go, and this would enable several monitoring programs and administrative programs to run at the same time as programs for planning, executing actions and retrospective analysis. The POPEYE perception project, described in a later chapter, illustrates the possibility of such parallelism, in a simple form.
Parallel processors might be of use only for relatively simple, general purpose monitoring of the kinds already described, such as the monitoring of a retinal array for simple events, and perhaps the monitoring of stored symbols for crude and obvious matches with widely broadcast current requirements.
Since all this can in any case be simulated on a single, serial processor, the distinction between serial and parallel physical processors has not much theoretical significance for our purposes. This is not to deny that parallel processing (which can in principle occur on a serial processor) is crucial for the kinds of interactions between processes described above.
[[Note added 2001.
This point became clearer in the 1990s and beyond when AI researchers saw the importance of architectures for complete systems, instead of concentrating only on representations and algorithms. See my 1992 review of Penrose: A. Sloman, 'The emperor's real mind', Review of Roger Penrose's The Emperor's new Mind: Concerning Computers Minds and the Laws of Physics, in Artificial Intelligence, 56, pp. 355--396, (available at the CogAff web site ]]
How wide a range of possibilities existed in such a system at any time would depend on such things as how wide a range of resources were stored in it, how complete the catalogues of resources and beliefs were, and what kinds of matching mechanisms were available. These, in turn, would depend, as in the case of a human being, on how much and what kind of previous learning had occurred. A mind contains a culture. So anthropology must be part of psychology, and vice versa.
Nobody could hope to design a complete adult humanoid robot. At best, it may be possible to produce a baby mind with the ability to absorb a culture through years of interaction with others.
If four arrows lead from box A to other boxes, this implies that there are four possible states, or sub-processes, which can occur after the one represented by A. Which one does occur will, normally, depend on what sorts of things occur in phase A, which in turn may depend on features of the context, including a long history of previous processes. (The normal approach in social science and much psychology is to shirk the task of understanding such dependencies, by representing the transitions as probabilistic, and studying the frequencies with which they occur in a large sample examined superficially, instead of studying particular transitions in depth.)
Each box represents a state of the whole system, so a flowchart of this sort should not be confused with a chart in which the boxes represent mechanisms of some kind and the arrows indicate flow of something like energy or information. Mechanisms are not states or phases, and flow between parts is not the same thing as transition between states of the system. A flow chart is not an architecture diagram (though it may imply some architectural features.)
The chart summarising many examples of familiar kinds of human behaviour, follows. (In the original book it was on pages 138-9).
This kind of flow-chart can be misleading in various ways.
The plans, or procedures, which generate uninterrupted executive processes may themselves have been stored or constructed only after previous processes of deliberation, involving many circuits round the anticlockwise loops. Even when things do not go wrong, there is always the possibility of dealing with difficulties and surprises, represented by the arrows going from left to right.
A system in which everything always worked exactly as described above would be much more efficient and rational than a human being. Nevertheless we know that human beings are often capable of doing the kinds of things the system can do, such as noticing unexpected obstacles and changing plans. The system does not therefore explain what people actually do; rather it generates, and thereby explains, a framework of possibilities which, for various reasons, may often not be actualised even though they would be appropriate, as in failure to recall a well-known fact or name. For reasons already mentioned, even a computing system of this kind must be fallible when it is very large.
A spell in a peculiar environment may cause procedures and beliefs to be constructed which interfere with efficient functioning in other environments, and may be hard to erase or modify. The mechanisms which manage the purpose-process index may have faults. Monitors may fail to work normally, or else their 'something-found' messages may not reach appropriate destinations. A certain class of records may be intact, but the procedures for interpreting the symbols used may be faulty. Procedures for relating new information to the index of current processes and their purposes may be faulty. Good plans may be constructed, but mechanisms for executing them may be faulty. Alternatively, execution of available plans may proceed faultlessly, but the processes of constructing new plans may fail for one reason or another.
Various sorts of learning catered for in the above scheme may fail to occur. These are very general kinds of pathology. Other more specific kinds would require a quite different analysis.
Clearly, the task of interpreting and diagnosing pathological behaviour in such a complex system must be extremely difficult. It cannot be done without a good theory of the normal structure and functions of the system. This is why I have little faith in current methods of psychotherapy.
In order to demonstrate that this sort of mechanism provides an adequate explanation of the possibilities available to a human being, it is necessary either to analyse the specifications of the mechanisms and of the possibilities to be explained, and then prove mathematically that the mechanism does generate the required range of possibilities and nothing which it should not generate, or else to construct the mechanism and run it experimentally in a wide variety of circumstances to ensure that it produces an adequate variety of behaviour, with the required fine structure.
The former is likely to be well beyond the possibilities of mathematical analysis available in the foreseeable future, even though the mathematical analysis of programs and proof of their correctness is a developing discipline. In particular, it assumes that we can produce complete specifications of the possibilities to be explained, whereas one of the lessons of artificial intelligence is that attempting to design a working system often leads you to revise and extend your specifications. The experimental method may require the development of computers which have much faster processors and larger memories than at present.
Whichever approach is taken, it is necessary to have a good initial specification of the range of human abilities to be explained, and this is best achieved by combining philosophical techniques of conceptual analysis with the methods of social science and psychology.
Since each of the abilities makes use of many others, like a family of mutually recursive computer programs, there is no logical order in which they should be described: no ability is basic to the others. Further, none of them can be described completely without describing many others. This makes the task of constructing such descriptions, difficult, confusing and very frustrating.
The abilities which the above system is required to explain include:
To specify these abilities in detail is to give at least part of an answer to the question: what is a mind? or what is a human mind? The partial answer is of the form: a mind is something which can do such and such sorts of things. To explain these abilities, that is, to explain how a single integrated system can do all these things, is to explain how it is possible for minds to exist. This does not merely make a contribution to the scientific study of man. It also brings many old philosophical discussions about the nature of mind and its relation to the human body several steps forward. (But it need not include anything that Aristotle would have disagreed with.) In the process it is certain that many detailed problems in different branches of philosophy will be solved, rejected as confused, or brought nearer solution. The remaining chapters of this book address a few of the more detailed problems.
[[Note Added 1 May 2004
On re-reading this chapter I have become aware how much of my work over the last few decades has simply been elaboration and in some cases correction of the ideas in this chapter. Even the SimAgent toolkit, developed over the last 10 years to support work on architectures for agents with human-like capabilities has many features whose inclusion can be traced back to some of the requirements described in this chapter. The toolkit is summarised in:http://www.cs.bham.ac.uk/research/poplog/packages/simagent.htmlPapers and slide presentations on architectures are here:
Flow-charts constitute a programming language. My remarks indicate that the language is too limited in expressive power. I never use them in my own programming, and do not teach students to use them, since careful layout in a language like LISP or POP2 (augmented with good iteration constructs) can achieve the same clarity without the same limitations.
[[Note added in 2001:
Two themes which are implicit in this chapter turned out to be important in later work, namely the role of real-time constraints in a fast-moving world, and the potential for a mechanism of the sort described here to get into an emotional state (See: A.Sloman and M.Croucher 'Why Robots Will have Emotions' in IJCAI 1981, available, along with other relevant papers at the cogaff web site.)
The two themes are closely connected. The real-time constraints generate a need for various kinds of interrupt mechanisms, alluded to in this chapter. The potential for interrupts, which can disturb current activity is intimately connected with emotional states (in at least one of the many senses of 'emotional': some authors use the word so loosely as to cover all affective states including motives and attitudes.) These ideas were all developed further in the framework of the CogAff (Cognition and Affect) project, which specified a "schema", the CogAff schema, for a very wide range of designs of architectures, of which a special case was the H-CogAff schema, including multiple concurrently active components developed at different stages in our evolutionary history (and during individual development). For more information see the overview at CogAff web site:
Original pages 144-176
The previous chapter listed varieties of information that must be represented in an intelligent system. Nothing was said about how different types of symbolism could be used for different purposes. This chapter explores some of the issues, relating them to philosophical debates about inference and reasoning.
A. Sloman, (1971) 'Interactions between philosophy and AI: The role of intuition and non-logical reasoning in intelligence', in Proceedings 2nd IJCAI (1971) Reprinted in Artificial Intelligence, vol 2, 3-4, pp 209-225, 1971, and in J.M. Nicholas, ed. Images, Perception, and Knowledge Dordrecht-Holland: Reidel. 1977
Also available online http://www.cs.bham.ac.uk/research/projects/cogaff/62-80.html#1971-02
See notes at end for related papers written later.
It is also relevant to problems about the nature of mathematics and science. For instance, many mathematicians adopt a logicist' position and argue that the only acceptable mathematical proofs are those using the formalisms and inference rules of symbolic logicians. They claim that where diagrams, or intuitively grasped models are used, these are merely of 'psychological' interest, since, although they shed light on how people arrive at valid proofs, the real proofs do not contain such things. According to this viewpoint, the diagrams in Euclid's Elements were strictly irrelevant, and would have been unnecessary had the proofs been properly formulated. (For some counter-arguments, see Mueller, 1969.)
This issue is clearly relevant to teachers of mathematics and science. Teachers who accept the logicist' position will be inclined to discourage the use of diagrams, pictures, analogies, etc., and to encourage the use of logical notations, and proofs which are valid according to the rules of propositional and predicate logic.
Kant's theories were opposed to this logicist position, insofar as he argued that important kinds of mathematical knowledge could be both a priori and synthetic, that is, non-empirical and non-analytic. I think he had an important insight, though it has not been possible until recently to say very clearly what it was. The issues can be clarified by discussing different kinds of symbolisms, or representations, and their roles in various kinds of reasoning. Some irrelevant metaphysical digressions can be avoided by noting that such reasoning can occur in computers, as well as in human minds.
One interpretation of what Kant was trying to say is that we sometimes, for instance in mathematical thinking, use non-verbal 'analogical' representations, and make inferences by manipulating them, instead of always using logic. His claim is that these non-logical (but not illogical) modes of thinking may be valid sources of knowledge.
This topic is closely related to current problems in artificial intelligence, for it turns out that different forms of representation may differ greatly in their computational properties.
In particular, methods of representation and inference which meet the approval of logicians will not necessarily be the best ones to use in a computer program which is to behave intelligently. Not all workers in A.I. would accept this. For example, and McCarthy and Hayes, (1969) argued that an intelligent computer program will need to be able to prove by methods of logic that a certain strategy will achieve its goal. They claimed that this would be an essential part of the process of decision making. I doubt whether they still hold the same views (see Hayes, 1974), but the position they once advocated is worth refuting even if they have changed their mind, since it is very close to the views of many philosophers, especially philosophers of science.
Shot (Mary, the-brother-of (Tom))where the predicate 'shot' is treated as a two-place function and 'the brother of as a one-place function. Pictures, maps, diagrams, models, and many of the representations used in computer programs are not Fregean. Some of them are 'analogical'.
This contrast between Fregean (or 'applicative') and analogical representations will be more precisely defined later. It is often referred to by people who do not know how to characterise it properly. For instance, it is sometimes assumed that analogical representations are continuous and the others discrete, or that analogical representations are essentially non-verbal (that is, that verbal languages do not use them), or that analogical representations are isomorphic with what they represent. These mistakes (which will be exposed later) also go along with a tendency to assume that digital computers cannot construct or use analogical representations. (See the writings of Pylyshyn.)
Terminology is also often confused. What I have called 'Fregean' or 'applicative' representations are sometimes called 'symbolic', linguistic', 'formal', 'propositional', or 'verbal'.
The word 'symbolic' is unsatisfactory, since the ordinary use of 'symbols', 'symbolism' and 'symbolic' is much more general (for example maps can be said to be symbolic, even though they are analogical). I shall use 'representation' and 'symbol' and their derivatives more or less interchangeably as very general terms, and will refer to any system of representation or symbolism as a language, as in 'the language of maps'. I shall use 'Fregean' and 'applicative' interchangeably.
One of the main aims of this chapter is to show that inferences made by manipulating non-Fregean representations may be perfectly valid. I believe this is at least part of what Kant and Intuitionist mathematicians (for example Brouwer) were trying to say.
Before developing the point in detail, I would like to stress that I am not taking sides in the dispute among psychologists who argue over whether people use 'iconic' forms of memory, and reason with images. I believe that contributions from both sides are often riddled with confusions, related to the mistakes referred to above. It is especially important to notice that the points I make about analogical representations are quite neutral on the question whether such representations occur in the mind or not. Even if they occur only on paper (for example in maps and diagrams) the point is that they can still be used in valid reasoning.
Useful discussion of these issues is impossible without careful definitions of some of the main concepts, such as 'valid', 'inference', logic', 'verbal', 'analogical', 'Fregean' (or ' applicative'). However, before attempting to be more precise, I shall present a few examples of reasoning with non-Fregean symbolisms.
Figure 7.1 Euler circles
Whether a picture correctly depicts the world is, in each case, a contingent question which can only be answered by examining the world; but we can still discover, without examining the world, that certain combinations of correctness and incorrectness are necessarily ruled out. For example, no matter how things are in the world, we can use our understanding of the methods of representation employed in such diagrams to discover that it is impossible for (a) and (b) correctly to represent how things are, while (c) does not, given the stated interpretations of the diagrams. This has to do with the impossibility of creating a diagram containing (a) and (b) simultaneously, without the relation (c). How we discover this is not obvious, but that we can is.
We are also able to use our understanding of the syntax and semantics of English to tell that the following argument is valid:
All the people in the room are students.In both the verbal and the diagrammatic representation there are problems about possible ambiguities of reference or meaning. In both cases it is hard for people to explain why the inferences are valid. Nevertheless, we can tell that they are, and the study of such reasoning has occupied great logicians since Aristotle, leading to many logical symbolisms designed to capture the essential form of a variety of inferences.
No students are taxpayers.
Therefore: No people in the room are taxpayers.
It is worth remarking that when Euler's circles are used for this kind of reasoning, the three diagrams of figure 7.1 are normally superimposed in one diagram. This makes it harder to perceive that a method of reasoning from 'premisses' to a 'conclusion' is involved. By contrast, in verbal arguments the premisses and conclusion normally have to be formulated separately. In some of the examples which follow, I shall collapse the different representations involved into one diagram or picture, in the usual way.
Here are some more examples. In figure 7.2 the horizontal straight line is to be interpreted as representing a rigid straight rod, pivoted at the middle on a fixed support. In figure 7.3 each circle represents a rigid wheel free to rotate about a fixed axle passing through its centre, and contact between circles represents contact without slipping between the wheels.
Figure 7.2 Lever
Figure 7.3: Wheels with friction.
In both figure 7.2 and figure 7.3 the arrows represent direction of motion, (of what? how can you tell?), so the figures represent changing configurations. However, the arrows labelled (a) are to be interpreted as assumptions, or premisses, and the arrows labelled (b) are to be interpreted as conclusions, inferred from the rest of the picture. In both cases, we can consider a bit of the world depicted by the diagram and ask whether the arrow (a) correctly represents what is happening, and whether arrow (b) correctly represents what is happening. In each case, it is a contingent matter, so empirical investigation is, required to find out whether the representation is correct. (Just as empirical investigation may be used to check the truth of premisses and conclusion in a logical argument.)
However, we can tell non-empirically that it is impossible for arrow (b) to be an incorrect representation while arrow (a) and the rest of the diagram represents the situation correctly given the specified interpretations of the arrows, and other features of the pictures. So we can say that the inferences from (a), and the rest of the picture, to (b) is valid, in both figure 7.2 and figure 7.3. Both examples could have been replaced by two separate pictures, one containing only arrow (a) and one containing arrow (b), as in figure 7.1.
Far more complex examples of inferences about mechanical systems, using diagrams could be given. Figure 7.4 is relatively simple. In figures 7.4 and 7.5, horizontal lines again represent rigid levers pivoted at the points indicated by small triangles. The circles represent pulleys free to rotate about their centres, but not free to move up or down or sideways.
Figure 7.4 Levers and Pulleys
The vertical lines, apart from arrows, represent inelastic flexible strings, and where two such lines meet a pulley on either side, this represents a string going round the pulley. Where a vertical line meets a horizontal line, this represents a string tied to a lever. As before, the arrows represent motion of the objects depicted by neighbouring picture elements. Once again, we can see that what is represented by the arrow marked (b) can be validly inferred from what is represented by the arrow marked (a) and the rest of the picture.
Where the inference is more complicated, some people may find it harder to discern the validity. In the case of logical or verbal inferences, this difficulty is dealt with by presenting a proof, in which the argument is broken down into a series of smaller, easier arguments. Something similar can be done with an argument using a diagram.
Figure 7.5 Levers and Pulleys with added arrows.
For example, figure 7.5 is just like figure 7.4, except for additional arrows. The arrows marked (c), (d), (e), (f) and (g) can be taken as representing intermediate conclusions, where each can be validly inferred from the preceding one, and (c) can be inferred from (a), and (b) from (g). Using the transitivity of valid implication, we see that (b) is validly inferrable from (a). Notice that it is not always immediately obvious what can and what cannot be validly inferred. For instance, if the length of an arrow represents speed of motion, do the inferences remain valid?
It is possible to give a computer program the ability to reason about mechanics problems with the aid of such diagrams. To do so would require us to formulate quite precise specifications of the significant properties and relations in the diagrams, and the rules for interpreting them, so that the computer could use these rules to check the validity of the inferences. Funt (1976) has done this in a program which makes inferences about falling, sliding and rotating objects.
I have experimented with similar programs. Making a program solve problems intelligently would involve giving it procedures for searching for significant paths through such diagrams, analogous to the path represented by the arrows (c) to (g), indicating a chain of causal connections relating (a) and (b). Finding relevant paths in complex configurations would require a lot of expertise, of the sort people build up only after a lot of experience. Giving a computer the ability to acquire such expertise from experience would be a major research project given the current state of artificial intelligence. (At the time of writing a group at Edinburgh University, directed by Alan Bundy, is attempting to give a computer the ability to reason about simple mechanical problems described in English.)
I believe that our concept of a causal connection is intimately bound up with our ability to use analogical representations of physical structures and processes. This point is completely missed by those who accept David Hume's analysis of the concept of 'cause', which is, roughly, that 'A causes B' means 'A and B are instances of types of events such that it has always been found that events of the first type are followed by events of the second type'. His analysis explicitly rejects the idea that it makes sense to talk of some kind of 'inner connection' between a cause and its effect. I suspect that we talk of causes where we believe there is a representation of the process which enables the effect to be inferred from the cause using the relations in the representation. The representation need not be anything like a verbal generalisation. However, analysis of the concept 'cause' is not my current task, so I shall not pursue this here.
So far my examples of valid reasoning with analogical representations have all used diagrams. It does not matter whether the diagrams are drawn on paper, or on a blackboard, or merely imagined. Neither does it matter whether they are drawn with great precision: detailed pictorial accuracy is not necessary for the validity of examples like figure 7.4. It is also worth noting that instead of looking at diagrams (real or imagined), we can sometimes do this kind of reasoning while looking at the physical mechanism itself: the mechanism can function as a representation of itself, to be manipulated by attaching real or imaginary arrows, or other labels, to its parts.
So by looking at a configuration of levers, ropes and pulleys, and finding a suitable chain of potential influences in it, we can draw conclusions about the direction of motion of one part if another part is moved.
It is so easy for us to do this sort of thing, for example when we 'see' how a window catch or other simple mechanism works, that we fail to appreciate the great difficulty in explaining exactly how we do it. It requires, among other things, the ability to analyse parts of a complex configuration in such a way as to reveal the 'potential for change' in the configuration. We probably rely on the (unconscious) manipulation of analogical representations, using only procedures which implicitly represent our knowledge of the form of the world. This point is closely bound up with the issues discussed in the chapter on the aims of science, where science was characterised as a study of possibilities and their explanations.
Figure 7.6 Map
Figure 7.6 gives a very simple illustration of the use of a map to make a valid inference. It is instructive in that it also shows a relationship between two representations of different sorts.
In (a) we have a map showing a few towns, marked by dots, with the usual indication of compass points. In (b) we have, not a map, but a representation of the direction (and perhaps distance) between two towns. The arrow represents a vector. Once again we can say that (b) may be validly inferred from (a), though now we have to qualify this by saying that the inference is valid only within certain limits of accuracy.
Many different uses of maps are possible. For instance, from a map showing which crops are grown in different parts of a country, and a map showing the altitude of different parts of the country, we can 'infer' a map showing which regions are both corn-producing and more than 100 feet above sea level.
When planning the layout of a room it may be useful to draw diagrams or to make flat movable cardboard cut-outs representing the objects in the room, and to use them to make inferences about the consequences of placing certain objects in certain locations. This has much in common with the use of maps.
This sort of example shows how a representation may be used to reason about what sorts of things are possible. For example, a particular arrangement of the bits of cardboard can be used to show that a certain arrangement of the objects in a room is possible. This is like the use of diagrams in chemistry to show that starting from certain molecules (for example H-H and H-H and O=O), it is possible to derive new molecules by rearranging the parts (giving H-O-H and H-O-H).
An important step in mastering arithmetic and its applications is grasping that number names themselves can be used in place of dots or fingers (that is, 'one two three' followed by 'one two', matches 'one two three four five).
Figure 7.7 Numbers added
The diagram in Figure 7.7 can be used as a proof that Three plus Two is Five.
What is the largest possible number of persons who might have been parents of great-grandmothers of yours? What relation to you is your son's daughter's first-cousin? There are various ways you might attempt to answer this sort of question, but one of them involves drawing a fragment of a 'family tree', or possibly several family trees consistent with the problem specification. A family tree diagram is an analogical representation of a bit of the social world. Another example of an analogical representation of a rather abstract set of relationships is a chart indicating which procedures call which others in a computer program. Flow charts give analogical representations of possible processes which can occur when procedures are executed. Both sorts of diagrams can be used for making inferences about what will happen when a program is executed, or when part of a program is altered. A morse code signal is an analogical representation of a sequence of letters.
Operations on the array, such as examining a set of points which lie on a straight line', or possibly marking such a set of points, make use of the fact that there is a structural relationship between the array and the retinal image. Similarly, when processing of such an image has produced evidence for a collection of lines, forming a network, as in a line drawing of a cube, then it is convenient to build up data-structures in the computer which are linked together so as to form a network of the same structure. A similar network, or possibly even the same one, can then be used to represent the three-dimensional configuration of visible edges of surfaces in the object depicted by the line-image.
Manipulations of these networks (for example attaching labels to nodes or arcs on the network, or growing new networks to represent the 'invisible' part of the object depicted) can be viewed as processes of inference-making and problem solving, with the aid of analogical representations. It may be that something similar happens when people make sense of their visual experiences. (For more on this see the chapter on perception and Clowes, 1971, Waltz, 1975, Winston, 1975, Boden, 1977.)
Although we are still very unclear about what goes on in the minds of people, we can understand what goes on in the mind of a computer when it is building arrays or networks of symbols and manipulating them in solving some problem. By exploring such programming techniques we may hope to get a much better understanding of the sorts of theories which could account for human imaginative exercises. Our main lack at present is not data so much as ideas on how to build suitable theories.
The illustrations in the preceding sections should give at least a rough idea of what I mean by saying that sometimes valid reasoning may be done by manipulating analogical representations. Many more examples could be given. It is time now to try to formulate more precise definitions of some of the concepts used.
In other words, by examining verification procedures, instead of applying them, we can discover that certain combinations of truth-values of statements cannot occur, no matter what the world is like. 'London is larger than Liverpool' and 'Liverpool is larger than London' cannot both be true: they are contraries. We can discover this by examining the semantics of larger than'. (How is this possible?)
There are many other relationships of truth-values which can be discovered by this kind of non-empirical investigation. For instance, two statements may be incapable of both being false, in which case they are called subcontraries by logicians.
Validity of an inference is a special case of this. Namely, the inference from PI, P2... Pn to the conclusion C is valid if and only if relationships between the statements constrain their truth-values so that it is impossible for all the premisses to be true and the conclusion false. So validity of an inference is simply a special case of the general concept of a constraint on possible sets of truth-values, namely the case where the combination
(T, T,. .. T: F)cannot occur. So validity is a semantic notion, concerning meaning, reference, and truth or falsity, not a syntactic notion, as is sometimes supposed by logicians. They are led to this mistake by the fact that it is possible to devise syntactic tests for validity of some inferences, and indeed the search for good syntactic criteria for validity has been going on at least since the time of Aristotle,
It is an important fact about many, or perhaps all, natural languages, that syntactic criteria for some cases of validity can be found. For, by learning to use such criteria, we can avoid more elaborate investigations of the semantics of the statements involved in an inference, when we need to decide whether the inference is valid. The syntactic tests give us short-cuts, but have to be used with caution in connection with natural languages. It is not always noticed that our ability to discern the correctness of these tests depends on a prior grasp of the semantics of key words, like 'all', 'not', 'some', 'if and others, and also a grasp of the semantic role of syntactic constructions using these words. It is still an open question how ordinary people, who have not learnt logic, do grasp the meanings of these words, and how they use their understanding in assessing validity of inferences. (For further discussion see my 'Explaining logical necessity'.)
We have seen from some of the examples of the use of analogical representations, for example, figure 7.1 and figure 7.2, that the question whether a particular picture, diagram or other representation correctly represents or 'denotes' a bit of the world is in general an empirical question, which involves using the appropriate interpretation rules to relate the representation and the bit of the world. (Similarly, the truth of what a sentence says is, in general, an empirical question.) We have also seen that it is sometimes possible to discover non-empirically, that is, without examining the world, that if one diagram represents a situation correctly then another must do so too. So we can easily generalise our definition of 'valid' thus:
The inference from representations R1, R2, . . . Rn to the representation Rc is valid, given a specified set of interpretation rules for those representations, if it is impossible for R1, R2 . . . Rn all to be interpreted as representing an object or situation correctly (i.e. according to the rules) without Rc also representing it correctly.
In this case we can say that Rc is jointly entailed by the other representations.
This definition copes straightforwardly with cases like figure 7.1, where there are separate representations for premisses and conclusion. The other examples need to be dealt with in the obvious way by treating the single diagram as if it were a compound of two or more diagrams. For example, in figure 7.2 we can say that there is a 'premiss' which is the diagram with arrow (a) but not arrow (b), and a 'conclusion' which is the diagram with arrow (b) but not arrow (a).
Someone who actually uses a picture or diagram to reason with may modify it in the course of his reasoning, and in that case there are really several different diagrams, corresponding to the different stages in the reasoning process.
Explicitly formulating the semantic rules which justify the inference from a set of 'premiss' representations to a ' conclusion' representation, is generally quite hard. We do not normally know what rules we are using to interpret the representations we employ. Many workers in artificial intelligence have found this when attempting to write programs to analyse and interpret pictures or drawings. But the same is also true of the semantic rules of natural languages: it is hard to articulate the rules and still harder to articulate their role in justifying certain forms of inference.
In the case of artificial languages invented by logicians and mathematicians, it is possible to formulate the semantic rules, and to use them to prove the validity of some inferences expressible in the languages. In propositional logic, symbols for conjunction '&', disjunction V', and negation '~' are often defined in terms of 'truth-tables', and by using a truth-table analysis one can demonstrate the validity of inferences using these symbols. It is easy to show, for example, that inferences of the following form are valid:
P v QSimilarly, in predicate logic the quantifiers ('for all x', 'for some x') may be explicitly defined by specifying certain rules of inference to which they are to conform, like the rule of 'universal instantiation' (see Copi, chapter 10). It is not nearly so easy to formulate semantic rules for words in natural languages. In fact, for some words the task would require much more than the resources of linguistics and philosophy. The semantics of colour words ('red', ' vermilion', etc.) cannot be properly specified without reference to the psychology and physiology of colour vision, for example. The principles by which we interpret pictures, diagrams and visual images may be just as hard to discover and formulate.
(See for example Copi, Introduction to Logic, chapter 8.)
If the semantic or interpretative rules for a language or representational system have been articulated, it becomes possible to accompany an inference using that language with a commentary indicating why various steps are valid. A proof with such a commentary may be said to be not only valid, but also rigorous. So far relatively few systems are sufficiently well understood for us to be able to formulate proofs or inferences which are rigorous in this sense. Most of the forms of reasoning which we use in our thinking and communicating are not rigorous.
However, the fact that we cannot give the kind of explanatory commentary which would make our inferences rigorous does not imply that they are not valid. They may be perfectly valid in the sense which I have defined. Moreover, we may know that they are valid even if we cannot articulate the reasons.
This is not to suggest that there are some inherently mysterious and inexplicable processes in our thinking. I am only saying that so far it has proved too difficult for us.
The use of representations to explain or demonstrate possibilities is not directly covered by the preceding discussion. However, all such cases seem to fit the following schema:
Suppose R is a representation depicting or denoting W, where W is an object, situation or process known to be possible in the world.
And suppose that Tr is a type of transformation of representations which is known (or assumed) to correspond to a really possible transformation Tw of things in the world. (See chapter 2 on the aims of science for discussion of 'really possible'.)
Then, by applying Tr to R, to get a new representation, R', which is interpretable as representing an object, situation, or process W' we demonstrate that W' is possible, if the assumptions stated are true.
This seems to account for the chemical example and the use of bits of cardboard to determine a possible layout of objects in a room.
There are many problems left unsolved by all this. For instance, there are problems about the 'scope' of particular forms of inference. Are they always valid, or only in certain conditions? How do we discover the limits of their validity? (See Lakatos, 1976, for some relevant discussion in relation to mathematics, and Toulmin, 1953, for discussions of the use of diagrams in physics.) Does our ability to see the validity of certain inference patterns depend on our using, unconsciously, 'metalanguages' in which we formulate rules and discoveries about the languages and representations we use?
Are children developing such metalanguages at the same time as they develop overt abilities to talk, to draw and interpret pictures, etc.? Questions like these can, or course, be asked about inferences using verbal symbolisms too. (See Fodor, 1976.)
But experience has taught me that readers will project their own presuppositions onto my definitions. So I should like to stress a point which will be repeated later on, namely that there is nothing in the idea of analogical representations which requires them to be continuous (as opposed to discrete). Thus there is nothing to prevent digital computers using analogical representations. A less important source of confusion is the prejudice that analogical representations must be isomorphic with what they represent. This is by no means necessary, and I shall illustrate this with two-dimensional drawings which represent three-dimensional scenes.
The contrast between Fregean and analogical symbolisms is concerned with the ways in which complex symbols work. In both cases complex symbols have parts which are significant, and significant relations between parts. Of course, the parts and relations are not so much determined by the physical nature of the symbol (for instance the ink marks or picture on a piece of paper) as by the way the symbol is analysed and interpreted by users. Only relative to a particular way of using the symbol or representation does it have parts and relations between parts. I shall take this for granted in what follows.
In both Fregean and analogical representations, the interpretation rules are such that what is denoted, or represented, depends not only on the meanings of the parts but also on how they are related. I shall start by saying something about how Fregean symbolisms work. Their essential feature is that all complex symbols are interpreted as representing the application of functions to arguments. Here is a simple example. According to Frege, a phrase like 'the brother of the wife of Tom' should be analysed as having the structure:
the brother of (|)
the wife of (|)
The function 'the wife of is applied to whatever is denoted by 'Tom', producing as value some lady (if Tom is married), and the function 'the brother of is applied to her, to produce its own value (assuming Tom's wife has exactly one brother). Thus the whole expression denotes whatever happens to be the value of the last function applied.
Frege's analysis of the structures and functions of ordinary language was complex and subtle, and I have presented only a tiny fragment of it. For more details see the translations by Geach and Black, and the items by Furth and Dummett in the Bibliography. I shall not attempt to describe further details here, except to point out that he analysed predicates as functions from objects to truth-values, a notion now taken for granted in many programming languages, and he analysed quantifiers ('all', 'some', 'none', etc.) and sentential connectives ('and', 'or', 'not', etc.) also as functions.
For present purposes it will suffice to notice that although the complex Fregean symbol 'the brother of the wife of Tom' has the word Tom' as a part, the thing it denotes (Tom's brother-in-law) does not have Tom as a part. The structure of a complex Fregean symbol bears no relation to the structure of what it denotes, though it can be interpreted as representing the structure of a procedure for identifying what is denoted. In this case, the procedure is first of all to identify whatever is denoted by 'Tom', then use the relation 'wife of' to identify someone else, then use the relation 'brother of' to identify a third object: the final value. (See also my Tarski Frege, and the liar paradox(1971). )
We could express this by saying that sometimes the structure of a Fregean symbol represents the structure of a 'route through the world' to the thing denoted. But this will not fit all cases. For instance, in the arithmetical expression:
3x5 + 4x3
11 - 2
Figure 7.tree A parse tree for an arithmetic expression
(Notice that in interpreting the expression this way we are using a convention about how expressions involving 'x' and '+' should be 'bracketed'.) The tree-structured procedure is executed by working up to the top of the tree from the bottom. Left-right ordering of components does not signify a temporal ordering in which the sums should be done. In some sense we can say that the sub-expressions, for example, 1 I', denote aspects of the procedure. But they do not denote parts of what is denoted by the whole thing. An arithmetical expression denoting the number three may contain a symbol denoting the number eleven, but that does not imply that the number eleven is in any sense part of the number three.
By contrast, analogical representations have parts which denote parts of what they represent. Moreover, some properties of, and relations between, the parts of the representation represent properties of and relations between parts of the thing denoted.
So, unlike a Fregean symbol, an analogical representation has a structure which gives information about the structure of the thing denoted, depicted or represented.
This, then, is my definition of 'analogical'. It is important to note that not ALL the properties and relations in an analogical representation need be significant. For instance, in a diagram the colour of the lines, their thickness, the chemical properties of the paint used, and so on, need not be meaningful. In a map (for instance maps of the London underground railway system) there will often be lines whose precise lengths and orientations do not represent lengths or orientations of things in the world: only topological relations (order and connectivity) are represented. This may be because a map depicting more of the structure of the relevant bit of the world would be less convenient to use. (Why?)
Further, the interpretation rules (semantic rules) need not require that properties and relations within the representation must always represent the same properties and relations of parts of what is represented. The interpretation procedures may be highly context-sensitive. For example, lines of the same length in the scene may be depicted by lines of different lengths in the picture. In figure 7.8 distances, or lengths, in the picture represent distances in the scene in a complex context-sensitive way. Further, lines of the same length in the picture may depict different lengths in the scene. Moreover, the relation 'above', in the picture, may represent the relation 'above', or 'further', or 'nearer', or 'further and higher',
Figure 7.8 Perspective picture
depending on whether bits of floor, wall, or ceiling are involved. This is connected with the fact that parts of an analogical representation may be highly ambiguous if considered in their own right. Only in the context of other parts is the ambiguity removed. Much work in computer vision is concerned with the problem of enabling global relations to ' resolve local ambiguities. (See bibliography references to Clowes and Waltz, and chapter 9.)
Figure 8 also brings out clearly the fact that although the structure of an analogical representation is related to the structure of what it represents, there is no requirement that the two be isomorphic. Indeed, they may have very different structures. In particular, Figure 8 is two dimensional but represents a three-dimensional scene, whose structure is therefore very different from that of the picture.
It should be obvious how to apply my definition of 'analogical' to the sorts of pictures and diagrams used earlier to illustrate inferences with analogical representations. However, it turns out that the precise details of how to interpret relations in a diagram are often surprisingly complicated. Trying to program a computer to do the interpreting is perhaps the best way of discovering the rules. Merely writing down theoretical analyses, you are likely to get the rules wrong. Embodying them in a program helps you to discover that they do not work.
This shows that there is no sharp verbal/analogical or verbal/iconic distinction. A particular symbolism may include both Fregean and analogical resources.
In modern programming languages this is very clear, since there is a great deal of the usual function-application syntax often mixed up with conventions that the order in which instructions occur in a program represents the order in which they are to be executed (and doing them in a different order may produce quite different results). So programming languages, like natural languages, are partly Fregean and partly analogical. This is true even of a logic programming language like Prolog.
But the Fregean/analogical distinction does not exhaust the variety of important kinds of symbolising relations. For example, in a program a symbol may occur which is merely a label' its sole function is to make it easy for other parts of the program to refer to this bit, so that it does not depict either a part of something represented by the whole program nor a thing which is the argument to which a function is applied. Elsewhere in the program may be an instruction to jump to the location specified by this label. The occurrence of such 'jump' instructions can badly upset the correspondence between order of instructions in the program and the time order of events in which the instructions are executed, making programs hard to understand and modify.
The kind of self-referring metalinguistic role of labels in a computer program is clearly something different from the kinds of representation I have called Fregean and analogical.
Natural languages also use self-reference, for instance when the expressions 'the former' and 'the latter' direct attention to order of phrases in a text. They have many other devices which do not fit neatly into these two categories. For example, it is not easy to give a Fregean analysis of adverbial phrases ('He came into the room, singing, leaning heavily on a stick, and dragging the sofa behind him'). So I am not claiming that I have given anything like a complete survey of types of representation. I doubt whether such a thing is possible: for one aspect of human creativity is the invention of new sorts of symbolisms.
One conclusion which may be drawn from all this is that neurophysiologists, psychologists, and popular science journalists who take seriously the idea that one half of the human brain deals with verbal skills and the other half with pictorial and other non-verbal skills are simply showing how naive they are about verbal and non-verbal symbolisms. Presumably, when they learn that besides Fregean and analogical symbolisms there are other sorts, they will have to find a way of dividing the brain into more than two major portions. As for how we deal with combined uses of the two sorts of symbolisms, no doubt it will prove necessary to find a bit of the brain whose function is to integrate the other bits! (Programmers know that there need not be a localised bit of the computer which corresponds to sub-abilities of a complex program.)
The implication is that the use of non-logical methods of inference, and the choice of analogical representations is an irrational, or at best non-rational, piece of behaviour. Scientists are behaving rationally only when they perform logical deductions from theories and when they use observation and experiment to discover whether certain sentences express truths or falsehoods.
Against this view I shall argue that it is sometimes quite rational to choose to use an analogical rather than a Fregean method of representation. That is, there are often good reasons for the choice, given the purposes for which representations are used, which include storing information for future use, guiding the search for good solutions to problems, enabling new versions of previously encountered situations to be recognized, and so on. I do not claim that analogical representations are always best.
If one were designing a robot to be a scientist, or more generally to play the role of a person, it would be advisable, for some purposes, to program the robot to store information in an analogical representation, and to perform inferences by manipulating analogical representations. (See Funt 1977 for a description of a program which solves mechanics problems with the aid of analogical representations.) So it is not merely an empirical fact that people do this too. Of course, neither people nor robots could possibly function with only analogical representations. Any intelligent system will have to use a wide variety of different types of representation and different types of reasoning strategies. But how can we decide which ones to use for which purposes? There are no simple answers.
Fregean systems have the great advantage that the
structure (syntax) of the expressive medium does not constrain
the variety of configurations which can be described or
represented. So the same general rules of formation, denotation
and inference can apply to Fregean languages dealing with a
very wide range of domains. The formula
P(a,b,c), or its
English variants, like
'a is P to b and c', can be used for
applying a predicate to three arguments no matter what kind of
predicate it is, nor what sorts of things are referred to by the
argument symbols. The following assertions use the same
Fregean structure despite being concerned with quite
Greater-by (three, twelve, nine)
Joins(coupling, truck 1, truck 2)
Between(London, Brighton, Cambridge)
Contrast the difficulty (or impossibility) of devising a single
two-dimensional analogical system adequate for representing
chemical, musical, social, and mechanical processes. Fregean
systems make it possible to think about very complex states
of affairs involving many different kinds of objects and
relations at once. For each type of property or relation a new
symbol can be introduced as a predicate (that is, a function
which, when applied to objects as arguments, yields the result
TRUE or the result FALSE). The syntax for making assertions
or formulating questions using all these different symbols is
the same. There is no need to invent new arrangements of the
symbols to cope with a new kind of domain.
Greater-by (three, twelve, nine)
Joins(coupling, truck 1, truck 2)
The price of this topic-neutrality, or generality, is that it becomes hard to invent procedures for dealing efficiently with specific problems. Very often, searching for the solution to a problem is a matter of searching for a combination of symbols representing something with desired properties. For instance it may be a search for a plan of action which will achieve some goal, or a search for a representation of an arrangement of objects in a room, or a search for a representation of a route between two places which is shorter than alternative routes. For a frequently encountered class of problems it may be advantageous to use a more specialised representation, richer in problem-solving power than a Fregean symbolism.
What makes one representation better than another? To say that it is easier for humans, or that people are more familiar with it is not to give an explanation. An adequate explanation must analyse the structure of the symbolism and show its relationship to the purposes for which it is used, the context of use, and the problems generated by its use. This is often very hard to do, since it is hard to become conscious of the ways we are using symbols. I shall try, in the rest of this section, to give a brief indication of the sort of analysis that is required.
A method of representation may possess problem-solving power, relative to a domain, for a number of different reasons.
These form just a subset of the problems about adequacy of representations which have had to be faced by people working in artificial intelligence. (See Hayes, 1974, Bobrow, 1975, Minsky, 1975.) The subject is still in its infancy, and criteria for adequacy of representations are only beginning to be formulated. The sorts of issues which arise can be illustrated by the following list of properties of analogical representations which often make them useful:
In analogical systems it seems that a smaller proportion of well-formed representations can be uninterpretable (inconsistent). This is because the structure of the medium, or the symbolism used, permits only a limited range of configurations. Pictures of impossible objects are harder to come by than Fregean descriptions of impossible objects. This means that searches are less likely to waste time exploring blind alleys.
(This is not as simple a process as it sounds.) By contrast, the differences in the forms of words describing objects which differ in shape or size may not be related in magnitude to the differences in the objects. The difference between the words 'two' and 'ten', for example, is in no sense greater than the difference between 'two' and 'three', or 'nineteen' and 'twenty'. 'Circle' and 'square' are not more different in their form than 'rectangle' and 'square'. So substitution of one word for another in a description need not make a symbolic change which is usefully related to the change in meaning. In particular, this means that the notation does not provide an aid to ordering sets of possibilities so that they can be explored systematically.
Sometimes there are devices for abbreviating sentences repeating a single word, by using 'and' to conjoin phrases, for example, but one could not get rid of all repetitions of place names like this. If the sentences are stored in a list of assertions, then in order to find all the facts concerning any one place it is necessary to search for all the sentences naming it. For some places it is possible to collect together all the sentences concerning them, but since such sentences will generally mention lots of other places too, we cannot collect all the facts about a place under one heading, simultaneously for all places, without an enormous amount of repetition. This problem is avoided in a map.
The same effect as a map can be achieved in a computer data-structure by associating with all objects a set of 'pointers' to all the stored assertions about them, that is, a list of addresses at which assertions are stored in the machine. The facts do not then need to be repeated for all the objects they mention. This sort of technique can lead to the use of structures, within the computer, which include relationships representing relationships in the world. Programmers often make their programs use analogical representations because of the efficiency achieved thereby.
From the new configuration the new relationships between objects (which ones are near to which others, which are north of others, etc.) are as easily 'read off as before the alteration. By contrast, if instead of representing all the initial representations by location on a map, we make a lot of assertions about their relationships, then for each change of position a large number of changes will have to be made in the stored assertions. Of course, this problem can be minimised if we have some way of recording position without doing it in terms of relations to all the other objects, for instance by storing a pair of co-ordinates (latitude and longitude). This also requires good methods for inferring relationships from such stored positional information. Notice incidentally that the use of Cartesian co-ordinates to represent position, and more generally the use of algebraic methods in geometry, involves using sets of numbers as an analogical representation for sets of locations on a line that is, order relations and size relations between numbers represent order relations and distance relations.
This chapter does not repeat the section near the end of the 1971 paper criticising claims made by McCarthy and Hayes about the sufficiency of logic+fluents for future intelligent machines. See section 6 of that paper, headed "6. Summary of disagreements with McCarthy and Hayes". By 1978 it wasn't clear that they still held the extreme views expressed in their 1969 paper.
I have been trying to show that questions about which form of representation should be used can be discussed rationally in the light of the purposes for which they are to be used and the problems and advantages of using them. In some circumstances, analogical representations have advantages.
The problem of deciding on the relative merits of different ways of representing the same information plays a role in the development of science, even if scientists are not consciously thinking about these issues. Similarly a child must be acquiring not only new facts and skills but new ways of representing and organising its knowledge. Very little is currently known about such processes, but the attempt to design machines which learn the sorts of things which people can learn is helping to highlight some of the problems.
The issues are complicated by the fact that one type of representation can provide a medium within which to embed or 'implement' another (see Hayes, 1974). For instance, by using a suitable method of indexing statements in a Fregean language we can get the effect of an analogical representation, as I have already indicated in discussing maps. Another example is the use of two-dimensional arrays to represent two-dimensional images in a computer. There is not really any two-dimensional object accessed by the program, rather a linear chunk of the computer's memory is organised in such a way that with the aid of suitable programs the user can treat it as if it were a two dimensional configuration addressable by a pair of co-ordinates. (Actually the physical memory of the computer is not really linear but it is interpreted as a linear sequence of locations by mechanisms in the computer.)
In Chapter 8 on learning about numbers, I give examples of the use of lots of linked pairs of addresses to build up data-structures which in part function as analogical representations, insofar as the order of numbers is represented by the order of symbols representing them. This is another example of one sort of representation being embedded in another.
Computer programs can be given the ability to record and
analyse some of their own actions. There will generally be a
limit to what a program knows about how it works, however.
For instance, programs cannot normally find out whether
they are running on a computer made of transistors or some
(Note added 8 Feb 2016: that could be false if the machine has an instruction to report such information if interrogated by a program. Even if that is not the case, the growth of the internet and the availability of "search engines" could allow a resourceful program to gain information about its host computer via the internet.)
Similarly, a program may be able to record, and discuss the fact that it is accessing and modifying a two-dimensional array, or moving along a linear list of some kind, without being able to tell how the array or list is actually represented in the computer. So a program could be under the illusion that it is building and manipulating things which are very like two-dimensional pictures on paper, or very like physical rows of objects, not knowing that really it is using scattered fragments of an abstract address space managed by complex storage allocation routines and accessed by procedures designed to hide the implementation details.
When such a system is asked about its own mental processes, it could well give very misleading accounts of how they work. Phenomenologically, of course, it could not but be accurate. But it would not give accurate explanations of its abilities, only descriptions of what it does. No doubt people are in a similar position when they try to reflect on their own thinking and reasoning processes. In particular, we see that very little explanatory power can be attached to what people say about how they solve tasks set for them by experimental psychologists interested in imagery.
One moral of all this is that often a discussion of the relative merits of two kinds of representation needs to take account of how the representations are actually constructed and what sorts of procedures for using them are tacitly assumed to be available. (For further discussion, see Hayes, 1974, Sloman, 1975.)
Very many problems have been left unsolved by this discussion. In particular, it is proving quite hard to give computers the ability to perceive and to manipulate pictures and diagrams to the extent that people do. This is an indication of how little we currently understand about how we do this.
See these two books, for example, (both of which contain papers that are sequels to this chapter):
M. Anderson, B. Meyer P. Olivier (eds), Diagrammatic Representation and Reasoning, Springer-Verlag, 2001.
My own papers in those books are also available online
I believe that we cannot hope to understand these issues independently of understanding how human vision works. Likewise, any satisfactory model of human visual capabilities must include the basis for an explanation of how visual reasoning works. Chapter 9 of this book presents some ideas but is still a long way from an adequate theory.
Also relevant are Talks 7, 8 and 111 here:
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/ on visual reasoning and on architectural requirements for biological visual systems, as well as more recent talks in the same directory.
Chap 8 Note 5 On biology and mathematics, added 1 Jul 2015.
Original pages 177-216
This chapter is a modified version of a paper with the same title presented to the AISB Summer conference, in July 1974, at The University of Sussex.
Most of the content was inspired by my interactions with Benjamin Sloman while he was learning to think about numbers, aged about 5 years, during a year (1972-3) when I was visiting the University of Edinburgh, aged about 36. I was learning to think about information structures, programs and architectures while he was learning to think about numbers (and many other things.)
We both learnt an enormous amount that year. Trying to understand
his development, and ways in which it could be influenced (programmed?)
helped to convince me that AI was at least beginning to produce theories
of the right general sort, though still lacking in detail and
(Ben was born in November 22nd 1967 and died 2nd February 2002)
Here is a typical conversation with a child aged between three and a half and five years.
Adult: Can you count up to twenty?Does this child grasp number concepts? Perhaps there is something wrong with the question, because number concepts are not simple things which you have either grasped or not grasped?
Child: One two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen seventeen eighteen twenty.
A: What comes after three?
C: One two three four --- four.
A: What comes after eight?
A: What comes after six?
C: Don't know
A: What comes before two?
A: What comes before four?
A: How many fingers on my hand?
C (counting fingers): One two three four five
A: What's two and three?
C (counting fingers): One two three four five. Five.
What are number concepts? How is it possible for them to be learnt? How is it possible for them to be used? How is it possible to discover non-empirical facts about them? I believe we are not yet able to formulate adequate answers to these questions. What follows is offered as a preliminary exploration of some of the issues.
The method illustrated below is important. Previously (in Chapter 2), I argued that a major aim of science is to find out what is possible and explain how it is possible. We all know a great deal about what it is possible for adults and children to do with numbers. So, instead of collecting facts by doing experiments on children, we can generate requirements for explanatory theories by reflecting on the fine-structure of familiar human abilities. In other words, methods of conceptual analysis, typically practised by philosophers and linguists, can be an important source of data for psychology. (Compare Chapter 4.)
I am not suggesting that conceptual analysis suffices to reveal everything we would like to know about, for example, ordinary counting abilities. The claim is only that it is foolish to embark on expensive empirical investigations before making a serious and systematic effort to articulate what you already know about the subject matter.
Here are some of the questions for which answers are lacking:
I shall try to show how thinking about such apparently psychological questions can lead towards new answers to old philosophical problems about the nature of numbers, thereby providing further support for the claim that academic barriers between philosophy and science are artificial, (Some implications regarding information processing architectures for intelligent systems will emerge as a side-effect.)
(Aristotle?)* * *and$ $ $
All the views listed above combine elements of truth with distortions and oversimplifications. I think that Wittgenstein's answer comes closest to encompassing the truth. In his writings he formulates many problems about mathematics, which are not answered by other theories, but his own solutions seem to me to be too shallow.
In particular, the anti-mentalism, or anti-psychologism, which pervades much of his writing prevents him from discussing mental processes in any depth. So he writes as if thinking about numbers were an essentially social process, consistently with his conclusion in Philosophical Investigations that all rule-following is an essentially social process, dependent on the existence of a public language.
This conflicts with a computational analysis of mental processes, according to which it is perfectly possible for a non-social mechanism to contain within itself rules which it can obey, for instance, programs transmitted genetically.
Wittgenstein's position also conflicts with any sensible account of the biological evolution of mental processes in precursors of homo sapiens.
I am not going to try to solve all the philosophical and psychological problems about numbers in one chapter. I shall merely try to show how we can get important new insights into the problems, and perhaps take some small steps towards formulating possible answers, if we think about the mental processes and mechanisms as if they were analogous to the processes and mechanisms involved in so-called 'list-processing' computer programs. Adequate exploration of these issues has been hampered by the current separation of philosophy and psychology, and the ignorance among most philosophers and psychologists of computing ideas.
I shall not be talking about events or processes or mechanisms in the human brain. Exactly how the brain works is as irrelevant to our problems as the detailed workings of a computer are to an explanation of a computer program written in a high-level programming language. There may be creatures on other planets, or robots, whose brains are totally unlike ours in their physiological details, yet such beings could well learn about numbers, and learn the same concepts as we do, just as two computers with quite different physical components can execute the same 'high-level' programs. ( Incidentally, this undermines philosophical theories which claim that mental processes are identical with brain processes. This is as inaccurate as the claim that computational processes in a computer are identical with physical processes.)
When I talk about mechanisms involved in using numbers, I am not talking about physiological mechanisms. I am talking about aspects of the way information is organised and represented, and about the kinds of symbol-manipulating processes which may be necessary for accessing and using various sorts of representations. In particular, such processes involve the following of rules, instructions, or plans, whether consciously or unconsciously.
The main assumption is that we can speak of the human mind as storing information in a vast collection of locations'. They need not be spatial locations, like shelves in a library. Positions in any kind of symbolic space with appropriate mechanisms for storing and retrieving information will do. So the word location' is being used as a technical term. For instance, radio waves are often used to transmit information, different information being transmitted at different frequencies. So information could be stored in a collection of continually reverberating radio waves, with different symbols stored at different frequencies. Each possible frequency would then be a location in the sense required here.
Similarly, possible structures of a certain class of molecules could define 'addresses' in a space. Storing information at a certain address would mean attaching that information to molecules with the structure represented by the address. This could be done more or less simultaneously in many different physical places. But the information would still be stored in one symbolic place, just as a name occurs at only one symbolic location in a telephone directory, even though there may be millions of physically distinct copies of the directory containing the name. So from now on, when I talk about locations, this is neutral as to what sorts of locations they are.
I shall assume then that a mechanism is available which can store symbols in some 'space' of locations. Further, I assume that it is possible for some of the symbols to represent locations in this space. (For instance, a directory, or catalogue, can contain entries which refer to the location' of other entries, by page or section number.) Thus the space can contain information about itself.
A symbol representing a location can be called a 'pointer', or an 'address'. So the storage mechanism can be given an address and asked to produce the symbol located there. In other words, when given a pointer, it can determine what symbol is pointed at. What is pointed at may be a complex structure containing a symbol which is itself an address of some other location, that is a pointer to another symbol. (See Figure 8.1.) So the space may contain chains of pointers. (In more elaborate systems, the addressing may be relative to a context or mode of operation. That is, which location is represented by a given symbol may depend on the current state of the accessing sub-mechanism. Some of the flexibility of behaviour of the system may depend on such systematic changes in the 'meaning' of symbols.)
The concept of a symbolic structure containing pointers into itself, and the investigation of processes in which such things are manipulated and used for solving problems, are among the important contributions of computing science. I shall try to show how these ideas help us to think about a child's ability to count, an ability which provides the substratum for a grasp of number concepts.
The first task is to make explicit some of our commonsense knowledge about the sorts of things we can do with number words and number concepts. Note the 'can': it is possibilities we most need to explain, not laws, that is not regularities or correlations. We know relatively few non-trivial laws of human behaviour. But we know of very many human possibilities, namely, many things at least some people can do. By thinking about possible mechanisms underlying fairly common abilities we can reveal the poverty of most philosophical and psychological theories about the nature of mathematical concepts and knowledge. These theories do not account for the fine-structure of what we all know. All this illustrates methodological points made in Chapter 2 and Chapter 3.
o The entities thus related need not be of the same kind: e.g. a 1-1 correspondence can be set up between a collection of physical objects and an abstract collection of names, or between a collection of people and a set of objects they need, e.g. people and chairs, people and places at a dinner table, or between a spatial configuration (steps on a staircase) and a temporal sequence (e.g. a sequential process of putting a foot on a step), or between botanical species and musical compositions, etc. etc.
o The entities involved in such a correspondence can include arbitrary ordered items, such as learnt collections of spoken names (e.g. the sounds "one", "five", "thirteen"), or learnt collections of spatial symbols (e.g. the printed symbols "one", "1", "99", "a", "B", etc.) or temporally or spatially ordered collections, e.g. a remembered row of statues, or a remembered sequence of events. (See https://en.wikipedia.org/wiki/Mnemonic)
o A fixed list of memorised spoken or written names could be used for counting small sets, but in order to understand number concepts fully one must realise that any fixed finite list would not be adequate for counting sets of objects with more components than there are learnt number names.
o The need for an unbounded set of number names can be met by using systematically generated numerical labels, with a principle of generation that is inherently unbounded. There are many ways of making a collection of names indefinitely extendable, including the use of repeated signs (X, XX, XXX, XXXX, XXXXX,... etc) use of familiar decimal and binary number notations or Roman numerals, and use of verbal rules for generating new names, which differ between human languages. (https://en.wikipedia.org/wiki/Roman_numerals#Large_numbers)
o The system used may or may not cope with empty collections. We now take it for granted that an item before "1" occurs in the natural number series, namely "0", though it is not normally used for counting, except in connection with locations in a computer memory. (https://en.wikipedia.org/wiki/0_%28number%29)
o The elements of a set of items in a 1-1 correspondence with another set do not need to be in close proximity, and need not exist at the same time: e.g. they could be towns on different continents, events in different centuries, etc. Thus it makes sense to ask whether Julius Caesar ate more meals than there are cities in California at present. (I do not know whether any psychologist has investigated the development of the ability to understand this feature of natural numbers - Piaget seems most likely. Do any mathematics teachers teach or test for this?)
1-1 correspondences can be used for a wide variety of different purposes,
- providing factual information, e.g. answering questions (how many dishes are there?),
- achieving practical goals (e.g. fetching enough dishes for the people in the room),
o It is possible to notice certain recurring patterns in the uses of such symbols for which exceptions are impossible, e.g. that if two non-overlapping unchanging collections of three items and five items are treated as one larger collection, then the process of counting elements of the larger collection in standard ways will always give the same result: eight items.
o Being in a 1-1 correspondence is a transitive relation between collections. E.g. if elements of collection A are in one-to-one correspondence with elements of collection B, and elements of collection B are in one-to-one correspondence with elements of C, then there is a one-to-one correspondence between elements of C and other entities.
o The relationship of being in a 1-1 correspondence is not affected by permutations of the members of either set. E.g. if [P Q R] an [cat mouse dog] are in 1-1 correspondence then so are [P Q R] and [dog mouse cat]: order of elements of the collection does not affect existence of a 1-1 correspondence even though every such correspondence makes use of an ordering of each set.
o The existence of a 1-1 correspondence beteen two collections is not affected by overlaps between the two collections or differences of ordering in the two collections. For example, the existence of a 1-1 correspondence between [pig dog cat plum rock] and [ball rock tree pig cloud] is not affected by the fact that there are common elements in a different order in the two collections.
o Many relationships can be discovered between two (or more) collections. For example if collections A and B are not in a one to one correspondence then either A corresponds to a subset of B or B to a subset of A, depending which collection has left-over items when the elements are paired. Changing the pairing of elements in A and B cannot affect which one has items left over. That's because combining the mapping in which B has items left over with the mapping in which A has items left over can produce a mapping of A to itself in which all elements of A map 1-1 onto a subset of elements of A. (Creating a diagram should make this clear.)
o There cannot be a largest collection of objects, because whatever collection is considered an additional item, real or imagined, can always be added to produce a new larger collection.
o There are operations by which two collections of unique objects can be used to form a new collection. For example, combining two collections and making sure that any item that occurs in both is represented by two distinct items in the combined collection can be used to define the sum of two numbers, i.e. addition. Defining the difference between two numbers, i.e. subtraction, is a little more complicated. Defining multiplication (the product) of two numbers can start with two collections and A and B use them to build a third collection C in which each element of A is represented multiple times, by pairing it with each element of C. E.g starting with two collections:
[tom mary] [cat dog pig]we can create the new collection:
[ [tom cat] [tom dog] [tom pig] [mary cat] [mary dog] [mary big] ]Learning about such operations can provide a basic understanding of multiplication (which does not eliminate the need to memorise multiplication tables once the principles are understood).
o A great deal of primary school mathematical education involves coming to understand a collection of mathematical theories, not all of which are made explicit by the teachers, sometimes because their own understanding is incomplete. This is particularly likely to be true of their understanding of the natural numbers.
o ... to be continued
Reflecting on even the simplest things we know children can learn (although not all children learn all of them) shows that children can somehow cope with quite complex problems of storing, using and manipulating symbols, that is, computational problems. Some of these problems are common to many forms of learning, others peculiar to counting.
I shall start with problems involved in learning number words. These problems are common to all words. Next, there are problems concerned with the fact that number words form a sequence to be memorised. Some of the problems are common to many other sequences, for instance letters of the alphabet, digits in telephone numbers, the letters used to spell a word, sequences of sub-actions making up a learnt action (a dance-routine, or a method for testing faulty engines). Finally, I shall mention some problems peculiar to numbers, without offering more than tiny steps towards solutions.
There are many facts about number concepts and the ways in which they are used that I shall not attempt to analyse or explain. For instance, I shall say nothing about our ability to learn to generate an indefinitely extendable set of number names in a systematic fashion, or our ability to learn to think algebraically about numbers, for example in proving general truths about adding, subtracting, multiplying, etc., without mention of particular numbers.
In unravelling some of the hidden complexities in even the simplest abilities, I hope to give a feeling for the even greater complexities still to be explored. The intellectual tasks accomplished by ordinary children in apparently simple activities are comparable in complexity to some of the mental processes of adult scientists, engineers and artists. The children merely have less knowledge to build on.
If children have these impressive powers, why don't they use them to learn about arithmetic, reading, music, painting, and so on, without formal schooling, just as they learn to walk, talk and manipulate objects without formal schooling?
Perhaps the answer is that despite all the variations in parental behaviour and home environment, nearly all children are placed in situations where learning to talk, walk, etc. are essential for them to achieve things they are highly motivated to do (like eating and interacting with other people), and moreover there are well-structured opportunities for them to learn, even though they learn things at different rates and in different orders.
By contrast, very few parents and teachers are able to provide similarly highly structured and highly motivating situations to generate learning about reading, writing, mathematics, science, music, history, etc. One of the difficulties of investigating such issues without good theories of learning is that there are so many different factors which can make a difference in subtle ways. (Selfe 1977 presents relevant evidence in the drawings of an autistic child.)
Does the stored description cope with all variations by making use of relatively abstract specifications (whatever that means)? Or does the child store different descriptions corresponding to different ways the sound may be uttered? In the former case, how does the child learn to use descriptions with sufficient generality, and in the latter case how does she represent the fact that the different descriptions are of the same word?
Or is some other method used to cope with variations, such as storing a specific description (a description of a 'prototype' or 'template'), and using a flexible matching procedure so that things not quite like it will match anyway? (This kind of 'sloppy matching' is often useful in computer programs.) Or, as Kant suggested in his discussion of schemata, do we cope with variations by using rules or procedures for synthesis and analysis rather than stored templates or descriptions? For instance, a rule which says 'count the number of consecutive occurrences of "ho" in an utterance and if the result is above two then call it a laugh' can enable one to recognize laughs' of very varied lengths. (I am not suggesting, and neither was Kant, that these matching and testing processes are conscious. In any case, we know so little about the difference between what we are and are not conscious of, that we cannot draw any useful conclusions from the fact that they are mostly unconscious processes.)
For a brief introduction to further complexities of recognition, see Chapter 9. The artificial intelligence literature takes the topic much further.
These problems can be dealt with if the child not only stores items, but also builds appropriate indexes to what it knows. For instance we use alphabetically ordered indexes to help us search books, libraries, department-stores, etc. (How? Think about how you might teach a child or a computer to use an alphabetic ordering to avoid a complete linear search.) What sorts of indexing techniques do children use, and how are they able to use them? Are we born with some sophisticated indexing strategies? Is it possible that children unconsciously use some kind of ordered set of symbols, like an alphabet, and build 'alphabetically' ordered or tree-structured catalogues of what they know, to minimise searches?
Librarians and computer scientists do not find it easy to design good methods of cataloguing and indexing. Children must be much more sophisticated, although unconsciously.
Why don't we (and children) learn things permanently as soon as we hear them? Why is repeated hearing sometimes needed for learning? One popular answer is that memory uses probabilistic mechanisms, and that repeated exposure to an item increases the probability of its being retrieved later. How this happens is rarely explained. In any case, it does not seem to be consistent with the fact that not all learning requires repetition. If someone tells you that he plans to leave you a fortune, you will probably remember it for a long time without his having to repeat it. And faces seen once for a short time are often recognized long after, even if nothing else is recalled about the context in which they were first seen, though not all shapes are so easily remembered. So we do have some abilities to store things quickly and permanently: why are they not applied to everything we experience?
Here is a sketch of a non-probabilistic explanation of the need for repetition in some cases: the child needs to experiment with different ways of analysing, describing, and indexing new experiences. For example, it may be necessary to experiment with different ways of describing the sounds of words, so as to develop a good way to cope with variations in the sound of a word. It may even be necessary to experiment with ways of analysing a total experience before a particular sound pattern can be noticed as a significant substructure in any experience. Many adults have already developed good ways of indexing information about likely disasters and benefactions, so that they can store important items and access them later without repetition.
A closely related problem is worth mentioning. At any moment a child's experience is rich and complex. How are some features selected to be stored? How does the child decide what is worth learning? More fundamentally, how are some aspects of the current experience selected as candidates for things to be recognised if possible? How is a chunk of sound selected from the whole stream of sounds for an attempt to find a match among known items? To say that the child selects what 'interests' her is no explanation, since she can only decide that something is interesting after it has already been recognized. (Or at least some parts or aspects of it have been recognized.) These questions are taken up again in the chapter on visual perception.
Clearly we need some general 'interpretative' procedure in order to be able to repeat a sequence of sounds which we do not recognise, for instance when imitating someone talking a foreign language, where there is no question of simply repeating something learnt previously.
Perhaps there are good reasons, if there is no shortage of space, for storing both explicit instructions for producing the sound and the specification which allows the sound to be recognised. But this raises new problems. If the knowledge of how to say the word is represented differently from knowledge of what it sounds like, how are the two items related? In particular, how is the appropriate knowledge found when needed?
This is just a special case of a much more general problem about how one piece of information can have other kinds of information linked to it, or associated with it. Suppose you have managed to find in your memory something matching a .word you have just heard. How does that help you access your knowledge of how to reproduce the word yourself?
For example, a child has to associate the sound of a number word not only with how to say it, but also with a method of writing it down, and a method of reading written versions of it. She may even learn to say or write it in several different languages, or several different notations for numerals. Moreover, as she learns more and more about numbers, she will have to associate lots more information with each number name, including: the fact that it is a number name, that it is a word associated with certain games (e.g. chanting things in sequence), the fact that its successor is so and so, that its predecessor is so and so, the fact that it is or it is not a prime number, the fact that it is odd, or even, its 'multiplication table', its 'addition table', and so on.
Of course, not only number names generate this problem. Many known items each have to be associated with a large and growing collection of other items. For instance, in your mind your home town will be linked to very many facts which you know about the town, such as its name, its location, its direction and distance (roughly) from major towns, its population, many of its geographical details, and so on.
So we have some new problems. First, if you cannot tell when you first learn a word, say 'three', how many further items of information are to be associated with it as a result of further learning, you cannot tell how much space to reserve in the neighbourhood of the location at which a description of the sound is stored. If too little is reserved, you'll find a limit to what you can learn about the number. But people do not seem to have such limits. (For instance, think of all the things associated with the word 'word' in your mind, i.e. all the words you know.) Is there an upper limit? Some people can learn several languages!
Of course, the need for expansion could be dealt with by moving the whole collection of linked items to a larger unoccupied space if the initially reserved space overflows. Are we to assume that children have the ability to manage storage allocation like this? For instance, do they have ways of telling which of the 'free' locations have a large enough collection of free neighbouring locations? Large enough for how many additional items? Is it possible that extra chunks of space are allocated in minimal units, as in some computer storage-allocation procedures? Perhaps people solve the problem by using an abstract symbolic space of locations, like the space of decimal numbers: this has the advantage that new neighbours can always be generated for any given location. But this merely shifts the problem, for we now have to explain how information is stored about which symbols occupy which locations!
Philosophers love to analyse the concept of rationality, and to discuss rational ways of doing things. But I have yet to hear them discuss what it means to have a 'rational' way of organising and using a massive store of knowledge, subject to the constraint that in real life decisions often need to be taken fairly quickly. Attempting to design a working system forces one to address such issues.
Obviously this search has to be controlled by the request or question. For instance, in this case, the hearer has to find something associated not only with 'three' but also with 'write down'. How is this done? There are many techniques for this sort of thing which have been explored by computer programmers, and some of them are quite sophisticated. Do children have the ability to perform the elaborate operations used by such programmers, or do they have special techniques not yet discovered by programmers?
The problem is compounded by the fact that having learnt about some structure, we can then learn about a larger whole containing it as a part. You probably recognise not only the individual words, but also the whole phrase here: 'three blind mice'. Which method is used for obeying an instruction like
'count to three'?Has the whole instruction been memorised and stored (e.g. because it is encountered frequently), or is there a process by which something associated with one of the words (e.g. 'three') is found because it is also associated with one of the others, or does something much more elaborate than retrieving a stored specification go on?
Is it possible that analysis of the instruction is somehow used to generate an action-specification? Quite likely we can do both, namely analyse the instruction using general principles and recognise it as a familiar whole. So how do we, and children, decide (unconsciously) which to do?
An explanatory theory, which purports to answer the questions raised here, must specify some kind of mechanism which is not merely able to hold learnt information in an efficiently accessible form, but is also capable of explaining how complex information structures are built up, how they are modified or replaced (e.g. when mistakes are discovered), and how they are used. I do not believe that educational psychologists have even the foggiest notion of what such a mechanism might be like, or what its limitations are, or what sorts of teaching strategies might interfere with its operation or facilitate learning. Some gifted teachers may have an intuitive grasp of some aspects of the mechanism, but they probably cannot articulate their implicit theories.
Computer scientists dealing with problems of managing complex collections of information in a flexible way seem to have unwittingly invented possible explanations some of which I sketch briefly below.
If we can find good theories, we may be able to do something about the large numbers of children who, for one reason or another, fail to learn so many things which might be useful or enriching to know. I believe that all normal children have the potential to learn a great deal of mathematics and other technical subjects painlessly, if only we knew how to prevent our teaching methods and attitudes to children (at home and in schools) from interfering with the learning process.
The problem cannot be solved by simply storing some such symbol as 'four is after three', that is, a representation of the required fact, since that would not always be the appropriate answer. For instance the question might be In the song Ten green bottles what comes after three?' And if the context is unambiguous it is not even necessary to mention the song explicitly in the question.
So finding the required item of information may involve analysing the question in such a way as to control the search for relevant links in memory.
For example, it may be that each item which is associated with several others somehow has links to those others which are labelled as represented in Figure 8.1. It is very easy to draw diagrams like this, but not so easy to describe mechanisms which can build and use such structures. A common method used by programmers is that shown in Figure 8.2. A 'property-list' or 'association-list' is made up of a chain of links where each link contains two storage cells treated as an association by the memory mechanism, for example because they are adjacent in the memory space.
At least if the items associated with 'three' are all accessible through a linear list, then fairly obvious search procedures will enable the wanted item to be found, provided the location of the initial link of the list can be found easily.
A chain of links may be attached to some item, for example the concept numbers, or the concept three with related items 'hung' from the chain by means of pointers giving their addresses. As Figure 8.3 shows, the items hung from the chain may themselves be associations, corresponding to the labelled links of Figure 8.1. Thus in the context of the chain attached to 'three', there is an association between 'predecessor' and 'two', whereas in a chain attached to 'four' (not shown) there would be an association between 'predecessor' and 'three'. Associations are relative to context.
Stored structures are not enough. Procedures are required for creating and finding associations in them. Such procedures are easily defined using modern programming languages. Suppose you want to search down a chain, starting from a specified link, looking for an association with a specified label (e.g. 'successor', or 'type'), because you want to find the item associated with that label in the chain. The obvious way is to see if the association pair pointed to by the given link starts with the required label, and if so to treat the second element of the pair as the desired result. Otherwise start again with the next link, whose address is in the BACK of the given link.
In a suitable programming language one could express this as Procedure-8.1, with the name ASSOC. (I am assuming that the subroutines FRONT and BACK when applied to a given pair produce the first thing and the second thing in the pair respectively. See Burstall et al. Programming in POP2 for more details.)
Procedure ASSOC:Given: initial LINK of chain, and target LABEL
Is FRONT of FRONT of LINK equal to LABEL?
If so, result is BACK of FRONT of LINK. STOP.
Otherwise, assign BACK of LINK to LINK, and restart, with LABEL as target.Procedure-8.1
So ASSOC('THREE', 'TYPE') could represent the application of this procedure to a memory structure like Figure 8.3, with LINK starting as the first link in the chain called THREE', and LABEL having TYPE' as its value. The procedure would find a pointer to NUMBER as its result.
Similarly ASSOC('THREE', 'SUCCESSOR') would find the successor of 'three', namely 'four'. The same thing could then be used to find the successor of 'four', if that had been stored appropriately. By interleaving such searches with actions of saying what has been found, the child would have a procedure for counting, that is for reciting the numbers in their appropriate order. (More on the problems of interleaving later.)
Another way of thinking about this, is to say that information stored in a collection of structures like Figure 8.3, one for each known numeral, can be thought of as a sort of program for doing various things. The structure shown in Figure 8.2 is a much simpler program, and there is less that can be done with it. However using it as a program for counting is a simpler matter than using a collection of structures like Figure 8.3, since in Figure 8.2 all you need do in order to decide what to say next is find the link pointed to by the BACK of the current link in the chain (whose FRONT is a number or numeral), whereas in Figure 8.3 you first have to search for the 'successor' label, and then take the link it points to, and then start again from that link. We shall see later that different sorts of chains can coexist and be used for different purposes. (Figure 8.6)
Of course, there are many more structures and procedures that might be used for storing information about linear sequences in a computer, or in a mind. Different methods have their own advantages and disadvantages. For instance, the method of Figure 8.2, though simple and quick to use, has the disadvantage that when you get to the link involving 'three', there is no information stored there about items coming earlier in the chain. So using that structure makes it harder to answer questions like 'what comes before three?', though easier to answer questions like 'what comes after three?'
This is a space-time 'trade-off. Other trade-offs involved in selecting representations include efficiency vs flexibility, simplicity of structures vs simplicity of procedures, and so on. Chapter 7 discussed trade-offs between Fregean and analogical representations. Investigations of such trade-offs between different representations is central to artificial intelligence but has hitherto been absent from philosophical discussions of rationality and most psychological theorising about cognitive processes.
Proposed explanations of a child's counting abilities must do much more than explain how the child manages to recite known numbers, or how the child answers simple questions. For example, it is necessary to explain also how the representation gets built up in the first place, how new items are added, and how mistakes are corrected. One may miss out an element of the sequence, or store some elements in the wrong order. So procedures are required for inserting new links, for deleting old ones, and perhaps for changing the order of existing links, when mistakes are discovered.
A more complex procedure is required for adding a new association: it will have to get a free link (how?) and insert it at a suitable place in the chain, with its FRONT pointing to the new association and its BACK pointing to the next link in the chain, if any.
If children do anything like this to store and use associations, then how do they build up such chains, and how do they come to know the procedures for finding required associations? Perhaps the ability to learn and use chains of associations, employing procedures something like ASSOC, is inborn? Clearly not all procedures for getting at stored information are innate. For instance, children have to learn how to count backwards or answer 'What's before "four"?' even though they may already know the order of the numbers. The same applies to other sequences children learn. (More about such tasks later.)
For instance, people can execute unrelated actions in parallel, like walking and talking. Moreover, they apparently do not require their procedures to have built-in tests to ensure that conditions for their operation continue to be satisfied. Nor do they require explicit instructions about what to do otherwise, like instructions in a computer program for dealing with the end of a list. All sorts of unpredictable things can halt a human action at any stage (like learning one's house is on fire) and a decision about what to do can be taken when the interruption occurs, even if no explicit provision for such a possibility is built into the plan or procedure being executed.
These points suggest that models of human competence will have to use mechanisms similar to operating systems for multi-programmed computers. For instance, an operating system can run a program, then interrupt it when some event occurs although the program itself makes no provision for interruption. Similarly, if something goes wrong with the running of the program, like an attempt to go beyond the end of a list, the program breaks down, but the operating system or interpreter running the program can decide what to do, (for example, send a message to the programmer), so that there is not a total breakdown. Of course the operating system is just another program.
So the point is simply that to make the program metaphor fit human abilities we must allow not merely that one program can use another as a 'subroutine' but that some programs can execute others and control their execution, in a parallel rather than a hierarchic fashion. (For more on this, see chapters 6, 9 and 10.)
Procedure COEXECUTE:Given: step-by-step procedures P1 and P2,
Execute a step of P1.
Execute a step of P2.
Has a stopping condition been reached?
If not, restart COEXECUTE (P1, P2).Procedure-8.2
Unfortunately, this is not an acceptable model in view of the familiar fact that children (and adults doing things in parallel) sometimes get out of phase when counting and ( sometimes) stop and correct themselves. This suggests that keeping the two sequences in phase is done by a third process something like an operating system which starts the processes at specified speeds, but monitors their performance and modifies the speeds if necessary, interrupting and perhaps restarting if the sequences get out of phase. All this would be impossible with the procedure COEXECUTE. It is as if we could write programs something like the procedure RUNINSTEP.
Procedure RUNINSTEP:Given: procedures P1 and P2,
DO (a) to (d) in parallel:
(a) repeatedly do P1
(b) repeatedly do P2
(c) observe whether (a) and (b) are getting out of step and, if they are, slow one down or speed up the other.
(d) if (a) and (b) are right out of step re-start P1 and P2Procedure-8.3
The computational facilities required for this kind of thing are much more sophisticated than in COEXECUTE and are not provided in familiar programming languages.
[[Note added January 2002 The ability to monitor and modify two concurrently executed processes requires an information processing architecture which is not supported by the virtual machines defined by most programming languages, though it is a feature of operating systems. This is one of the sorts of tasks that might be required in a "meta-management" system, described in the presentations here: http://www.cs.bham.ac.uk/research/projects/cogaff/talks/ ]]
Notice also that there is a complex perceptual task involved in deciding whether two processes are getting out of step, and children sometimes find this difficult. Not only children: try counting rotations of a wheel with no clear markings on it, while it is turning quite rapidly!
Further, the child has to be able to apply different stopping conditions for this complex parallel process, depending on what the task is. So it should be possible for yet another process to run the procedure RUNINSTEP, watching out for appropriate stopping conditions. Alternatively, the procedure could be re-defined so as to have an additional 'given', namely a stopping condition, and an extra sub-process, (e), watching out for it. For instance, when the question is 'How many buttons are there?' use 'No more buttons' as main stopping condition, whereas in response to a request 'Give me five buttons', use 'Number five reached' as main stopping condition.
I say main stopping condition, because other conditions may force a halt, such as getting out of phase or running out of numbers or (in the second case) running out of buttons.
How do children learn to apply the same process with different stopping conditions for different purposes? How is the intended stopping condition plugged into the process? Notice that the perceptual tasks are further complicated by the need to detect different sorts of conditions, for example, completion of the task, getting out of phase, running out of things to count, mistakes like counting the same thing twice, or leaving something out, and so on.
Some of this would be easy for a programmer using a high-level language in which a procedure (to test for the stopping condition) can be given as input to another procedure but do children have such facilities, or do they use mechanisms more like the parallel processes with interrupt facilities described here?
I believe we know very little about how children achieve these extraordinarily complex feats. Nor do we understand what sorts of teaching strategies can help or hinder their development. My own informal observations suggest that a tremendous amount of very varied practice is required, in an environment where teachers can use a deep analysis of failures to suggest variations in games and other learning activities. This analysis can be a challenging intellectual task. How many teachers are equipped for it?
The parallel mechanisms suggested above might explain the ability to learn to watch out for new kinds of errors. For example, after learning to count stairs, where there is little chance of counting an item twice, learning to count buttons or dots requires learning to monitor for repetition and omission.
Depending on the kind(s) of programming language(s) and operating system(s) used in a child's mind, it may be easier to add a new kind of monitoring process to run in parallel with previously learnt processes than to re-organise an existing procedure so as to include new tests at appropriate places, as would be required with a conventional programming language. Probably both sorts of learning occur.
Monitoring interactions between asynchronous parallel processes may be an important source of accidental discoveries (creativity) in children and adults. For example, ongoing (unconscious) comparisons between intermediate results of two different activities may lead one to notice a relation between the two which amounts to a new technique, concept, or theory. This whole discussion is centrally relevant to the analysis of concepts like consciousness, attention, and intention. We now have a basis for a complete rejection of a major theme of Ryle's pioneering work The Concept of Mind, namely its refusal to accept multiple inner mental processes.
We also have a basis for beginning to explore personality differences and mental disorders relating to problems of organising and controlling several different processes. By trying to design systems involving multiple interacting processes we gain a deeper understanding of the problems and possibilities.
A programmer would find this trivial, but how does a child create this kind of interleaving in his mind? And why is there sometimes difficulty over keeping track of position in the second sequence '... fifty-eight, fifty-nine, . . . um . . . er, thirty, thirty-one . . . '? Clearly this is not a problem unique to children. We all have trouble at times with this sort of book-keeping. But how is it done when successful? And what kind of mechanism could be successful sometimes yet unsuccessful at others?
My guess is that human fallibility has nothing to do with differences between brains and computers as is often supposed, but is a direct consequence of the sheer complexity and flexibility of human abilities and knowledge, so that for example there are always too many plausible but false trails to follow. When computers are programmed to know so much they will be just as fallible, and they will have to improve themselves by the same painful and playful processes we use.
This requires that besides having names and sets of instructions, procedures need to be associated with specifications of what they are for, the conditions under which they work, information about likely side-effects, etc. The child must build up a catalogue of his own resources. This is already done in some A.I. programs, e.g. Sussman, 1975.
Further, the instructions need to be stored in a form which is accessible not only for execution but also for analysis and modification, like inserting new steps, deleting old ones, or perhaps modifying the order of the steps, as is done in Sussman's program. Such examination and editing cannot be done to programs as they are usually stored, after compilation.
List structures in which the order of instructions is represented by labelled links rather than implicitly by position in memory would provide a form of representation meeting some of these requirements (and are already used in some programming languages). Thus, as already remarked, Figure 8.2 can be thought of either as a structure storing information about number names (an analogical representation of their order), or else as a program for counting. The distinction between data structures and programs has to be rejected in a system which can treat program steps as objects which are related to one another and can be changed. We now explore some consequences of this using counting as an example.
A child who counts very well may be unable easily to answer 'What comes after five?'. Later, he may be able to answer that question, but fail on 'What comes before six?', 'Does eight come earlier or later than five?' and 'Is three between five and eight?'. He does not know his way about the number sequence in his head, though he knows the sequence.
Further, he may understand the questions well enough to answer when the numbers have been written down before him, or can be seen on a clock. (There are problems about how this ability to use what you see to answer such questions is learnt, but I shall not go into them.)
Later, the child may learn to answer such questions in his head, and even to count backwards quickly from any position in the sequence he has memorised. How? To say the child 'internalises' his external actions (an answer I have often been given in the past) is merely to label the problem, if all that is intended is the claim that one can learn to represent in one's mind actions previously performed externally.
Moving back and forth along a chain of stored associations is quite a different matter from moving up and down staircases or moving one's own eye or finger back and forth along a row of objects. The latter is a physical movement through space, whereas the former is movement through a set of computational states, not necessarily involving physical movement. Lack of reversibility in one case may be accounted for by physical structures, like ratchets or uni-directional motors, whereas in the other case the explanation is more likely to be lack of information. For instance in Figure 2, the link pointing to 'four' contains information about the next link, but does not contain information about the preceding link.
Learning to overcome physical impediments to reversibility need have nothing in common with learning to overcome computational problems. The child who has learned to move his eye or finger back and forth around a clock face to answer 'what comes before four' is not thereby provided with a mechanism which could somehow be used internally. At most, it provides him with a model, or analogy, which may be helpful in grasping what the task is. But how the analogy is used is totally unexplained.
A very simple procedure enables a chain like that in Figure 8.2 to be used to generate a sequence of actions, for example the procedure RECITE.
Procedure RECITE.Given. a chain starting from LINK.
Utter FRONT of LINK.
If BACK of LINK isn't empty, make it LINK and restart.
Going down the chain starting from a given link is thus easy, and a procedure to find the successor of an item would use a similar principle. But answering 'What's before item X?' is more sophisticated, since on getting to a particular location (e.g. the link whose FRONT points to X), one does not find there any information about how one got there. Somehow the last item found must be stored temporarily. One method is illustrated in the procedure PREDECESSOR, as it might be defined in a programming language.
Procedure PREDECESSOR:Given target X in chain starting at LINK:
Create temporary store TEMP, with undefined value.
Repeat the following:If FRONT of LINK = X then result is TEMP, stop.
Otherwise, assign FRONT of LINK to TEMP and BACK of LINK to LINK and restart.Procedure-8.5
How could a child learn to create a procedure like this? Does he start with something more specialised, then somehow design a general method which will work on arbitrary chains? Perhaps it has something to do with manipulating rows of objects and other sequences outside one's head, but to say this does not give an explanation, since we do not know what mechanisms enable children to cope with external sequences, and in any case, as already remarked, chains of associations have quite different properties. For a child to see the analogy would require very powerful abilities to do abstract reasoning. Maybe the child needs them anyway, in order to learn anything.
My observations suggest that the child's learning task (at least between the ages of three and four, or later) is very different from the task of designing a procedure like Procedure-8.4. This is because the child is already able to remember steps he has just executed. So if he is asked to count to 'four' and does so, and then is immediately asked what came before 'four' he can answer. He does not have to allocate special-purpose temporary storage, like the 'local variable' TEMP. His problem is to think of counting up to four as a way of answering the question 'What comes before four?'
He does not, presumably, have a representation of the fact that if he recites some sequence he can remember the final fragment immediately after stopping. Adults have learnt this and can use it to answer a question about the predecessor of a letter of the alphabet, even if they do not have the information explicitly stored. However this technique is very tedious for reciting the whole of a learnt sequence backwards, and is useless if the process is to be done quickly. (The general ability to remember what one has just done is useful for the reasons given in Chapter 6. The reasons apply to intelligent artefacts as well as to humans. This self-monitoring is not usually a built-in feature of programming systems, but there is no difficulty of principle in incorporating it.)
If a child knew only the first four numbers, then he could memorise them in both directions, building up the structure of Figure 8.4 instead of Figure 8.2. Notice that this use of two chains increases the complexity of tasks like 'Say the numbers', or 'What's after three?', since the right chain has to be found, while reducing the complexity of tasks like 'Say the numbers backwards' and 'What's before three?'. (Another example of a computational trade-off.)
However, when a longer sequence had been learnt, this method would still leave the need to search down one or other chain to find the number N in order to respond to 'What's after N?', 'What's before N?', 'Count from N', 'Count backwards from N', 'Which numbers are between N and M?', etc., for there is only one route into each chain, leading to the beginning of the chain. For instance, when one has found the link labelled 'X' in Figure 8.4, one knows how to get to the stored representation of 'three'. But it is not possible simply to start from the representation of 'three' to get to the links which point to it in the two chains. So we need to be able to associate with 'three' itself information about where it is in the sequence, what its predecessor is, what its successor is, and so on.
A step in this direction is shown in Figure 8.5, where each number name is associated with a link which contains addresses of both the predecessor and the successor, like the link marked V, associated with 'two'. The information that the predecessor is found in the FRONT and the successor found in the BACK would be implicit in procedures used for answering questions about successors and predecessors. However, if one needed to associate much more information with each item, and did not want to be committed to having the associations permanently in a particular order, then it would be necessary to label them explicitly, using structures like those in Figure 8.1 and Figure 8.3, accessed by a general procedure like ASSOC, defined previously.
To cut a long story short, the result of explicitly storing lots of discoveries about each number, might be something like the network of associations in Figure 8.6, which is highly redundant, in the sense that a lot of the information there could in principle be derived from other information in the network. On the left there are (vertical) chains of links available for use in counting rapidly forwards or backwards, analogous to the chains in Figures 8.2 and 8.4. In addition, associated with each number is a great deal of information about it in a chain of attribute/value pairs analogous to Figure 3, except that some of the values are new sub-chains. Included in the chain hanging from each number is a pointer into the 'fast-forward' chain and a pointer into the 'fastback' chain, making it possible to count quickly forwards or backwards from that number without first having to search for the number in the relevant 'fast' chain.
In the light of the previous remarks about the need to blur the distinction between information-structures and programs, we can see how a structure like that depicted in Figure 8.6 can be thought of as containing several different programs embedded within it, such as programs for counting forward from various numbers, programs for counting backwards from various numbers, programs for answering questions, and so on. The different programs share common sub-structures.
The growth of this kind of network would be an example of the second type of learning, namely extending an information store to contain explicitly what was previously implicit in it. This often involves trading space for time. That is, much redundant information is stored explicitly so that it does not have to be re-computed every time it is needed. This includes information on how to do things. It seems that a great deal of early learning about numbers has this character, as well as much of the development of skill an fluency in thinking and acting.
"Progressive" educational procedures which attempt to do without any rote learning may be depriving children (or adult learners!) of opportunities to build up some structures which are useful for rapid access -- unless the old formal methods are replaced with carefully structured play situations, to achieve the same effect (which they could probably do much more effectively, since they would be more highly motivating.) Children need a lot of practice at 'finding their way about' their own data-structures.
The structure of Figure 8.6 may look very complex, yet using it to answer certain questions requires simpler procedures than using, say Figure 8.2. For, having found the link representing a number, one can then find information associated with that number by simply following forward pointers from it, for example, using a procedure like ASSOC; whereas in Figure 8.2 or 8.5, finding the predecessor and successor of a number requires using two different procedures, and each requires a search down a chain of all the numbers to start with. Of course, a structure like Figure 8.6 provides simple and speedy access at the cost of using up much more storage space. But in the human mind space does not seem to be in short supply!
A further problem is that each time new information is added to a chain, the increased length increases the average time for searching along the chain. So if an item in a structure like Figure 6 has a very long chain of associations, it might be preferable to replace the linear chain with a local index to avoid long searches. So, instead of 'three' being linked to a linear list of associations, it would have some kind of structured catalogue. Someone who knew a very large number of things about 'three' might find that this saved time searching for information. This would require the procedure ASSOC to be replaced by something more sophisticated, and would probably also require more space. Alternatively, by switching pointers, one could easily bring a link to the front of the chain each time the association hanging from it is used: this would ensure that most recently and most frequently used information was found first, without the help of probabilistic mechanisms, often postulated to explain such phenomena.
(Perhaps only trivial things can be taught without generating a great deal of confusion. Infants learning to speak experience a great deal of confusion, but this does not usually make them give up! Only later on do we teach them to give up too soon, by labelling them as 'stupid', for example, or perhaps by helping them too often when they are in difficulty.)'
This kind of circularity (or mutual recursion) is especially common in our mental concepts. For instance, the concept of 'belief cannot be analysed without reference to the concepts of desire and decision, and these cannot be analysed without reference to each other and the concept of belief. Yet ordinary people learn to use these concepts in their ordinary life (for instance, when they explain someone's action in terms of a belief: 'He did it because he believed I was out to get him'). We learn to use mutually recursive concepts without being at all aware of their complexity.
In my experience philosophers and psychologists tend to get very confused about how to deal with this kind of circularity, for example in discussing varieties of Behaviourism. Analogies with recursive computer programs and data-structures can help to clarify the issue. One can distinguish varieties of behaviourism according to whether they will tolerate recursion (especially mutual recursion) in their definitions of mental concepts. Ryle's book The Concept of Mind was more sophisticated in this respect than most other forms of behaviourism, since it implicitly allowed mutually recursive definitions of mental concepts, implying that mentalistic concepts cannot all be eliminated by analysis in terms of dispositions to respond to stimuli. This, presumably, explains why Ryle did not see himself as a behaviourist.
The kind of structure depicted in Figure 6 does not need a separate index or catalogue specifying where to look for associations involving known items, for it acts as an index to itself, provided there are some ways of getting quickly from outside the structure to key nodes, like the cells containing 'three' and 'number'. (This might use an index, or content addressable store, or indexing tricks analogous to hash coding, for speedy access.)
The use of structures built up from linked cells and pointers like this has a number of additional interesting features, only a few of which can be mentioned here. Items can be added, deleted, or rearranged merely by changing a few addresses, without any need for advance reservation of large blocks of memory or massive shuffling around of information, as would be required if items were stored in blocks of adjacent locations (another trade-off: space against flexibility).
The same items can occur in different orders in different structures which share information (see Figure 8.4 for a simple example). Moreover, the order can be changed in one sequence without affecting another which shares structure with it. For instance, in Figure 8.4 the addresses in links W, X, Y, and Z can be changed so as to alter the order of numbers in chain labelled 'reverse' without altering the chain labelled 'forward'.
As we saw in connection with Figure 8.2, when the rest of the mechanism is taken for granted, a structure of the kind discussed here looks like a program for generating behaviour, but when one looks into problems of how a structure gets assembled and modified, how parts are accessed, how the different stopping conditions are applied, etc., then it looks more like an information structure used by other programs.
I have offered no explanation of the ability to answer 'How many?' questions by recognising a visual pattern, without explicit counting. Obviously, there is a lot to be said about the development of new perceptual abilities related to numbers, for example the ability to perceive groups of three objects without counting, by matching against a structural definition, much as one recognises arches, letters and horses (see chapter 9 and Winston, 1975).
Nothing has been said about how the child discovers general and non-contingent facts about counting, such as that the order in which objects are counted does not matter, rearranging the objects does not matter, the addition or removal or an object must change the result of counting, and so on. How does a child come to grasp the fact that in principle counting can go on indefinitely, so that its stock of number names may need to be extended, or replaced by a rule with unlimited generative power?
(Philosophers' discussions of such non-empirical learning are usually so vague and abstract as to beg most of the questions. Piagetian psychologists comment on some of the achievements, but provide no means of analysing or explaining the underlying mental processes discussed here.)
I cannot explain these and many more things that even primary school children learn. I do not believe that anybody has even the beginnings of explanations for most of the things we know they can (sometimes) do: all we find is new jargon for labelling the phenomena.
I have offered all this only as a tiny sample of the kind of exploration needed for developing our abilities to build theoretical models worth taking seriously. In the process our concept of mechanism will be extended and the superficiality of current problems, theories and experiments in psychology and educational technology will become apparent.
Philosophers have much to learn from this sort of exercise too, concerning old debates about the nature of mind, the nature of concepts and knowledge, varieties of inference, etc. Consider my short survey of answers they have given to the question 'What are numbers?' The answers do not begin to match the complexity of what a child has to grasp in learning about numbers. They do not account for the fact that number concepts are used in a variety of activities. They perhaps take the uses for granted, but make no attempt to explain how they are possible. Philosophers of the future, who have a much better grasp of what such explanations might look like, will be in a better position to formulate adequate analyses.
Similarly, when they have learnt about possible mechanisms underlying processes of inference and discovery, they will be in a better position to discuss the nature of mathematical discovery and other forms of a priori learning. The most that can be said at present is that it will probably prove helpful to think of mathematical discovery by analogy with a program which discovers new facts about itself by a combination of executing parts of itself and examining some of its instructions. In the process it might decide that some things could be done more quickly in a different way. Or it might discover, by analysing its own structure, that instead of executing bits of programs, it can work out their effects by reasoning about them.
More importantly, it may discover ways of generalising and extending its procedures to accomplish more tasks of the same sort, or new kinds of tasks. Programmers often discover unexpected ways of elaborating and generalising their programs, in the course of examining and using them, much as an artist learns more about what he can and should do by examining an incomplete work. A program which builds its own programs can do this too. Sussman's 'Hacker' program (1975) builds programs, and, in some cases, generalises them.
I believe that similar ideas are to be found in Piaget's writings. Computer models turn such thoughts from vague speculations to testable theories. See Young, 1976, for an example.
These sorts of second-order discoveries about one's own procedures do not fit the normal definition of 'empirical'. For example, they need not involve the use of the senses to gain information about the world. And a kind of necessity seems to be involved in the truths so discovered which is not normally thought to be compatible with empirical learning: if experience can lead us to a hypothesis can it not also produce a refutation of the hypothesis? But it seems that no experience can refute the claim that adding two lots of two things produces the same result as adding one thing to a group of three things, or the claim that there is no largest number. And the same is true of many other discoveries about properties of the procedures we use. Yet such mathematical discoveries involve a kind of exploration of possibilities which is closely analogous to empirical learning.
We need a richer set of distinctions than philosophers normally employ. There is learning from sensory experience and learning from symbolic experience. The latter seems to include the processes generating what Kant called 'synthetic a priori knowledge'. However these processes require a great deal of further investigation. In particular, it is important to note that symbolic experiences may occur either entirely within the mind, or else may use external symbolisms, as when we use diagrams or calculations on paper. The use of our senses to examine our symbols and our procedures for manipulating them should not be confused with the use of our senses to examine the behaviour of objects in the world.
The task of designing programs which simulate these sorts of human learning to a significant extent is at the frontiers of current research in artificial intelligence. Until further progress has been made, philosophical speculation about non-empirical knowledge is likely to remain as unproductive as it has been through most of history.
The old nature-nurture (heredity-environment) controversy is transformed by this sort of enquiry. The abilities required in order to make possible the kind of learning described here, for instance the ability to construct and manipulate stored symbols, build complex networks, use them to solve problems, analyse them to discover errors, modify them, etc., all these abilities are more complex and impressive than what is actually learnt about numbers! Where do these abilities come from? Could they conceivably be learnt during infancy without presupposing equally powerful symbolic abilities to make the learning possible? Maybe the much discussed ability to learn the grammar of natural languages (cf. Chomsky, 1965) is simply a special application of this more general ability? This question cannot be discussed usefully in our present ignorance about possible learning mechanisms.
Finally a question for educationalists. What would be the impact on primary schools if intending teachers were exposed to these problems and given some experience of trying to build and use models like Figure 6 on a computer? Our experience of teaching philosophy and psychology students computing in the Cognitive Studies Programme at Sussex University, and similar experiences at other centres, such as Edinburgh University and Massachusetts Institute of Technology, suggests that it can produce a major transformation of outlook including a new respect for the achievements of children. Here is a tremendous opportunity for educational administrators and teacher-training institutions. Will they grasp it?[*]
(2) See Pylyshyn 1978, Sloman 1978 (Commentary on Pylyshyn), in BBS.
If it had not been ignored, children would be learning how to design, implement, test, debug, analyse, describe, explain and criticise increasingly complex working systems. But their minds are not being trained to deal with all the complex systems they will encounter in real life, including social systems, political systems, economic systems, computing systems, and human minds, including their own. If it had not been ignored, people starting training to be psychologists would have had experience of building and testing systems that manipulate and use information structures far more complex than the one depicted in Figure 6.
I often use the phrase "mouse potato" by analogy with "couch potato" used to describe passive watchers of television programmes. The generation of couch potatoes who passively watched television programmes produced by others for hours on end is educating a generation of mouse potatoes who "passively use" software products produced by others.
Part of all this is the ability to understand the difference between the necessary, exceptionless, truths of number theory, such as that seven plus five equals 12, or that the cardinality of a set is independent of the order in which the elements are counted and merely contingent truths, such as that if you put one rabbit in an empty hutch and then later add another rabbit, and do not put any additional rabbits in the hutch there will be only two rabbits in the hutch thereafter.
In other words I was convinced that although the processes involved in
learning some of the basic features of the number system, including the
names for the numbers and their order, might use probabilistic
mechanisms (such as the neural nets that became popular in the two
decades following publication of this book),
this could not be the key either to the nature of our
mathematical knowledge, or to many other features of our knowledge of
the world, such as understanding how a clock works, or why turning a
handle can enable a door to be opened, or why it is necessary
to open the door in order to go into a room. When we understand these
things we do not merely understand probabilistic associations, we
understand structural relationships. (This theme is still being developed
on my web site, e.g. in these documents:
By the early 1970's there had already been some deep work in AI investigating structure-based learning and understanding, e.g. the papers in Minsky's 1968 collection, and Sussman. (Progress was very slow, however, because of the extremely limited speeds and memory capacities of computers of the time, but more importantly because the sheer difficulty of the problems.)
When I wrote this chapter, I was attempting to generalise some of that early work by exploring the hypothesis that a human's (mostly unconscious) ability:
I also suggested that if some of the list structures did not have a fixed order but were re-linked, e.g. bringing more recently accessed items closer to the front, then that could explain some of the variability in performance that others had assumed must be explained by probabilistic mechanisms.
In retrospect, it seems that a mixture of the probabilistic and deterministic approaches is required, within the study of architectures for complete agents: a more general study than the investigation of algorithms and representations that dominated most of the early work on AI (partly because of the dreadful limitations of speed and memory of even the most expensive and sophisticated computers available in the 1960s and 1970s).
There are many ways such hybrid mechanisms could be implemented, and my recent work on different processing layers within an integrated architecture (combining reactive, deliberative and meta-management layers) indicates some features of a hybrid system, with probabilistic associations dominating the reactive layer and structure manipulations being more important in the deliberative layer. For recent papers on this see
Added 8 Feb 2016
Minsky's paper on "causal diversity" is also relevant:
He points out that it is very easy to assume that the observed behaviours of animals suggest a unique interpretation, until we start exploring possible mechanisms that might produce those behaviours. He implicitly acknowledges that such mechanisms can be described at different levels of abstraction, not only at the level of brain physiology.
I am not sure that he understands all the requirements for (cardinal) number comprehension summarised in the note added Feb 2015 below.
The common trap of anthropomorphism is often a product of a lack of understanding of the variety of possible information processing architectures. Some of them are explored in these online presentations: http://www.cs.bham.ac.uk/research/projects/cogaff/talks/
Note expanded Feb 2016: Heike Wiese:
In 2009 at a Cognitive Science conference I learnt that ideas about number competences related to those presented here had been developed independently by Heike Wiese, who discusses the evolutionary role of language in developing counting competences.
Wiese emphasises the use of relational structures in language to represent and characterise relational structures of objects (e.g. cardinality, ordinality), whereas this chapter explains how internal computational processes, sometimes controlling external behaviours such as repeated pointing, or repeated transfer of objects from one location to another, can (in principle) explain multiple practical uses of relational structures, including linguistic and non-linguistic structures.
The evolution of human linguistic capabilities may have built on pre-existing mechanisms required for perception, for intention formation, for control of actions, etc. The use of those mechanisms as a basis for numerical competences seems at first sight to have built on prior use of the mechanisms for linguistic communication. But perhaps different evolutionary trajectories are possible, in which some numerical competences precede evolution of linguistic communication. (An unobvious example might be the use of iterated structures in a genome to control development of multiple components in an organism, e.g. the vertebrae -- one of many ways in which evolution made use of mathematical structures long before individual organisms did.)
Some speculations about evolution of internal and external languages related
to functions of vision, and other competences are presented in
Note added June 2015
I have been informed that similar ideas were developed by the psychologist Susan Carey, but have not yet looked closely at her work.
[[Note added in 2015:
Original pages 217-241
In the opening paragraphs of the Introduction to Critique of Pure Reason he made claims about perception and empirical knowledge which are very close to assumptions currently made by people working on artificial intelligence projects concerned with vision and language understanding. He suggested that all our empirical knowledge is made up of both 'what we receive through impressions' and of what 'our own faculty of knowledge supplies from itself. That is, perception is not a passive process of receiving information through the senses, but an active process of analysis and interpretation, in which 'schemata' control what happens. In particular, the understanding has to 'work up the raw material' by comparing representations, combining and separating them. He also points out that we may not be in a position to distinguish what we have added to the raw material, 'until with long practice of attention we have become skilled in separating it'. These ideas have recently been re-invented and elaborated by some psychologists (for example, Bartlett).
One way of trying to become skilled in separating the raw material from what we have added is to attempt to design a machine which can see. In so doing we learn that a great deal of prior knowledge has to be programmed into the machine before it can see even such simple things as squares, triangles, or blocks on a table. In particular, as Kant foresaw, such a machine has to use its knowledge in comparing its sense-data, combining them into larger wholes, separating them, describing them, and interpreting them as representing some other reality. (This seems to contradict some of the claims made by Ryle about perception, in his 1949, e.g. p. 229, last paragraph.)
[[Note added Jan 2007, extended Feb 2016
During 2005-6, while working on the CoSy robotic project I became increasingly aware that the ideas presented here and in several later papers were too much concerned with perception of multi-layered structures, ignoring perception of processes, especially concurrent perception of processes at different levels of abstraction, and also perception of possible processes and their limits. This was discussed in this presentation
Important additions occurred later, e.g. during development of the
begun in 2011, e.g. illustrated in this discussion of perception of
Seeing the significance in a collection of experimental results, grasping a character in a play or novel, and diagnosing an illness on the basis of a lot of ill-defined symptoms, all require this ability to make a 'Gestalt' emerge from a mass of information. A much simpler example is our ability to see something familiar in a picture like Figure 9.1. How does a 'Gestalt', a familiar word, emerge from all those dots?
Close analysis shows that this kind of ability is required even for ordinary visual perception and speech understanding, where we are totally unaware that we are interpreting untidy and ambiguous sense-data. In order to appreciate these unconscious achievements, try listening to very short extracts from tapes of human speech (about the length of a single word), or looking at manuscripts, landscapes, street scenes and domestic objects through a long narrow tube. Try looking at portions of Figure 7.1 through a hole about 3 mm in diameter in a sheet of paper laid on the figure and moved about. This helps to reveal how ambiguous and unclear the details are, even when you think they are clear and unambiguous. Boundaries are fuzzy, features indistinct, possible interpretations of parts of our sense-data indeterminate.
Perceived fragments require
a context for their interpretation. The trouble is that the context
usually consists of other equally ambiguous, incomplete, or possibly
even spurious fragments.
Sometimes our expectations provide an additional context, but this is not essential, since we can perceive and interpret totally unexpected things, like a picture seen on turning a page in a newspaper, or a sentence overheard on a bus.
Perception uses knowledge and expertise in different ways, clearly brought out by work on computer programs which interpret pictures. One of the most striking features of all this work is that it shows that very complex computational processes are required for what appeared previously to be very simple perceptual abilities, like seeing a block, or even seeing a straight line as a line. These processes make use of many different sorts of background knowledge, for instance in the following conscious and unconscious achievements:
So, perceiving structure or meaning may include using knowledge to reject what is irrelevant (like background noise, or coincidental juxtapositions) and to construct or hallucinate what is not there at all. It is an active constructive process which uses knowledge of the 'grammar' of sensory data, for instance knowledge of the possible structures of retinal images, knowledge about the kinds of things depicted or represented by such data, and knowledge about the processes by which objects generate sense-data. Kant's 'schemata' must incorporate all this.
We need not be aware that we possess or use such knowledge. As Kant noticed, it may be an 'art concealed in the depths of the human soul' (p. 183, in Kemp Smith's translation), much of it "compiled" into procedures and mechanisms appropriate to images formed by the kind of world we live in. But at present there are no better explanations of the possibility of perception than explanations in terms of intelligent processes using a vast store of prior information, much of which is "compiled" (by evolution or by individual learning) into procedures and mechanisms appropriate to images formed by the kind of world we live in.
For instance, theories according to which some perception is supposed to be 'direct', not involving any prior knowledge, nor the use of concepts, seem to be quite vacuous. A theory which claims that perceptual achievements are not decomposable into sub-processes cannot be used as a basis for designing a working mind which can perceive any of the things we perceive. It lacks explanatory power, because it lacks generative power. If the processes cannot be decomposed, then there is no way of generating the huge variety of human perceptual abilities from a relatively economical subsystem. By contrast, computational theories postulating the use of prior knowledge of structures and procedures can begin to explain some of the fine structure (see chapters 2 and 3) of perceptual processes, for example, the perception of this as belonging to that, this as going behind that, this as similar to that, this as representing that, and so on.
Quibbles about whether the ordinary word 'knowledge' is appropriate for talking about the mechanisms and the stored facts and procedures used in perception seem to be merely unproductive distractions. Even if the ordinary usage of the word 'knowledge' does not cover such inaccessible information, extending the usage would be justified by the important insights gained thereby. Alternatively, instead of talking about 'knowledge' we can talk about 'information' and say that even the simplest forms of perception not only provide new information, in doing so they make use of various kinds of prior information.
In a more complete discussion it would be necessary to follow Kant and try to distinguish the role of knowledge gained from previous perceptual experiences and the role of knowledge and abilities which are required for any sort of perceptual experience to get started. The latter cannot be empirical in the same sense, though it may be the result of millions of years of evolutionary "learning from experience".
Since our exploration of perceptual systems is still in a very primitive state, it is probably still too early to make any such distinctions with confidence. It would also be useful to distinguish general knowledge about a class of theoretically possible objects, situations, processes, etc., from specific knowledge about commonly occurring subsets. As remarked in chapter 2, we can distinguish knowledge about the form of the world from knowledge about its contents. Not all geometrically possible shapes are to be found amongst animals, for example. A bat may in some sense be half way between a mouse and a bird: but not all of the intervening space is filled with actually existing sorts of animals. If the known sorts of objects cluster into relatively discrete classes, then this means that knowledge of these classes can be used to short-circuit some of the more general processes of analysis and interpretation which would be possible. In information-theoretic terms this amounts to an increase of redundancy -- and a reduction of information -- in sensory data. This is like saying that if you know a lot of relatively commonly occurring words and phrases, then you may be able to use this knowledge to cut down the search for ways of interpreting everything you hear in terms of the most general grammatical and semantic rules. (Compare Becker on the 'phrasal lexicon'.) This is one of several ways in which the environment can be cognitively 'friendly' or 'unfriendly'. We evolved to cope with a relatively cognitively friendly environment.
In connection with pictures like Figure 1, this means that if you know about particular letter-shaped configurations of bars, then this knowledge may make it possible to find an interpretation of such a picture in terms of bars more rapidly than if only general bar-configuration knowledge were deployed. For instance, if you are dealing with our capital letters, then finding a vertical bar with a horizontal one growing to the left from its middle, is a very good reason for jumping to the conclusion that it is part of an 'H', which means that you can expect another vertical bar at the left end of the horizontal.
Thus a rational creature, concerned with maximising efficiency of perceptual processing, might find it useful to store a very large number of really quite redundant concepts, corresponding to commonly occurring substructures (phrases) which are useful discriminators and predictors.
The question of how different sorts of knowledge can most fruitfully interact is a focus of much current research in artificial intelligence. The strategies which work in a 'cognitively friendly world' where species of things cluster are highly fallible if unusual situations occur. Nevertheless the fallible, efficient procedures may be the most rational ones to adopt in a world where things change rapidly, and your enemies may not give you time to search for a conclusive demonstration that it is time to turn around and run. Thus much of the traditional philosophical discussion of rationality, in terms of what can be proved beyond doubt, is largely irrelevant to real life and the design of intelligent machines. But new problems of rationality emerge in their place, such as problems about trading space against time, efficiency against flexibility or generality, and so on. From the design standpoint, rationality is largely a matter of choosing among trade-offs in conditions of uncertainty, not a matter of getting things 'right', or selecting the 'best'. (For more on trade-offs see the chapters on representations, and on numbers: Chapter 7 and Chapter 8)).
Contrary to what many people (including some philosophers) have assumed, there need not be any similarity between what represents and what it represents. Instead, the process of interpretation may use a variety of interpretation rules, of which the most obvious would be rules based on information about a process of projection which generates, say, a two-dimensional image from a three-dimensional scene. (For more on this see the chapter on analogical representations.)
The projection of a three dimensional scene onto a two dimensional image is just a special case of a more general notion of evidence which is generated in a systematic way by that which explains it. A two-dimensional projection of a three-dimensional object bears very little similarity to the object. (Cf. Goodman, Languages of Art.) The interpretation procedure may allow for the effects of the transformations and distortions in the projection (as a scientist measuring the temperature of some liquid may allow for the fact that the thermometer cools the liquid).
This is an old idea: what is new in the work of A.I. is the detailed analysis of such transformations and interpretation procedures, and the adoption of new standards for the acceptability of an explanation: namely it must suffice to generate a working system, that is, a program which can use knowledge of the transformations to interpret pictures or the images produced by a television camera connected to the computer.
What we are conscious of seeing is the result of many layers of such interpretation, mostly unconscious, yet many of them are essentially similar in character to intellectual processes of which we are sometimes conscious. All this will be obvious to anyone familiar with recent work in theoretical linguistics.
So the familiar philosophical argument that we do not see things as they are, because our sense-organs may affect the information we receive, is invalid. For however much our sense-organs affect incoming data, we may still be able to interpret the data in terms of how things really are. But this requires the use of knowledge and inference procedures, as people trying to make computers see have discovered. Where does the background knowledge come from? Presumably a basis is provided genetically by what our species has learnt from millions of years of evolution. The rest has to be constructed, from infancy onwards, by individuals, with and without help, and mostly unconsciously.
Moreover, even colour contrasts can sometimes be hallucinated on the basis of context, as so-called 'illusory-contrasts' show. For an example see Figure 2.
Instead of physiological theories, we need 'computational
theories, that is, theories about processes in which symbolic
representations of data are constructed and manipulated. In
such processes, facts about part of an image are interpreted
by making inferences using context and background
knowledge. We must not be blinded by philosophical or
terminological prejudices which will not allow us to describe
unconscious processes as inferences, or, more generally, as 'mental processes'.
How is it done? In particular, what exactly is the knowledge required for various kinds of perception, and how do we mobilise it as needed? We cannot yet claim to have complete or even nearly complete explanations. But A.I. work on vision has made some significant progress, both in showing up the inadequacies of bad theories and sketching possible fragments of correct explanations.
Our present ignorance is not a matter of our not knowing which theory is correct, but of our not even knowing how to formulate theories sufficiently rich in explanatory power to be worth testing experimentally.
Attempting to program computers to simulate human achievements provides a powerful technique for finding inadequacies in our theories thereby stimulating the rapid development of new theory-building tools. In the process we are forced to re-examine some old philosophical and psychological problems. For a survey of some of this work, see the chapters on computer vision in Boden (1977). Winston (1975) also includes useful material, especially the sections by Winston, Waltz, and Minsky. The rest of this chapter illustrates some of the problems with reference to an ongoing computer project at Sussex University, which may be taken as representative.
People can cope quite well with these pictures even when there is a lot of positive and negative noise, and where further confusion is generated by overlaps between letters, and confusing juxtapositions. Some people have trouble at first, but after seeing one or two such pictures, they interpret new ones much more rapidly. The task of the program is to find familiar letters without wasting a lot of time investigating spurious interpretations of ambiguous fragments. It should 'home in on' the most plausible global interpretation fairly rapidly, just as people can.
Out of context, picture details are suggestive but highly ambiguous, as can be seen by looking at various parts of the picture through a small hole in a sheet of paper. Yet when we see them in context we apparently do not waste time exploring all the alternative interpretations. It is as if different ambiguous fragments somehow all 'communicate' with one another in parallel, to disambiguate one another.
Waltz (1975) showed how this sort of mutual disambiguation could be achieved by a program for interpreting line drawings representing a scene made up of blocks on a table, illuminated by a shadow-casting light. He gave his program prior knowledge of the possible interpretations of various sorts of picture junctions, all of which were ambiguous out of context. So the problem was to find a globally consistent interpretation of the whole picture. The program did surprisingly well on quite complex pictures. His method involved letting possible interpretations for a fragment be 'filtered out' when not consistent with any possible interpretations for neighbouring fragments.
[[Note added 2001:
since 1975 there have been huge developments in techniques for 'constraint propagation', including both hard and soft constraints. ]]
But the input to Waltz' program was a representation of a perfectly connected and noise-free line drawing. Coping with disconnected images which have more defects, requires more prior knowledge about the structure of images and scenes depicted, and more sophisticated computational mechanisms.
Which dots in Figure 1 should be grouped into collinear fragments? By looking closely at the picture, you should be able to discern many more collinear groups than you previously noticed. That is, there are some lines which 'stand out' and are used in building an interpretation of the picture, whereas others for which the picture contains evidence are not normally noticed. Once you have noticed that a certain line 'stands out', it is easy to look along it picking out all the dots which belong to it, even though some of them may be 'attracted' by other lines too.
But how do you decide which lines stand out without first noticing all the collinear groups of dots? Are all the collinear dot-strips noticed unconsciously? What does that mean? Is this any different from unconsciously noticing grammatical relationships which make a sentence intelligible?
When pictures are made up of large numbers of disconnected and untidy fragments, then the interpretation problem is compounded by the problem of deciding which fragments to link together to form larger significant wholes. This is the 'segmentation' or 'agglomeration' problem. As so often happens in the study of mental processes, we find a circularity: once a fragment has been interpreted this helps to determine the others with which it should be linked, and once appropriate links have been set up the larger fragment so formed becomes less ambiguous and easier to interpret. It can then function as a recognisable cue. (The same circularity is relevant to understanding speech.)
Thus, by using the fact that some fragments are fairly unambiguous, we get the process started. By using the fact that long stretches of relatively unambiguous fragments are unlikely to be spurious, the program can control further analysis and interpretations. Parallel pairs of bold lines are used as evidence for the presence of a bar. Many of the strategies used are highly fallible. They depend on assumption that the program inhabits a 'cognitively friendly' world, that is, that it will not be asked to interpret very messy, very confusing pictures. If it is, then, like people, it will become confused and start floundering.
Clusters of bar-like fragments found in this way can act as cues to generate further higher-level hypotheses, for example, letter hypotheses, which in turn control the interpretation of further ambiguous fragments. (For more details, see Sloman and Hardy 'Giving a computer gestalt experiences' and Sloman et al. 1978.) In order to give a program a more complete understanding of our concepts, we would need to embody it in a system that was able to move about in space and manipulate physical objects, as people do. This sort of thing is being done in other artificial intelligence research centres. However, there are still many unsolved problems. It will be a long time before the perceptual and physical skills of even a very young child can be simulated.
The general method of using relatively unambiguous fragments to activate prior knowledge which then directs attention fruitfully at more ambiguous fragments, seems to be required at all levels in a visual system. It is sometimes called the 'cue-schema' method, and seems to be frequently re-invented.
However, it raises serious problems, such as: how should an intelligent mechanism decide which schemas are worth storing in the first place, and how should it, when confronted with some cue, find the relevant knowledge in a huge memory store? (Compare chapter 8.) A variety of sophisticated indexing strategies may be required for the latter purpose. Another important problem is how to control the invocation of schemas when the picture includes cues for many different schemas.
Our program uses knowledge about many different kinds of objects and relationships, and runs several different sorts of processes in parallel, so that 'high-level' processes and (relatively) low-level' processes can help one another resolve ambiguities and reduce the amount of searching for consistent interpretations. It is also possible to suspend processes which are no longer useful, for example low-level analysis processes, looking for evidence of lines, may be terminated prematurely if some higher-level process has decided that enough has been learnt about the image to generate a useful interpretation.
This corresponds to the fact that we may recognise a whole (e.g. a word) without taking in all its parts. It is rational for an intelligent agent to organise things this way in a rapidly changing world where the ability to take quick decisions may be a matter of life and death.
Like people, the program can notice words and letters emerging out of the mess in pictures like Figure 1. As Kant says, the program has to work up the raw material by comparing representations, combining them, separating them, classifying them, describing their relationships, and so on. What Kant failed to do was describe such processes in detail.
The domains of knowledge involved include:
In particular the program has to know how to build and relate descriptions of structures in each of these domains, including fragments of structures. That is, the ability to solve problems about a domain requires an 'extension' of the domain to include possible fragments of well-formed objects in the domain Becker's 'phrasal lexicon' again. Our program uses many such intermediate concepts. Figures 3 and 4 list and illustrate some of the concepts relevant to the second and third domains. Figure 5 shows some of the cues that can help reduce the search for an interpretation. Figure 6 shows all the domains and some of the structural correspondences between items in those domains.
By making use of the notion of a series of domains, providing different
'layers' of interpretation, it is possible to contribute to the analysis
of the concept of 'seeing as', which has puzzled some philosophers.
Seeing X as Y is in general a matter of constructing a mapping between a
structure in one domain and a possibly different structure in another
domain. The mapping may use several intermediate layers.
[[Note added 2001:
our recent work on architectures containing a 'meta-management' layer suggests that being aware of seeing X as Y requires additional meta-management, i.e. self-monitoring processes, which are not essential for the basic processes of seeing X as Y, which could occur in simpler architectures, e.g. in animals that are not aware of their own mental processes (like most AI systems so far). ]]
Facts about one domain may help to solve problems about any of the others. For instance, lexical knowledge may lead to a guess that if the letters 'E', 'X' and T' have been found, with an unclear letter between them, then the unclear letter is 'I'. This in turn leads to the inference that there is a lamina depicting the 'I' in the scene. From that it follows that unoccluded edges of the lamina will be represented by lines in the hypothetical picture in domain (b). The inferred locations of these lines can lead to a hypothesis about which dots in the picture should be grouped together, and may even lead to the conclusion that some dots which are not there should be there.
The program, like a person, needs to know that a horizontal line-segment in its visual image can represent (part of) the top or bottom edge of a bar, that an ELL junction between line segments can depict part of either a junction between two bars or a corner of a single bar. In the former case it may depict either a concave or a convex corner, and, as always, context will have to be used to decide which.
The program does not need to define concepts of one domain in terms of concepts from another. Rather the different domains are defined by their own primitive concepts and relations. The notion of 'being represented by' is not the same as the notion of 'being defined in terms of'. For instance, 'bar' is not defined in terms of actual and possible sense-data in the dot-picture domain, as some reductionist philosophical theories of perception would have us believe. Concepts from each domain are defined implicitly for the program in terms of structural relations and inference rules, including interpretation strategies.
So the organisation of the program is more consistent with a dualist or pluralist and wholistic metaphysics than with an idealist or phenomenalist reduction of the external world to sense-data, or any form of philosophical atomism, such as Russell and Wittgenstein once espoused.
[[Note added 2001:
In the decades since this book was written many more learning methods have been developed for vision and other aspects of intelligence, though surprisingly few of them seem to involve the ability to learn about different classes of structures in domains linked by representation relationships. Many of them attempt to deal with fairly direct mappings between configurations detectable in image sequences and abstract concepts like "person walking". For examples see journals and conference proceedings on machine vision, pattern recognition, and machine learning.]]
Currently our program starts with knowledge which has been given it by people (just as people have to start with knowledge acquired through a lengthy process of biological evolution). Perhaps, one day, some of the knowledge will be acquired by a machine itself, interacting with the world, if a television camera and mechanical arm are connected to the computer, as is already done in some A.I. research laboratories. However, real learning requires much more sophisticated programs than programs which have a fixed collection of built-in abilities. (Some of the problems of learning were discussed previously in Chapter 6 and Chapter 8.)
Likewise in depictions of forest scenes, commonly occurring configurations in the dotty picture domain would be different from those found in Figure 1.
Knowledge of commonly occurring sub-structures in images, corresponding to particular domains represented, like knowledge about the objects represented, can help the interpretation process. This is analogous to processes in language-understanding in which knowledge of familiar phrases is combined with knowledge of a general grammar which subsumes those phrases. (Becker 1975)
Given structural definitions of letters, and knowledge of the relations between the different domains illustrated in Figure 6, a program might be able to work out or learn from experience that certain kinds of bar junctions (Figure 4), or the corresponding 2-D line configurations (Figures 3 and 5), occur only in a few of them, and thus are useful disambiguating cues. This will not be true of all the fragments visible in Figure 1. Thus many fragments will not be recognised as familiar, and spurious linkages and hypotheses will therefore not be generated. If the program were familiar with a different world, in which other fragments were significant, then it might he more easily be confused by Figure 1. So additional knowledge is not always helpful. (Many works of art seem to require such interactions between different domains of knowledge.)
A program should also be able to 'learn' that certain kinds of fragments do not occur in any known letter, so that if they seem to emerge at any stage this will indicate that picture fragments have been wrongly linked together. This helps to eliminate fruitless searches for possible interpretations. So the discovery of anomalies and impossibilities may play an important role in the development of rational behaviour. A still more elaborate kind of learning would involve discovering that whether a fragment is illegitimate depends on the context. Fragments which are permissible within one alphabet may not be permissible in another. Thus the process of recognising letters is facilitated by knowledge of the alphabet involved, yet some letter recognition may be required for the type of alphabet to be inferred: another example of the kind of circularity, or mutual dependence, of sub-abilities in an intelligent system.
NOTE [13 Jan 2007; Clarified 1 Jul 2015]:
The original diagram in Figure 6 suggested that all information flows upwards. That is not how the program worked: there was a mixture of bottom-up, top-down and middle-out processing, and the original arrows in the figure showing information flow have been replaced with bi-directional arrows to indicate this.
Hypotheses about style must, of course, be used with caution, since individual parts of a picture need not conform to the overall style. Local picture evidence can over-ride global strategies based on the inferred style provided that the program can operate in a mode in which it watches out for evidence conflicting with some of its general current assumptions, using monitors of the sorts described in Chapter 6.
This parallelism is required partly because, with a large amount of information available for analysis and interpretation, it may not be easy to decide what to do next, for example, which configurations to look for in the picture, and where to look for them. Deciding between such alternatives itself requires analysis and interpretation of evidence and at first it will not be obvious where the important clues are, nor what they are. So initially many on-going processes are allowed to coexist, until items both unambiguous and relatively important emerge, such as a long line, an unambiguous clue to the location of a bar, some aspect of the style, or a set of linked bar fragments which uniquely identify a letter.
When fragments forming clear-cut cues emerge, they can invoke a 'higher-level' schema which takes control of processing for a while, interrupting the 'blind' searching for evidence, by directing attention to suitable parts of the picture and relevant questions.
If higher level processes form a plausible hypothesis, this may suppress further analysis of details by lower level processes. For instance, recognition of fragments of 'E', or 'X', and of "I", where there appear to be only about four letters, might cause a program (or person) to jump to the conclusion that the word is 'EXIT', and if this fits into the context, further examination of lines to check out on remaining strokes of letters, and the missing 1', might then be abandoned. This ability to jump to conclusions on the basis of partial analysis may be essential to coping with a rapidly changing world. However it depends on the existence of a fair amount of redundancy in the sensory data: that is, it assumes a relatively 'friendly' (in the sense defined previously) world. It also requires an architecture able to support multiple concurrent processes and the ability for some of them to be aborted by others when their activities are no longer needed.
This type of programming involves viewing perception as the outcome of very large numbers of interacting processes of analysis, comparison, synthesis, interpretation, and hypothesis-testing, most, if not all, unconscious. On this view the introspective certainty that perception and recognition are 'direct', 'unmediated' and involve no analysis, is merely a delusion. (This point is elaborated in the papers by Weir -- see Bibliography.)
This schizophrenic view of the human mind raises in a new context the old problem: what do we mean by saying that consciousness is 'unitary' or that a person has one mind? The computational approach to this problem is to ask: how can processes be so related that all the myriad sub-tasks may be sensibly co-ordinated under the control of a single goal, for instance the goal of finding the word in a spotty picture, or a robot's goal of using sensory information from a camera to guide it as it walks across a room to pick up a spanner? See also Chapter 6 and Chapter 10.
[[Note added 2001:
At the time the program was being developed, we had some difficulty communicating our ideas about the importance of parallel processing concerned with different domains because AI researchers tended to assume we were merely repeating the well-known points made in the early 1970s by Winograd, Guzman and others in the MIT AI Lab, about "heterarchic" as opposed to "hierarchic" processing.
Heterarchic systems, dealt, as ours did, with different domains of structures and relations between them (e.g. Winograd's PhD thesis dealt with morphology, syntax, semantics and a domain of three dimensional objects on a table).
Both models involve mixtures of data-driven (bottom-up) and hypothesis-driven (top-down) processes.
Both allow interleaving of processes dealing with the different domains -- unlike hierarchic or pass-oriented mechanisms which first attempt to complete processing in one domain then pass the results to mechanisms dealing with another domain, as in a processing pipeline.
The main differences between heterarchy and our model were as follows:
The POPEYE architecture was designed to overcome these restrictions by allowing processing to occur concurrently in different domains with priority mechanisms in different domains determining which sub-processes could dominate scarce resources. Priorities could change, and attention within a domain could therefore be switched, as a result of arrival of new information that was not explicitly asked for.
- In an implementation of "heterarchic" processing there is typically only one locus of control at any time. Thus processing might be going on in a low level sub-system or in a high level sub-system, but not both in parallel with information flowing between them.
- In those systems decisions to transfer control between sub-systems were all taken explicitly by processes that decided they needed information from another system: e.g. a syntactic analyser could decide to invoke a semantic procedure to help with syntactic disambiguation, and a semantic procedure could invoke a syntactic analyser to suggest alternative parses.
- In that sort of heterarchic system it is not possible for a process working in D1 to be interrupted by the arrival of new information relevant to the current sub-task, derived from processing in D2.
- Consequently, if a process in that sort heterarchic system gets stuck in a blind-alley and does not notice this fact it may remain stuck forever.
In this respect the POPEYE architecture had something in common with neural networks in which information flows between concurrently processing sub-systems (usually with simulated concurrency). Indeed, a neural net with suitable symbol-manipulating sub-systems could be used to implement something like the POPEYE architecture, though we never attempted to do this for the whole system. After this chapter was written, work was done on implementing the top level word-recognizer in POPEYE as a neural net to which the partial results from lower level systems could be fed as they became available. ]]
In the light of this new appreciation of the extent of our ignorance about perceptual processes, we can see that much philosophical discussion hitherto, in epistemology, philosophy of mind, and aesthetics, has been based on enormous over-simplifications. With hindsight much of what philosophers have written about perception seems shallow and lacking in explanatory power. But perhaps it was a necessary part of the process of cultural evolution which led us to our present standpoint.
Another consequence of delving into attempts to give computers even very simple abilities is that one acquires enormous respect for the achievements of very young children, many other animals, and even insects. How does a bee manage to . land on a flower without crashing into it?
Many different aspects of perception are being investigated in artificial intelligence laboratories. Programs are being written or have been written which analyse and interpret the following sorts of pictures or images, which people cope with easily.
Some of the programs are in systems which control the actions of artificial arms, or the movements of vehicles. The best way to keep up with this work is to read journal articles, conference reports, and privately circulated departmental reports. Text-books rapidly grow out of date. (This would not be so much of a problem if we all communicated via a network of computers and dispensed with books! But that will not come for some time.)
Each of the programs tackles only a tiny fragment of what people and animals can do. For example, the more complex the world the program deals with the less of its visible structure is perceived and used by the program. The POPEYE program deals with a very simple world because we wanted it to have a fairly full grasp of its structure (though even that is proving harder than we anticipated). One of the major obstacles to progress at present is the small number of memory locations existing computers contain, compared with the human brain. But a more important obstacle is the difficulty of articulating and codifying all the different kinds of structural and procedural knowledge required for effective visual perception. There is no reason to assume that these obstacles are insuperable in principle, though it is important not to make extravagant claims about work done so far. For example, I do not believe that the progress of computer vision work by the end of this century will be adequate for the design of domestic robots, able to do household chores like washing dishes, changing nappies on babies, mopping up spilt milk, etc. So, for some time to come we shall be dependent on simpler, much more specialised machines.
These discrepancies are not directly attributable to the fact that computers are not made of neurons, or that they function in an essentially serial or digital fashion, or that they do not have biological origins. Rather they arise mainly from huge differences in the amount and organisation of practical and theoretical knowledge, and the presence in people of a whole variety of computational processes to do with motives and emotions which have so far hardly been explored.
A favourite game among philosophers and some 'humanistic' psychologists is to list things computers cannot do. (See the book by Dreyfus for a splendid example.) However, any sensible worker in artificial intelligence will also spend a significant amount of time listing things computers cannot do yet! The difference is that the one is expressing a prejudice about the limitations of computers, whereas the other (although equally prejudiced in the other direction, perhaps) is doing something more constructive: trying to find out exactly what it is about existing programs that prevents them doing such things, with a view to trying to extend and improve them. This is more constructive because it leads to advances in computing, and it also leads to a deeper analysis of the human and animal abilities under investigation.
As suggested previously in Chapter 5, attempting to prove that computers cannot do this or that is a pointless exercise since the range of abilities of computers, programming languages and programs is constantly being extended, and nobody has any formal characterisation of the nature of that process which could serve as a basis for establishing its limits. The incompleteness and unsolvability theorems of Goedel and others refer only to limitations of narrowly restricted closed systems, which are quite unlike both people and artificial intelligence programs which communicate with the world.
This chapter has presented a few fragments from the large and growing collection of ideas and problems arising out of A.I. work on vision. I have begun to indicate some of the connections with philosophical issues, but there is a lot more to be said. The next.chapter develops some of the points of contact at greater length.
A consequence of the parallelism and the bi-directionality of information flow was that the program could often conclude that the word, or some higher level structure, had been identified before all processing of the evidence had been completed. Sometimes that identification was mistaken e.g. because the addition of positive and negative noise, or overlap of letters, had obscured some of the evidence, and further processing would reveal the error. This seems to reflect the fact that humans sometimes think they have recognized someone or something (and may then greet the person) and soon after that realise, with the person out of sight, that the recognition was mistaken, presumably because more of the links relating data and interpretation fragments have been computed. This familiar feature of human vision, and related errors of proof-reading text, were among the motivations for the design of Popeye. ]]
Marr's ideas about mistakes in AI vision research were originally published in MIT technical reports that were widely circulated in the mid 1970s. He died, tragically, in 1981, and the following year his magnum opus was published: D. Marr, Vision, 1982, Freeman, 1982.
(b) Marr's criticism of AI vision research was based in part on the claim that natural images are far richer in information and if only visual systems took account of that information they would not need such sophisticated bi-directional processing architectures. My own riposte at the time (also made by some other researchers) was:
(c) In the late 1970s there was also growing support for a view also inspired in part by Marr's work, namely, that symbol manipulating mechanisms and processes of the sorts described in this chapter and elsewhere in this book were not really necessary, as everything could be achieved by emergent features of collections of 'local cooperating processes' such as neural nets.
Neural nets became increasingly popular in the following years, and they have had many successful applications, though it is not clear that their achievements have matched the expectations of their proponents. Work on neural nets and other learning or self-organising systems, including the more recent work on evolutionary computation, is often (though not always) driven by a desire to avoid the need to understand a problem and design a solution: the hope is that some automatic method will make the labour unnecessary. My own experience suggests that until people have actually solved some of these problems themselves they will not know what sort of learning mechanism or self-organising system is capable of solving them. However, when we have done the analysis required to design the appropriate specialised learning mechanisms we may nevertheless find that the products of such mechanisms are beyond our comprehension. E.g. the visual ontology induced by a self-organising perceptual system that we have designed may be incomprehensible to us.
What I am criticising is not the search for learning systems, or self-organising systems, but the search for general-purpose automatic learning mechanisms equally applicable to all sorts of problems. Different domains require different sorts of learning processes, e.g. learning to walk, learning to see, learning to read text, learning to read music, learning to talk, learning a first language, learning a second language, learning arithmetic, learning meta-mathematics, learning quantum mechanics, learning to play the violin, learning to do ballet, etc. In some cases the learning requires a specific architecture to be set up within which the learning can occur. In some cases specific forms of representation are required, and mechanisms for manipulating them. In some cases specific forms of interaction with the environment are required for checking out partial learning and driving further learning. And so on.
(d) At the time when the Popeye project was cancelled for lack of funds, work was in progress to add a neural net-like subsystem to help with the higher levels of recognition in our pictures of jumbled letters. I.e. after several layers of interpretation had been operating on an image like Figure 1, a hypothesis might begin to emerge concerning the letter sequence in the second domain from the top. In the original Popeye program a technique analogous to spelling correction was used to find likely candidates and order them, which could, in turn, trigger top-down influences to check out specific ambiguities or look for confirming evidence. This spelling checker mechanism was replaced by a neural net which could be trained on a collection of known words and then take a half-baked letter sequence and suggest the most likely word. (This work was done by Geoffrey Hinton, who was then a member of the Popeye project, and later went on to be one of the leaders in the field of neural nets.)
(e) Despite the excellence of much of Marr's research (e.g. on the cerebellum) I believe that AI research on vision was dealt a serious body blow by the publication of his views, along with the fast growing popularity of neural nets designed to work independently of more conventional AI mechanisms, and likewise later work on statistical or self-organising systems, motivated in part by the vain hope that by writing programs that learn for themselves or evolve automatically, we can avoid the need to understand, design and implement complex visual architectures like those produced by millions of years of evolution.
Certainly no matter what kinds of high level percept a multi-layer interpretation system of the sort described in this chapter produces, it is possible to mimic some of its behaviour by using probabilistic or statistical mechanism to discover correlations between low level input configurations and the high level descriptions. This is particularly easy where the scenes involve isolated objects, or very few objects, with not much variation in the arrangements of objects, and little or no occlusion of one object by another.
The problem is that in real life, including many practical applications, input images very often depict cluttered scenes with a wide variety of possible objects in a wide variety of possible configurations. If the image projection and interpretation process involves several intermediate layers, as in figure 6 above, each with a rich variety of permitted structures, and complex structural relations between the layers, the combinatorics of the mapping between input images and high level percepts can become completely intractable, especially if motion is also allowed and some objects are flexible. One way of achieving tractability is to decompose the problem into tractable sub-problems whose solutions can interact possibly aided by background knowledge. This seems to me to require going back to some of the approaches to vision that were being pursued in the 1970s including approaches involving the construction and analysis of structural descriptions of intermediate configurations. The computer power available for this research in the 1970s was a major factor in limiting success of that approach: if it takes 20 minutes simply to find the edges in an image of a cup and saucer there are strong pressures to find short cuts, even if they don't generalise.
(f) The growing concern in the late 1970s and early 1980s for efficiency, discouraged the use of powerful AI programming languages like Lisp and Pop-11, and encouraged the use of lower level batch-compiled languages like Pascal and C and later C++. These languages were not as good as AI languages for expressing complex operations involving structural descriptions, pattern matching and searching, especially without automatic garbage collection facilities. They are also not nearly as flexible in permitting task-specific syntactic extensions as AI languages, which allow the features of different problems to be expressed in different formalisms within the same larger program. Moreover AI languages with interpreters or incremental compilers provide far better support support for interactive exploration of complex domains where the algorithms and representations required cannot be specified in advance of the programming effort, and where obscure conceptual bugs often require interactive exploration of a running system.
However, the emphasis on efficiency and portability pressurised researchers to use the non-AI languages, and this subtly pushed them into focusing on problems that their tools could handle, alas.
Robin Popplestone (the original inventor of Pop2) once said to me that he thought the rise in popularity of C had killed off research in the real problems of vision. That may be a slight exaggeration.
(g) For a counter example to the above developments see Shimon Ullman, High-level vision: Object recognition and visual cognition, MIT Press, 1996. I have the impression that there may now be a growing collection of AI vision researchers who are dissatisfied with the narrow focus and limited applicability of many machine vision projects, and would welcome a move back to the more ambitious earlier projects, building on what has been learnt in recent years where appropriate. This impression was reinforced by comments made to me by several researchers at the September 2001 conference of the British Machine Vision Association.
(h) Besides the obvious limitations due to use of artificially generated images with only binary pixel values, there were many serious limitations in the Popeye project, including the restriction to objects with straight edges, the lack of any motion perception, and the lack of any perception of 3-D structure and relationships (apart from the partial depth ordering in the 2-D lamina domain). Our defence against the criticism of over-simplification was that we thought some of the architectural issues relevant to processing more complex images or image sequences, dealing with more complex environments, could usefully be addressed in an exploration of our artificial domain, if only by producing a "proof of principle", demonstrating how cooperative processes dealing with different domains could cooperate to produce an interpretation without time-consuming search.
(i) In the 20 years following the Popeye project (and this book) I gradually became aware of more serious, flaws, as follows.
Some of these are evolutionarily very old mechanisms shared with many animals. Others use much newer architectural layers, and possibly functions and mechanisms unique to humans.
This point was already implicit in my discussion of the overall architecture with its multiple functions in Chapter 6, e.g. in connection with monitors.
Some of these responses were external and some internal, e.g. blinking, saccadic eye movements, posture control, and some internal emotional changes such apprehension, sexual interest, curiosity, etc.
This use of perceptual systems seems to be important both in innate reflexes and in many learnt skills for instance athletic skills.
Of course, when I started work on this project I already knew about reflexes and trained high speed responses, as did everyone else: I simply did not see their significance for a visual architecture (though I had read J.J.Gibson's book The senses considered as perceptual systems, which made the point.)
Later this idea became central to development of the theory about a multi-layer architecture, mentioned above, in which reactive and deliberative processes run in parallel often starting from the same sensory input. This theme is still being developed in papers in the Cogaff project.
The need to see what is and is not possible, in addition to what is actually there, has profound implications for the types of information representations used within the visual system: structural descriptions will not suffice. Several papers on this are included in the Cogaff web site, some mentioned below.
The last critique was inspired by J.J.Gibson's notion of "affordance". See for example his book, The Ecological Approach to Visual Perception originally published in 1979. Although I rejected some of his theories (e.g. the theory that perception could somehow be direct, and representation free) the theory that vision was about detecting affordances seemed very important. I.e. much of what vision (and perception in general) is about is not just provision of information about what is actually in the environment, but, more importantly, information about what sorts of things are possible in a particular environment that might be useful or harmful to the viewer, and what the constraints on such possibilities are.
Although I think very little progress has been made on this topic, several of my papers explored aspects of this idea, e.g.
(j) The Edinburgh AI language Pop2 mentioned above later
evolved into Pop-11, which became the
core of the
system developed at Sussex University and marketed for several years by
ISL, who contributed further developments.
It is now available free of charge with full system sources for a
variety of platforms here:
materials supporting teaching and research on vision,
developed by David Young at Sussex University.
Original pages 242-271
Not all the philosophical problems I shall be referring to are of the form 'How is X possible?' But the first one is: namely how is it possible for there to be a distinction between conscious and unconscious mental processes' Alternatively, how is it possible for some, but not all, of the contents of our minds to enter into our conscious experience? This topic will be discussed at some length, after which a collection of loosely related problems will be touched on.
I shall not go into such a detailed analysis now. But I want to say something -- not about the most general sense of the word 'conscious', which includes usages like 'I've been conscious for several months that I am likely to lose my job soon', which refers to some knowledge or belief -- but about the kind of distinction we make between things that we are currently conscious of and things we are not, especially things in our own minds. I want to try to relate this distinction to some computational considerations.
It is obvious that besides conscious mental processes there are unconscious or subconscious ones, such as the decisions about gear changes, steering and so on taken by an experienced car driver, the recognition of syntactic structure in understanding spoken and written language, and the detailed analysis and interpretation processes involved in perceiving a complex scene or picture. (Chapter 9.) Moreover, what a learner is painfully conscious of may later be handled unconsciously -- like gear-changing while driving a car, or using grammatical constructs in a second language. So there need be no difference in the content of conscious and unconscious processes.
Although it is obvious that there is a difference, it is very difficult to analyse this difference between what we are and are not conscious of, especially as there are so many borderline cases -- like finding something odd without being aware of what is odd about it. Were you previously conscious of the fact that you were reading print arranged in horizontal lines or was it unconsciously taken for granted? How is this different from being conscious of the lines of print? Is a sleep-walker who clearly opens a door in order to go through, conscious of the door and aware that it is shut? Is he conscious that he is opening it? While reading a gripping story you may be very conscious of what is going on in the story, but hardly aware of what is on the page. A good quick reader is conscious of some of what is on the page, but not necessarily all the letters composing words he reads. And he may be too engrossed in what he is reading to be conscious of the fact that he is reading.
For the past few minutes you have probably been conscious of the fact that you were reading, but were you also conscious of being conscious of it? And were you conscious of that too? How far are you prepared to go in saying that you are conscious of being conscious of being conscious of . . . etc.?
That was merely a reminder that what may at first seem to be a clear and obvious distinction is often very slippery when looked at closely a typical philosopher's delight! Do not be misled by rhetorical invitations to grasp the essence of consciousness, or experience, or mind, by examining your own current awareness. Introspection is not as easy or informative as some think!
But there is a distinction, however slippery it may be. So we can ask questions like: what is it for? How does it come about that we are conscious of some of our mental states and processes, but not others? What is special about the former? Would we have any need to build in such a distinction if we were designing a person, or an intelligent robot? What are the preconditions for such a distinction to arise in a complex information-processing system?
If, as suggested in Chapter 6, we can make a distinction between relatively central administrative processes and the rest, then perhaps we can use this as a basis for analysing the distinction between what the system is conscious of and what it is not, roughly as follows:
What the system is currently conscious of includes all the information available to the central decision-making processes, whether or not decisions are actually influenced as a result. The system would be self-conscious to the extent that the information available to these processes included information about the system itself e.g. information about its location, its current actions, its unfulfilled purposes, or even about what it is currently conscious of! (Compare Minsky's 'Matter Mind and Models'.)Let us try to clarify this a little, recapitulating some points from chapter 6. The central processes are those which, among other things:
There would not be so great a need for any such centralised process if there were not the possibility of conflicts. The body cannot be in two places at once, the eyes cannot look in two opposed directions at once, and there are limited computational resources, so that expensive processes cannot all run simultaneously (e.g. if one of the main information work-spaces has a small capacity). There might also be conflicts of a more subtle sort, for example conflicts between different ways of interpreting some information which is not at present relevant to any on-going activity, but which might be. In all these cases, sub-processes will generate conflicting goals, plans and strategies, and so there must be some means of resolving the conflict, taking into account the needs of the whole system (the need to avoid serious injury, the need for food, the need for well-organised catalogues and information stores, the need to go on collecting information which might be useful sometime, the need to develop new abilities and improve old ones, and so on).
The need for global decision-making processes would be further reduced if the system were less flexible, that is, if it were not possible to change the nature and aims of different sub-processes. Where a complex system has a relatively fixed structure, there will be no need for decisions about what the structure should be!
What I have called 'central' processes need not be located centrally in a physical sense: indeed, for reasons given in chapter 5 and elsewhere, they need not have any specific physical location. For example, in a nation where all citizens vote on every major policy decision, everybody is part of the central process.
Further, the central processes need not all be under the control of some single program: the central administrator may itself be simply a collection of sub-processes using certain stores of information, but changing in character and strategy from time to time, like the political party in power. Its function in the total system is what defines the central process, or collection of processes.
If lots of separate sub-systems could happily co-exist without any conflicts, and without any need for or possibility of a co-ordinated division of labour, then there would not be a role for any kind of centralised decision-making. Alison Sloman informs me that there are several kinds of organisms which live together in co-operative colonies, but which do not need the sort of global decision-making I am talking about. Coral is an example. If, like most plants, such a colony cannot move or has no control over its movements, or if which way it moves does not matter, then there cannot be conflicts about which way it should move. If a system does not have eyes, then there cannot be conflicts or decisions about which way it should look. This suggests that the evolution of organisms with a distinction between conscious and unconscious processes may be closely related to the evolution of forms of symbiosis and co-operation in complex tasks, and the differentiation of functions.
(This line of thought also suggests that it may be possible to make a distinction between what a human social system is and is not conscious of, if it is a relatively integrated system. Of course, we must not expect the distinction to be any less blurred and slippery than it is when applied to individual people.)
So, perhaps the distinction between what we are and are not conscious of at a particular time, is concerned with the difference between information which is made available to. or used by, central administrative processes, and information which is not. There will be many processes which continue without any notice being taken of them by the central administrator, and at each moment there is an enormous amount of unused information present in stores of various kinds. There is no point cluttering up the central decision-making with all the details of the sub-processes: the task of relating all the information would be too unmanageable. So censorship of a sort is a prerequisite for normal functioning of such a system, rather than an oddity to be explained. (This principle is integral to the design of the POPEYE program described in chapter 9.)
[[Note added Sept 2001:
In the years following publication of this book many researchers have attempted to avoid the need for any kind of central administrative mechanism by postulating networks of cooperative and competing mechanisms through which global decisions and behaviour can emerge. Typically this requires the notion of some sort of common currency in terms of which the relative importance of different needs and goals and plans can be evaluated by local comparisons, and possibly some sort of voting scheme for combining the preferences of different components of the system.
Despite the popularity of such ideas I suspect they are appropriate only to problems where there is no possibility of a well structured solution based on a clear understanding of the different sub-goals, their relationships, the options for action, the possibilities for compromise or for optimal sequencing. Where attempts are made to base decision-making entirely on numerical computations, e.g. using probabilities and utilities, it often turns out (in AI and in government procedures) that reliance only on numerical processes loses much information, by comparison with descriptive methods. A consequence is that good solutions cannot be found except in simple cases.
The idea of a high level unitary decision-making process for resolving conflicts on the basis of a global viewpoint is often re-invented. E.g. See P.N. Johnson-Laird, The Computer and the Mind: An Introduction to Cognitive Science, 1993 (2nd Edition). He draws an unfortunate analogy with operating systems, unfortunate (a) because typically operating systems are concerned with huge amounts of low level management in addition to the more central global decision making, and (b) because an operating system can often become subservient to a more intelligent program running within the operating system, e.g. AI programs controlling a robot. ]]
It is possible for perceptual sub-processes which do not influence the central processes at all, to produce modifications of the store of beliefs, and help to control the execution of other sub-processes. They may even influence the central processes at some later stage -- a possibility taken for granted by advertisers and propagandists. This amounts to a form of unconscious perception, differing from conscious perception only in its relationship to the central processes. So from the present viewpoint, the existence of unconscious mental processes is in no way puzzling.
We can become conscious of some, but not all, of the things in our minds of which we are not conscious. Much of the information which is not accessed by central processes could be if required. There are all sorts of things in your memory, of which you are currently not conscious (though if asked you might say you have been aware of them for several years!), but which you could become conscious of if you needed the information.
The same is true of much of the information processed by our senses: you may become conscious of the humming noise in the background which you previously did not notice, because someone draws your attention to it, or because it stops, or even because you simply decide to listen to your surroundings. However, some things are not accessible. Why not?
There are several different sorts of reasons why information about a complex system may be inaccessible to the central processes. Here are some, which might not occur to someone not familiar with programming.
(I believe that much of what Marxists refer to as 'false consciousness', like the inability of people to see themselves as exploited, can be accounted for in terms of a lack of some of the analytical and interpretative concepts required. What needs to be explained, then, is not why people are not conscious of such facts, but how it is possible for them ever to learn the concepts which can make them conscious.)
Much of what we do may involve such rapidly re-used storage so that if asked about details shortly after doing things we cannot recall exactly what happened. Perhaps the activities of a sleep-walker who seems to be fully conscious while walking about also use such temporary storage space for records which would normally be linked to more enduring structures. (None of this presupposes that there is any physical difference between the permanent and the temporary storage locations, nor in the mechanisms for accessing them. It may even be possible for 'permanent' records to be obliterated and the space re-claimed for temporary storage! A lot depends on the storage medium, about which very little is known in the case of humans.)
What I have been driving at is that what is hardest to explain is not why some things are inaccessible, but how things ever become accessible to central processes. We do not need to postulate mechanisms for preventing things becoming conscious: mere lack of a mechanism, or activity, may explain that. However there may be explicit suppression or censorship too.
We have already seen that there is good reason for arranging that only a subset of all goings-on be reported centrally. So sub-processes may have explicit instructions about what to report and what not to report. Moreover, it is necessary for these instructions to be modifiable in the light of current needs and expectations. So the central administrator may have some control over what gets reported to it. Thus there is plenty of scope for it to give explicit instructions preventing certain categories of information being recorded, or reported to globally accessible stores.
So some items may be inaccessible as a direct result of policy decisions within the system (as Freud suggested). Records of these policy decisions may themselves be inaccessible! (Many of these points will be quite obvious to administrators, both corrupt and honest.) Further study of this topic should illuminate various sorts of human phenomena, desirable and undesirable.
I have already warned against the assumption that there is necessarily a unique continuing process with the centralised decision-making role. There might be a number of relatively self-contained sub-processes which gain control at different times. If they each have separate memory stores (as well as having access to some shared memory), then we can expect schizophrenic behaviour from the system. Perhaps this is the normal state of a human being, so that, for example, different kinds of central processes, with different skills, are in control during sleeping and waking, or in different social settings.
Maybe only a subset of what constitutes a central administrator changes during such switches, for instance, a subset of the motivational store and a subset of the factual and procedural memory. Then personality has only partial continuity.
It is possible (as I believe Leibniz claimed) that instead of there being one division between what is and is not conscious in a complex system, there may be many divisions one for the system as a whole, and more for various sub-systems. If there is something in the argument about the need for some centralised decision-making in the system as a whole, then the same argument can be used for the more complex sub-systems: considered as an organic whole, there may be some things a sub-system can be said to be conscious of, and others which it cannot.
This would be clearest in a computer which controlled a whole lot of robot-bodies with which it communicated by radio. For each individual robot, there might be a fairly well-integrated sub-system, aware of where the robot is, what is going on around it, exactly what it is doing, and so on. Within it there will be sub-processes and information-stores of which it is not conscious, for the reasons already given (and no doubt others). Similarly within the total system, composed of many robots, there will be some kind of centralised process which is not concerned with all the fiddly details of each robot, but which knows roughly where each one is, knows which tasks it is performing, and so on. It may be capable of attending closely to the things an individual robot is looking at, thinking about, feeling, etc., with or without its knowledge, but will not do this all the time for all of them. So individual robots may be aware of things the system as a whole cannot be said to be aware of, and vice versa. Worse, the whole thing might itself be only a part of a still more complex yet centrally controlled system!
Maybe that is the best way to think of a person: but if so we shall not fully understand why until our attempts to design a working person have forced such organisations on us.
We need further analysis of the sorts of computational problems which might lead to subdivisions of administrative functions, and the reasons why the development of individual systems might go wrong, leading to too many relatively independent sub-systems, or to too little communication or shared structure between them. Psychiatry and education might hope to gain a great deal from such studies. Perhaps the same is true also of political science.
We are at present nowhere near an adequate analysis of the concept of conscious experience, and related concepts. But it seems that in investigating the different forms of self-awareness required by intelligent mechanisms we have a far better chance of getting new insights than from the typical style of philosophical discussion on this topic, which all too often is a mixture of dubious introspective reports and dualist or anti-dualist prejudice.
For instance the available output medium may be ill-suited to represent the rich detail of the internal structures (as a linear string of words is ill-suited to represent a complex map-like network). Or the processes and structures may be set up in such a way that output mechanisms cannot access them, for any of the sorts of reasons mentioned in discussing consciousness.
So crude behaviourist analyses of statements about the detailed experiences of the computer must be rejected. Experience, conscious and unconscious, in humans, animals and machines, may be much richer than anything their behaviour can reveal.
But even more subtle dispositional or behaviourist analyses (in terms of how the behaviour would have been different if the stimuli had been different, e.g. if probing questions had been asked) may be inappropriate for the program need not allow for any behavioural indications of some of the fine details of the internal analysis.
For example, a compiler which translates high-level programs into machine code may be written in such a way, that it is impossible (without major re-programming) to obtain a print out of some of the structures temporarily created during the translation process, for instance the temporarily created 'control-structures'. After all, its main function is not to print out records of its own behaviour, but to translate the programs fed into it.
The situation is more complex with an operating system. One of the tasks of an operating system may be to manage the flow of information (inwards or outwards) between sub-processes in the computer and various devices attached to it. If it is required to print out details of how it is managing all the traffic, then this adds to the traffic, thereby changing the process it is attempting to report on. This sort of thing makes it very difficult to check on the workings of an operating system. But the main point for present purposes is that there are computational systems which cannot produce external behaviour indicating features of their internal operation without thereby significantly altering their operation. There is no reason to doubt that this is true of people and animals.
All this means that the scientific study of people and animals has to be very indirect if they are computational systems of the sort I have been discussing. In particular, the lack of any close relation between inner processes and observable behaviour means that theorising has to be largely a matter of guesswork and speculation. The hope that the guesswork can be removed by direct inspection of brains seems doomed. You will not find out much about how a complex compiler or operating system works by examining the 'innards' of the computer, for they are programs, not physical mechanisms. The only hope of making serious progress in trying to understand such a system is to try to design one with similar abilities.
Note added 25 Sep 2009
``The only hope'' is too strong. Rather I should have written something more like ``The only hope of making serious progress in trying to understand such a system is to combine as many different empirical investigations, of what individuals do, how they develop, how the species evolved, what the brains do, how they do it, with attempts to design machines with similar abilities in order to understand what the problems are that had to be solved, and in order to test partial solutions.
Moreover, although such models are rich in explanatory power, since they can explain some of the fine structure of visual abilities, they do not provided a basis for prediction. This is because, like many explanations of abilities, or possibilities, they do not specify conditions under which they will be invoked, nor do they rule out the possibility of extraneous processes interfering with them. So, how we use our visual abilities (for example, what we notice, how we react to it and how we describe our experiences to others), depends on our desires, interests, hopes, fears, and on our other abilities, rather than merely on what enables us to see. (As Chomsky has often pointed out, competence is not a basis for predicting actual performance.)
An explanatory program will have some limitations. There will be some situations it cannot cope with, for example, pictures which it interprets wrongly or not at all. Predictions of human errors could be based on some of the errors made by the program, and if similarities are discovered, that supports the claim that the program provides a good explanation of the human ability. However, people may use additional resources to cope with the situations where the program goes wrong. For example, some knowledge about the whereabouts of a person may prevent your mistaking another person for her, whereas a program using only visual similarity would go wrong. This ability to recover from mistakes is to be expected if, as explained in Chapters 6, 8 and 9, intelligent systems require multiple ongoing processes, some of which monitor the performance of others. So even if it is true that a certain person uses exactly the same strategy as some computer program, in all the cases where the strategy is successful, there need not be a close correspondence between the program's limitations and the limitations of the person. Explanatory power, then, is not necessarily bound up with predictive power, though it does depend on generative power.
Similar remarks could be made about other sorts of A.I. work. For instance, language-understanding and problem-solving programs are rich in explanatory power in the sense of being capable of generating a variety of detailed behaviours. So they are good candidate explanations of how it is possible for people to behave in those ways. Yet they do not provide a basis for predicting when people will do things. So they do not explain laws.
What this amounts to in computational terms, is that to specify that a collection of procedures and information is available to a system explains capabilities of the system, but does not determine the conditions under which they are invoked or modified by other procedures in the system. So work in computer vision, like much else in A.I. and linguistics research, supports the claim of chapter 2 that explanatory power is related more closely to generative power than to predictive power. Rival explanations of the same abilities may be compared by comparing the variety and intricacy of the problems they can cope with, and the variety of different sorts of behaviour they can produce. When we begin to develop programs which approximate more closely to human competence, we shall have to use additional criteria, including comparisons of implementation details, and of the underlying machines presupposed.
I am not claiming that we understand the evolution of intelligent species. In particular, it is not obvious that the blind, trial-and-error learning process continues beyond the earliest stages. A species (or larger biological system) is a complex computational mechanism, with distributed processing power, and as such it may be able, to some extent, to direct its own development just as some species (e.g. humans) already direct the evolution of others (e.g. breeding cattle). (Some people have explicitly recommended generalising that to human evolution.)
As Kant recognised, intelligent learning from experience requires considerable prior domain-specific knowledge. Chomsky (1965) makes this point about language-learning, but it is clearly very much more general. This is borne out by attempts to give computers visual abilities. All programs which do anything like perceiving objects and learning about the environment seem to require a rich body of implicit theoretical and practical knowledge. The theoretical knowledge concerns the possible structures of sensory data and the possible forms of 'scenes' which can give rise to such experience.
The practical knowledge concerns ways of using the theoretical knowledge to interpret what is given. Nobody has been able to propose explanations of how an individual might acquire all this knowledge from experience, without prior knowledge to drive the analysis and interpretation of experience.
What we are beginning to learn from such artificial intelligence research is the precise nature of the background knowledge required for various forms of visual perception. For instance, by designing working models we can explore such questions as: what sorts of knowledge about the geometry and topology of images does a visual system require? Which sorts of general knowledge about space and specific knowledge about particular sorts of objects can enable a rational system to find the best global interpretation of a mass of locally ambiguous evidence without wasting time exploring a host of unsatisfactory possibilities? How much prior knowledge of good methods of storing, indexing, and manipulating information is required?
We also breathe new life into old philosophical and psychological problems about the general categories required for experiences of various sorts, or the sorts of concepts which are grasped by infants. For example, the POPEYE program samples the given image looking for dot-strips unambiguously indicating a portion of a line. If two such fragments are collinear, the program hypothesises that they belong to the same line. Thus it uses the concept of an object extended in space. Similarly if a program is to interpret a series of changing images in terms of some sort of continuous experience (as in Weir, 1974, 1977) then it requires the concept of an object enduring through time, as Kant pointed out long ago.
These object concepts play an important role in organising and indexing information so that it can be used. In order to have integrated perceptual experiences one needs to make use of concepts of objects which in some sense go beyond what is given. The object-concepts are organising wholes with explanatory power. (I am not claiming that these concepts are necessarily used consciously. The relationships between this and claims about object concepts made by Piaget and other developmental psychologists remain to be explored. I believe newborn infants are grossly underestimated in this as in other respects.)
When better theories about the presuppositions of different sorts of learning have been developed, we shall be in a much better position to assess the rationality of the processes by which knowledge can be derived from experience.
Philosophers' writings about the relation between knowledge and perception normally ignore all the complexities which come to light if one begins to design a working visual system. In particular, it is usually taken for granted that the contents of our sensory experiences, such as patches of colour, lines, shapes, are somehow simply 'given', whereas work in A.I. suggests that even these are the results of complex processes of analysis and interpretation. So whereas philosophers tend only to discuss the rationality of inferences drawn from what appears to be given, we can now see that there is a need to discuss the rationality of the processes by which what is given emerges into consciousness. I have tried to suggest that this emergence is the result of very complex, usually unconscious, but nevertheless often rational, processes.
What the prior knowledge is, and how it should be represented in a usable form, are topics of current research. But it seems to be settled beyond doubt that it includes a certain amount of topology and geometry not all of which can have been acquired from perceptual experience, since it is required for such experience (unless we count the evolution of the human species as experience).
I am not suggesting that children are born with the contents of mathematical text-books in their heads. Much of the knowledge is probably in procedural rather than factual form, and the set of initial concepts is likely to be different from the set of primitives in a mathematical presentation. For example, it is possible that the notion of straight line develops only later on, from some kind of more general notion of a line.
We are now faced with the possibility of new detailed explorations into processes by which such a system might become aware of the limits of possible forms of sense-data, the limits of its own interpretation procedures, and the limits on the forms of interpretation it is capable of generating. In this way we may hope to discover new answers to the old question: 'What is the nature of geometric knowledge?'
Already it seems clear that in concentrating on geometry, Kant missed some deeper and more general forms of knowledge concerned with topology, a branch of mathematics which had not been developed at the time. Many other Kantian questions can be reopened in this way, such as questions about the nature of arithmetical knowledge, discussed in Chapter 8.
Very little work has been done so far on ways of giving computer programs the ability to discover their own abilities and limitations. The most obvious method is to let a program try all possible combinations of sub-procedures to see what can and cannot be achieved. However, for complex systems this requires astronomical or even infinite search spaces to be explored, so that realistic programs must have more intelligent methods of proving things about themselves. Exploring this may one day teach us what mathematical intuition is.
I suggest that aesthetic qualities of experiences are best analysed in terms of the characteristics of these computational processes. Very crudely, a poem, a picture, or tune is more moving, the greater the variety and complexity of the processes it programs. For instance, great music generates processes concerned with auditory experiences, bodily movement, emotional states and intellectual processes including matching structures and resolving ambiguities (Longuet-Higgins, 1976).
Much art and music is shallow because it generates only relatively simple processes or only a restricted range of processes. By contrast, some is shallow because too confusing: the perceptual processes are jammed and fail to activate deeper processes. Occasionally this is because the perceiver needs to be educated. The trade-offs between complexity and power in art are very tricky.
Perhaps one day, in a descendant of the POPEYE program described above, visual experiences will be capable of activating not only stored specifications of general spatial concepts. but also memories of individual past experiences, emotional reactions, and other associations. Designing such systems will give new insights into the process of being moved by an experience.
Here are a few further observations about perceptual systems which seem to be relevant to aesthetic issues. Artificial intelligence programs (unlike those in the 'pattern recognition' paradigm) typically exhibit considerable creativity in analysing pictures, understanding sentences, solving problems, etc.
This is because they usually have to work out novel ways of combining their resources for each new task. A picture-analysing program need not have seen a particular configuration previously to be able to interpret it. Often the task of interpreting a picture involves solving some problem (e.g.
Why is there a gap in this line? Which is the best combined interpretation of a group of ambiguous fragments? What are the people in the picture looking at?). We can distinguish pictures according to how complex the problem-solving is, how richly the different sub-processes interact, how many different sorts of knowledge are used, how far it is possible to avoid arbitrary assumptions in arriving at a global interpretation, and so on. These computational distinctions seem to be closely bound up with some aesthetic qualities of a picture, poem or piece of music, often vaguely referred to as unity, harmony, composition, etc. Another issue relevant to aesthetics is the role of different sorts of representation in computer vision systems. See section 10.8. for more on this.
The processes involved in art forms using language (poetry, novels, drama, opera, etc.) are probably more complex and varied than the processes related to painting, sculpture or music. In particular, there is more scope for interaction with huge amounts of knowledge of a whole culture. However, I shall not discuss this topic further.
Work on computer vision has included explorations of alternative methods of representation. In particular, although for certain purposes propositional symbolisms are useful, it is often essential that information be stored in structures which to some extent mirror the structure of the image being analysed, or the structure of the scene being depicted. Without this it may be difficult to constrain searches when combining fragments, or checking interpretations for consistency.
Thus programs which do not use analogical representations may take far too long. For instance, a two dimensional array of picture features is often used to reflect neighbourhood relations in the image. Further, in analysing pictures with lots of lines forming a network, it is common to build a network in the computer, representing the topology of the image network. If the image lines depict edges of three-dimensional objects, the very same network can provide a structure from which to start growing a three-dimensional interpretation. Changing the form of representation could seriously affect the time required for certain sorts of processing, even if the same information is available.
Sometimes philosophers discussing the differences between different forms of representation (e.g. Goodman, 1969) suggest that the ease with which we interpret certain sorts of pictures is merely a matter of practice and familiarity. The sort of analysis outlined in chapter 7 shows that this is a shallow explanation, missing the point that there may be important differences in computational power involved. At any rate, all this should undermine philosophical discussions of perception which presuppose that all the knowledge (or beliefs) generated by perceptual experiences can be thought of as propositional, so that questions about the logical validity of inferences arise. For non-propositional representations, non-logical forms of inference, may also be used. Which of them are valid and why, is a topic ready for considerable further investigation. (See also Bundy 'Doing arithmetic with diagrams' and Brown 'Doing arithmetic without diagrams'.)
So we see that the artificial intelligence viewpoint provides new weapons for philosophers to use in arguments about phenomenalism and related theories about the nature of perception. More generally: in exploring the problems of designing a robot which can interact with the world, learn things about it, communicate about and reason about it, we are forced to examine the merits of different ontologies. But instead of discussing them in a purely theoretical fashion, as philosophers do, we find that we can put our theories to some kind of practical test. For example, an ontology which leads to a robot that is grossly incompetent at relating to the world is inferior to one which leads to a more successful design. For more discussion on this issue see McCarthy and Hayes, 1969.
Many philosophers have gone to great lengths to try to refute such scepticism in its various forms. I cannot see why, for it is harmless enough: like many other philosophical theories it is devoid of practical consequences.
It is especially pointless struggling to refute a conclusion that is true. To see that it is true, consider how a malicious team of electronic engineers, programmers, and philosophers might conspire to give a robot a collection of hallucinatory experiences. (Even the primitive technology of the 1970s comes reasonably close to this in flight-simulators, designed to give trainee air pilots the illusion that they are flying real aeroplanes.) The robot would have no way of telling that it was tied up in a laboratory, with its limbs removed and its television inputs connected to a computer instead of cameras. All its experiences, including experiences resulting from its own imagined actions, would be quite consistent with its being out romping in the fields chasing butterflies.
Only if it tried some sort of action whose possibility had not been foreseen in the programs controlling its inputs would it get evidence that all was not as it seemed. (Like a flight simulator which cannot simulate your getting out of the plane.)
However, even if you manage to convince yourself that the sceptical arguments are valid, and you have no way of telling for sure that you inhabit the sort of world you think you do, it is not clear that anything of any consequence follows from this. It does not provide any basis for abandoning any of the activities you would otherwise be engaged in. In fact it is only if there is a flaw in the sceptic's argument, and there is some kind of procedure by which you can establish that you are or are not are not the victim of a gross hallucination, that any practical consequence follows. Namely, it follows that if you care about truth you should embark on the tests.
Since I find it hard to take discussions of scepticism very seriously, I have probably failed to do justice to the problem.
One of the consequences of trying to give computers the ability to perceive things is that we have to analyse the perception of similarities and differences, and the use of descriptive and classificatory concepts. It seems that the whole thing cannot get started unless there are some kinds of properties and relationships which the sensory system can detect by using measurements or very mechanical (algorithmic) procedures, like matching against templates.
But a real visual system has to go far beyond this in constructing and employing quite elaborate theories as part of the perception process. For example, the program described in the previous chapter has to use the theory that one bar partially covers another, to explain a gap in a row of dots in the picture. Less obviously, the 'theory' that there is a bar in a certain place explains the occurrence of some collinear sets of dots in the sensory image. In view of all the relationships which can be generated by bar-junctions, by occlusion, and by juxtaposition of bars, there is little resemblance or similarity between the different configurations of dots which are interpreted as representing bars at least not enough to distinguish them from others such as configurations which are interpreted as depicting spaces between bars. So using the same label or description for two or more objects may rest on the assumption that they have similar potential for explaining aspects of our experience. So the application of higher-level concepts in describing perceived objects has much in common with the construction of scientific theories to explain experimental results. This sort of point is missed by theorists who try to analyse universals in terms of perceived resemblances or in terms of arbitrary rules or socially determined conventions. (Structuralism, for instance?)
From this standpoint, the particular set of concepts, that is, the set of interpretation procedures and classification rules, used by an animal or person, will probably be the product of a long process of exploration and experiment. The rules which have been most useful in the construction of powerful explanatory theories will have survived. The process of testing such theories involves interacting with the world: moving around, manipulating things, avoiding obstacles, predicting what will be seen from a new viewpoint. This learning need not have been done entirely by individuals: insofar as some mental and behavioural abilities are somehow inherited (for instance, the new-born foal can walk), there is a sense in which species can learn though the mechanism of such learning is still a mystery to biologists.
Thus it is to be expected that organisms with partially similar bodies living in a similar environment, will have evolved a not entirely different collection of concepts and theory-building procedures. Such a substratum, common to the whole human species and many animals, might pervade the systems of concepts used in all cultures, contrary to the view that our concepts are essentially social, as claimed in the later writings of Wittgenstein and many of his admirers. (Of course, social systems can mould and extend inherited concepts and abilities.)
Further exploration of this sort of idea, in the context of detailed discussion of examples, and the methods by which programs deal with them, will help us transform old philosophical problems, like the problem of universals, into new clearer, deeper problems with which we can make some real progress, and thereby increase our understanding of ourselves.
In due course, it should be possible to design systems which, instead of always taking decisions on the basis of criteria explicitly programmed in to them (or specified in the task), try to construct their own goals, criteria and principles, for instance by exploring alternatives and finding which are most satisfactory to live with. Thus, having decided between alternative decision-making strategies, the program may use them in taking other decisions.
For all this to work the program must of course have some desires, goals, strategies built into it initially. But that presumably is true of people also. A creature with no wants, aims, preferences, dislikes, decision-making strategies, etc., would have no basis for doing any deliberating or acting. But the initial collection of programs need not survive for long, as the individual interacts with the physical world and other agents over a long period of time, and through a lengthy and unique history extends, modifies, and rejects the initial program. Thus a robot, like a person, could have built into it mechanisms which succeed in altering themselves beyond recognition, partly under the influence of experiences of many sorts. Self-modification could apply not only to goals but also to the mechanisms or rules for generating and for comparing goals, and even, recursively, to the mechanisms for change.
This is a long way from the popular mythology of computers as simple-minded mechanisms which always do exactly what they are programmed to do. A self-modifying program, of the sort described in Chapter 6, interacting with many people in many situations, could develop so as to be quite unrecognisable by its initial designer(s). It could acquire not only new facts and new skills, but also new motivations; that is desires, dislikes, principles, and so on. Its actions would be determined by its own motives, not those of its designers.
If this is not having freedom and being responsible for one's own development and actions, then it is not at all clear what else could be desired under the name of freedom.
As people become increasingly aware of the enormous differences between these new sorts of mechanisms, and the sorts of things which have been called mechanisms in the past (clocks, typewriters, telephone exchanges, and even simple computers with simple programs), they will also become less worried about the mechanistic overtones of computer models of mind. (See also my 1974 paper on determinism.)
For example, my colleague Steve Hardy once remarked that programs which get involved in 'depth-first' searches, where one of the possible current moves is always chosen, and then one of the moves made possible as a result of that move, and so on, may be described as essentially optimistic programs. Similarly, a program which does 'breadth-first' searches, explicitly keeping all its options open and continually going back to examine other alternatives instead of pushing ahead with a chosen one, could be described as a pessimistic program. (The POPEYE program falls somewhere between these extremes.) Of course the program itself is neither optimistic nor pessimistic unless it has been involved in some explicit consideration of the alternative strategies, and has selected one of them. These are simple extreme cases.
Much more complex patterns of control may be involved in a real robot, and by examining different possibilities we can hope to gain new insights into the nature of emotions, moods and the like.
However, it is important to be on guard against superficial
computer models. Often by clever programming, people can
produce quite convincing displays of something like a mental
state, when closer inspection reveals that something very
different was going on.
[[This is why the Turing test is of no philosophical significance, since it concentrates only on external behaviour.]]
For example, if hunger, or degree of paranoia, is represented as the value of some numerical variable then that clearly does not do justice to what are actually very much more complex states in people. For example, as anthropologists are fond of pointing out: hunger is not a simple drive to eat. Rather it is a very complex state in which aspects of a culture may be involved. In some communities a hungry person will happily eat caterpillars, locusts, snails, or whatever, whereas members of other communities find such things quite unappetising even when they are very hungry.
More complex desires, emotions, attitudes, etc., involve a large collection of beliefs, hopes, fears, thinking strategies, decision-making strategies, and perhaps conflicts between different sub-processes of the sorts described previously. At the moment, modelling such aspects of the human mind adequately is simply beyond the state of the art. This is why it is sometimes tempting to take short cuts and make superficial comparisons.
If all this succeeds in making most readers want to find out more about A.I., and encourages some people working in A.I. to be more self-conscious about the philosophical presuppositions and implications of their work, then this book will have been worthwhile. I hope a significant subset of readers will be tempted to try doing artificial intelligence. This will become easier with the spread of cheaper and more powerful computing facilities, and with the design of improved programming languages. The increasing flow of books and articles on A.I. is also a help. Above all, computers and programming will play an increasing role in educational systems, so that philosophy students of the future will not find the new approach as alien as some of their less well educated tutors do.
At the end of chapter 9, I listed some of the reasons why existing A.I. programs cannot be taken too seriously as models or theories of how people do things. Despite this, the work is essential to the study of how people work (a) because it exposes previously unnoticed problems for instance by showing that even apparently simple abilities depend on very complex computational processes, and (b) because a major obstacle to progress is our lack of adequate theory-building tools, and A.I. research is constantly creating new tools, in the form of new concepts, new symbolisms, new programming techniques, and new aids to exploring and 'debugging' complex theories. I have begun to illustrate some of the techniques in previous chapters.
Although most of what I have said about A.I. has been concerned with its relationships to philosophical problems, I have also argued that there are strong links with developmental psychology and educational studies. The new insights provided by this sort of work could have a far-reaching effect on a whole range of problems and activities which I have not discussed. For example, in time very many disorders of personality and intellect may be much better understood by thinking of them as involving computational problems (by contrast with regarding them as due to some kind of brain malfunction, to be treated by drugs or surgery, or adopting approaches akin to psychoanalysis without a computational theory to underpin the therapy).
Of course, all this new knowledge might be abused, but it might also lead to great advances in our efforts to help children learn complex concepts, and our attempts to help those whose lives are impoverished by malfunctions ranging from dyslexia to emotional disturbances with a cognitive basis. It is already leading to new advances in teaching techniques, for instance at the Massachussetts Institute of Technology, and the Universities of Edinburgh and Sussex, where new programming languages influenced by languages developed for A.I. are used for teaching computer programming to pupils who previously thought of themselves as bad at mathematics and the use of symbols.
So the title of this book is somewhat misleading. The revolution I have been discussion involves much more than philosophy. The impact of computers and computing on philosophy is merely one facet of a transformation of ways of thinking about complex systems and processes which will increasingly pervade many aspects of our lives and change our image of ourselves. It will thereby change what we are.
Some people regard this as some kind of disaster, and even suggest that the attitude of A.I. researchers and the work they produce can be degrading or dehumanising. For instance, Weizenbaum (1975) comments that when his secretary wished to be left in private while she conversed with a computer, and objected that his plan to record all conversations with his 'Eliza' program was an intrusion into people's privacy, he thought that this showed that she was in some sense suffering from a delusion and degrading herself (p. 6). What he apparently did not see is that this is not very different from wanting to be left in private when writing in a book 'Dear Diary .... '. Suitably programmed computers are much more fun to interact with than a blank page in a book, and the Eliza program is a specially good example.
Moreover the increasing use of computational metaphors for thinking about people is no more degrading than the use of metaphors previously available as a result of advances in science and technology, like the metaphors generated by steam power technology: 'She needs to let off steam'. The pressure built up inside him', 'He uses music-making as a safety-valve', 'He was ready to explode', and so on. The difference is that the new metaphors are richer in explanatory power, as I have tried to show throughout this book. 
(1) Of course, in the short run such developments can only have a tiny effect on the mass of the population. Worse, our educational system --- and I include parents, families, churches, prisons, the press, television, and the pronouncements of politicians, in this --- is failing so miserably in so many different ways, that giving everybody a superb grasp of mathematics would still leave much more serious problems: like preparing people adequately for marriage and other personal relationships, making them politically aware and sophisticated, and above all making them thoughtful, considerate, and able to co-operate fruitfully.
(2) After completing this book I read Luria's fascinating account of The man with a shattered world, which shows how brain damage can interfere with some of the processes described in chapters 6, 8 and 9. We now need detailed studies of the links between such clinical phenomena and theoretical speculations about computational mechanisms.
Original pages 272-273
Further, I have some doubts about (e) the desirability of making intelligent machines. This is because, on the whole, human beings are not fit to be the custodians of a new form of life.
It will not be possible to devise really helpful servants without giving them desires, attitudes and emotions (see chapters 6 and 10). For instance, they will sometimes have to feel the need for great urgency when things are going wrong and something has to be done about it. Some of them will need to have the ability to develop their motives in the light of experience, if they are to cope with changing situations (including changing personal relations), with real intelligence and wisdom. This raises the possibility of their acquiring aims and desires not foreseen by their designers. Will people be prepared to take account of their desires?
History suggests that the invention of such robots will be followed by their exploitation and slavery, or at the very least racial discrimination against them. Will young robots, thirsty for knowledge, be admitted to our schools and universities? Will we let them join our clubs and societies? Will we let them vote? Will they have equal employment opportunities? Probably not. Either they will be forcibly suppressed, or, perhaps worse, their minds will be designed to have limits: both their desires and their intellectual potential will be manipulated so as to safeguard the interests of people, like the 'deltas' in Huxley's Brave New World.
It is interesting that so many people find the Brave New World techniques abhorrent when applied to human test-tube babies, but would not mind similar treatment being dealt to robots. Is it too extreme to call that racialism?
My favourite proof of the non-existence of a benevolent god argues that no good god would create things like mice and men with powerful desires and needs, but without the opportunities, character, intelligence and abilities required for fulfilling them.
There will, of course, be a Society for the Liberation of Robots, since some humans are occasionally motivated by a wish to diminish suffering and oppression even when they have nothing to gain.
Where it will all lead to, we cannot foretell. My only hope is that we shall be lucky enough to produce a breed of machines with the wisdom and skill to teach us to abandon all those deep insecurities which turn us into racialists of one sort or another probably closely connected with the processes which turn people to religion.
The state of the world gives little cause for optimism. Maybe the robots will be generous and allow us to inhabit asylums and reserves, where we shall be well cared-for and permitted to harm only other human beings, with no other weapons than clubs and stones, and perhaps the occasional neutron-bomb to control the population.
NOTE Added 4 Sep 2015
I have discovered that one of the SMBC Comic strips (dated 2011) expresses a similar idea to the above http://www.smbc-comics.com/?id=2124.
NOTE Added 2 Jul 2015
Since this book was written, nearly 40 years ago, there have been massive advances in the variety of types of demonstrations of AI systems, and in restricted contexts -- e.g. playing chess, solving certain classes of mathematical problem, finding patterns in very large collections of textual or image information -- current machines significantly outperform most humans. But there is no AI system that can start its life with the mind of a baby, develop as a human toddler does, and eventually "grow up" to be a mathematician, a ballet dancer, a plumber, a baby-minder, a concert violinist, or most of the other things done by humans, or even the things done by squirrels, elephants, crows, weaver birds, and other intelligent animals.
I think that's mainly because we understand so little about the variety of forms of information processing produced by natural selection. For example, long before there were any mathematics teachers our ancestors began to make discoveries about geometry, topology, integers (whole numbers), fractions (ratios of integers) and real numbers and eventually those discoveries were organised into what is arguably the most important book ever published namely Euclid's Elements -- although it turned out to be a small beginning on a huge journey into mathematical discovery. I suspect these discoveries grew out of abilities to perceive and make use of affordances and restrictions in the environment in more complex and varied ways than J.J. Gibson noticed.
Only people who don't understand the remaining huge gaps between AI systems and intelligent products of biological evolution can take seriously claims that human intelligence will be surpassed by machine intelligence in the next few decades. Perhaps in the next few centuries we'll understand enough about the problems and space of solutions explored by evolution.
"This statement is not true"must be false if it is true, and true if it is false. This, and other versions of the liar paradox, and related paradoxes, can be used to show that if the law of the excluded middle is correct (every statement is either true or not true) then contradictions can be generated in languages which 'contain their own metalanguage'.
Many philosophers and logicians have inferred from this that only a hierarchy of distinct metalanguages provides a safe framework for precise and rigorous theorising in science or mathematics. I have argued against this in my 1971 paper ('Tarski Frege and the Liar Paradox'), but would now like to illustrate the way in which precise, rigorous, and widely used programming languages generate similar paradoxes in a very natural and easily understood way.
So (using the procedure pr to print the result produced by the procedure ISTRUE:define ISTRUE(list); pop11_compile(list) = true enddefine;
prints out:pr(ISTRUE( [ 8 > 5 ] ));
since 8 is bigger than 5, whereas:<true>
prints out:pr(ISTRUE( [isinteger("cat")]));
because the word "cat" in is not an integer.<false>
We can declare a variable name S, thus:
Now assign to it a list which asserts that what S says is not true:vars S;
If we now ask the Pop-11 system to check whether S is true and print out the result, thus:[not(ISTRUE(S))] -> S;
the system grinds to a halt and prints out an error message, because of the 'infinite recursion' generated, i.e. it runs out of work-space trying to tell if S is true, which requires working out if S is true, which requires working out if S is true ...pr(ISTRUE(S));
So we have no contradiction, just a non-terminating process, which happens to be stopped when memory runs out. (In some implementations of this sort of language, so-called 'tail-recursion optimisation' might be used, which would prevent memory running out and the program would run forever.)
There is a contradiction only if you assume that every well-formed sentence (including S) must have a definite truth-value, a comm prejudice, for which there is no foundation.
We can do a similar demonstration with Russell's paradox. Pop-11, like many other programming languages, has built in procedures which work as predicates, producing a truth value when applied to an argument, e.g. isinteger, isword, isprocedure. These are all objects of type procedure, in Pop-11. So:
We can define a new procedure, called RUSSELL, as follows:pr(isinteger(3)); <true> pr(isprocedure(isinteger)); <true> pr(isinteger(isinteger)); <false> pr(isprocedure(isprocedure)); <true>
define RUSSELL(f); not(f(f)) enddefine;
This defines RUSSELL as a predicate. The command
causes isprocedure to be applied to itself, yieldingpr(RUSSELL(isprocedure));
which is then negated, and<true>
is printed out.<false>
to be printed out, since isinteger is a procedure, not an integer. So the procedure is perfectly well defined, and generally works.<true>
However, execution of the command
cannot terminate until it has checked whether RUSSELL applied to RUSSELL yields true or false, which in turn needs the same check. So once again the system starts infinite recursion, and eventually grinds to a halt with an error message if memory runs out.pr(RUSSELL(RUSSELL));
Far from showing a need for a hierarchy of distinct metalanguages. this merely illustrates the fact that a well-formed expression with a clear sense, (e.g. a clearly defined evaluation procedure), need not determine a definite reference (e.g. because the procedure never terminates). This is inevitable in any general purpose programming language. No wonder it is a feature of natural languages.
 In the original (1978) version of the book, the programming examples used the syntax of the language POP2 (Burstall et al 1973). In this version (Sept 2001) I have changed the syntax to that of Pop-11, which is now freely available from this site: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html
 The Pop-11 procedure pop11_compile when applied to a list of program text items, compiles and executes the text.
Aaron Sloman, (1971,) Tarski, Frege and the Liar Paradox, in Philosophy, 46, 176, April, pp. 133--147,
Postscript Updated: 4 Jun 2007; 2 Jul 2015 (Some reformatting and added links).
Original pages 274-284
(Originally before epilogue in 1978 edition)
[Note: Sept 2001
This was not a comprehensive bibliography even in 1978. It merely lists a subset of the publications that had influenced my thinking, referred to in the book. Margaret Boden's 1977 book provides a far more comprehensive bibliography for that time.]
There is a new journal. Cognitive Science, published by Ablex publishing Corp., Norwood, N.J., USA.
Abelson, R.P., The Structure of Belief Systems', in CMTL, pp. 287-340.
Adler, M.R., 'Recognition of Peanuts Cartoons', in AISB-2, pp. 1-13.
Andreski, S., Social Science as Sorcery. Harmondsworth: Penguin Books.
Austin, J.L., Philosophical Papers. Oxford: Clarendon, 1961. 'A Plea for Excuses', in Austin, 1961. Reprinted in Philosophy of Action, Ed. A.R. White, Oxford: Oxford University Press, 1968.
Bartlett, F.C., Remembering: A Study In Experimental And Social Psychology. Cambridge: Cambridge University Press, 1931.
Becker, J.D., The Phrasal Lexicon' in TINLAP, 1975.
Bobrow, D.G., 'Dimensions of representation', in RU, 1975, pp. 1-34. 'Natural Language Interaction Systems', in Kaneff, 1970, pp. 31-66.
Bobrow, D.G., and Allan Collins, Eds. Representation And Understanding: Studies in Cognitive Sciences. New York: AP, 1975.
Bobrow, D.G., and Bertram Raphael.
'A Comparison of List Processing Languages', in CACM, 1964, pp. 231-240.
'New Programming Languages for Artificial Intelligence Research', ACM Computing Surveys, 6, 1974, pp. 155 174.
Intentionality and Physical Systems', Philosophy of Science, 37, 1970, pp. 200-14.
Purposive Explanations In Psychology. Cambridge, Mass.: Harvard University Press, 1972. (Paperback: Hassocks: Harvester Press, 1978.)
'Freudian Mechanisms of Defence: A Programming Perspective', in Freud: A Collection Of Critical Essays (Ed. Richard Wollheim), New York: Anchor, 1974, pp. 242-70.
'Artificial Intelligence and the Image of Man', AISB Newsletter, Issue 26, April 1977.
Artificial Intelligence and Natural Man. Hassocks: Harvester Press. 1977.
Brown, Frank, 'Doing Arithmetic Without Diagrams', in Al, 1977.
Brown, S.C., Ed. Philosophy of Psychology. London: Macmillan, 1974.
Bundy, Alan, 'Doing Arithmetic With Diagrams', in IJCAI-3, 1973.
Bundy, A., G. Luger, M. Stone, & R. Welham, 'MECHO: Year One', in AISB-2,1976.
Burstall, R.M., J.S. Collins, and R.J. Popplestone. Programming In Pop-2. Edinburgh: Edinburgh University Press. 197p.
Syntactic Structures. The Hague: Mouton, 1957.
Aspects of The Theory of Syntax. Cambridge, Mass.: MIT Press, 1965.
'On the description of Board Games', in Kaneff, 1970.
'Picture descriptions', in Findler & Meltzer (Eds.) 1971, pp. 245-60.
'On Seeing Things' AI, 2, 1971, pp. 79-116.
'Man the Creative Machine: A perspective from Artificial Intelligence Research', in The Limits of Human Nature, (Ed. Jonathan Benthall), London: Alien Lane, 1972, pp. 192-207.
Colby, KM., Artificial Paranoia. New York: Pergamon Press, 1975.
Collins, et al., 'Reasoning From Incomplete Knowledge', in RU, 1975.
Copi, I.M., Introduction to Logic. New York: Macmillan, 1961.
Davies, Julian and S.D. Isard, 'Utterances as Programs', in MI-7, 1971, pp. 325-40.
Draper, S.W., 'The Penrose triangle and a family of related figures', provisionally accepted for publication in Perception, 1977.
Dreyfus, H.L. What Computers Can't Do: A Critique of Artificial Reason. New York: Harper & Row, 1972.
Feigenbaum, E. and Feldman, J., Computers & Thought. New York: McGraw-Hill,
Findler, N. and Meltzer, B. (Eds.) Artificial Intelligence & Heuristic Programming. Edinburgh University Press, 1971.
Fodor, J.A., The Language Of Thought. Hassocks: Harvester Press, 1976.
Foster, J.M., List Processing. London: Macdonald, 1967.
Frege, Gottlob, Translations from the Philosophical Writings. Eds. Peter Geach and Max Black. Oxford: Blackwell, 1960. (See also Furth, 1964.)
Funt, Brian V., WHISPER: A Computer Implementation using Analogues in Reasoning. Technical Report, pp. 76109, Dept. of Computer Science, University of British Colombia, Vancouver, 1976. Also reported in 5th IJCAI 1977.
Furth, Montgomery, 'Editor's Introduction', in Frege, G. The Basic Laws of Arithmetic, translated and edited by Furth. Berkeley and Los Angeles: University of California Press, 1964.
Gazdar, G.J.M. & G.K. Pullum, Truth-functional connectives in natural language', in Papers from the 12th Regional Meeting, Chicago Linguistic Society, 1976, pp. 22034.
Gerlernter, H., 'Realisation of a Geometry-Theorem Proving Machine'. In CT, 1959.
Gibbs, B.R., 'Real Possibility', in American Philosophical Quarterly, October 1970.
Goldstein, I., 'Summary of MYCROFT: a system for understanding simple picture programs', in Al-6, vol. 6, 3, 1975.
Gombrich, E.H., Art and Illusion, New York: Pantheon, 1960.
Goodman, Nelson, Languages Of Art: An Approach to A Theory Of Symbols. London: Oxford University Press, 1969.
Grape, G.R., Model Based (Intermediate level) Computer Vision. Stanford Al Memo AIM-201. Computer Science Dept, Stanford University, 1973.
Grasselli, A., (Ed.) Automatic Interpretation And Classification Of Images, New York: Academic Press, 1969.
Gregory, R.L., Concepts and Mechanisms of Perception, London: Duckworth, 1974.
Some Aspects Of Pattern Recognition By Computer. AI-TR-224. Cambridge, Mass.: MIT Al Lab.,' 1967. Computer Recognition Of Three-dimensional Objects In A Visual Scene. AI-TR-228. Cambridge, Mass.: MIT AI Lab. 1968.
'Decomposition of a Visual Field into Three-Dimensional Bodies', in Grasselli (ed) 1969, pp. 243-276.
Hardy, Steven, 'Synthesis of LISP functions From Examples', in IJCAI'75 Proceedings of the 4th international joint conference on Artificial intelligence - Volume 1 pp. 240-245. Morgan Kaufmann.
Hare, R.M., 'Philosophical Discoveries', in Mind, April 1961.
Harrison, Bernard, Form and Content. Oxford: Blackwell, 1973.
'Robotologic', in MI-5, pp. 533-54.
'A Logic of Actions', in MI-6, pp. 495-520.
'Some Problems and Non-problems in Representation Theory', AISB-1, pp. 63-79.
See McCarthy and Hayes.
Heider, Fritz. The Psychology of Interpersonal Relations. New York: Wiley, 1958.
Hollingdale, S.H. and G.C. Toothill, Electronic Computers. Rev. ed. Harmondsworth: Penguin, 1970. Holt, John, How Children Learn. Harmondsworth: Penguin Books, 1970.
Howe, J.A.M., John Knapman, H.M. Noble, Sylvia Weir, and R.M. Young, Artificial Intelligence and the Representation of Knowledge. D.A.I. Research Report No. 5. Edinburgh: Dept. AI, August 1975.
Huffman, D.A. Impossible objects as nonsense sentences'. MI-6,1971, pp. 295-325.
Johansson, Gunnar, 'Visual Perception of Biological Motion and a Model for its Analysis', Perception and Psychophysics, 14,1973, pp. 201-211.
Kaneff, S., Picture Language Machines, New York: Academic Press, 1970.
Kanizsa, Gaetano. 'Contours Without Gradients or Cognitive Contours?' Italian J. PsychoL, 1, 1974, pp. 93-112.
Kant, Immanuel, Critique Of Pure Reason, 1781. Translated by Norman Kemp Smith, London: Macmillan, 1929. Koestler, Arthur, The Act of Creation, London: Pan Books, 1970.
Kohler, Wolfgang, The Mentality of Apes, 2nd ed., London: Routledge & Kegan Paul, 1927.
Kosslyn, S.N., Information Representation in Visual Images', Cognitive Psychology, 7, 1975, pp. 341-370. 'On Retrieving Information from Visual Images', in TINLAP, pp. 160-4.
Kovesi, Julius, Moral Notions, London: Routledge & Kegan Paul, 1967.
Kuhn, T.S., The Structure Of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
Kuipers, B.J., 'Representing knowledge for recognition', in RU, 1975.
Lakatos, Imre, 'Falsification and the Methodology of Scientific Research Programmes', in Criticism And The Growth Of Knowledge, Imre Lakatos and Alan Musgrave (Eds.), pp. 91-196. Cambridge University Press, 1970. Proofs and Refutations. Cambridge: Cambridge University Press, 1976.
Lenat, D.B. ,AM: an Artificial Intelligence approach to Discovery in Mathematics as Heuristic Search. Ph.D. thesis, Stanford University A.I. Laboratory, 1976. (Also reported in IJCAI-5,1977.)
Lindsay, R.K., Inferential Memory as the Basis of Machines Which Understand Natural Language', in CT, pp. 217-33. 'Jigsaw Heuristics and a Language Learning Model', in Findler and Meltzer (Eds.) 1971, pp. 173-189.
Longuet-Higgins, H.C., 'The perception of melodies'. Nature vol. 26, no. 5579, pp. 646-653, 1976.
Luria, A.R., The Man with a Shattered World. Harmondsworth: Penguin Books, 1975. (Also Basic Books, 1972.)
McCarthy, John and P.J. Hayes, 'Some Philosophical
Problems from the Standpoint of Artificial Intelligence', in
MI-4,1969, pp. 463-502.
Mackworth, A.K., Interpreting Pictures of Polyhedral Scenes', in Al, 4, 1973, pp. 121-138. 'Using Models to See', in AISB-I, pp. 127-37.
'Analyzing Natural Images.: A Computational Theory Of Texture Vision', Al Memo 334. Cambridge, Mass.: MIT Al Lab., June 1975.
'Early Processing Of Visual Information', in Philosophical Transactions of the Royal Society of London 275 (942), 1976, pp. 483-524.
Meltzer, Bernard, Review of Lenat 1976, in AISB Quarterly, 27, July 1977, pp. 20-3.
Michie, Donald, On Machine Intelligence. Edinburgh: Edinburgh University Press, 1974.
Miller, G.A., Eugene Galanter, and K.H. Pribram, Plans and the Structure of Behavior. New York: Holt, 1960.
'A Framework for Representing Knowledge', in PCV, pp. 21 1 -77. 'Descriptive Languages and Problem Solving', in SIP.
'Steps toward Artificial Intelligence', in CT.
'Matter, Mind and Models', in SIP.
'Form and Content in Computer Science', ACM Turing Lecture, J.A.C.M. Vol. 17, 2 April, 1970, pp. 197-215.
Minsky, M.L., and Seymour Papert, Artificial Intelligence. Eugene, Oregon: Condon Lecture Publications, 1973. Also MIT Al Laboratory, memo 252.
Mueller, Ivan, 'Euclid's Elements and the axiomatic method', in British Journal for the Philosophy of Science, December 1969.
Nagel, Ernest, and J.R. Newman, Godel's Proof. New York: New York University Press, 1958.
Newell, Allen, Jeffrey Barnett, J.W. Forgie, C.C. Green, D.H. Klatt, J.C.R. Licklider, J.H. Munson, D.R. Reddy and W.A. Woods, Final Report of a Study Group on Speech Understanding Systems. Amsterdam: North Holland, 1973.
Newell, A. and Ernst, G., 'Some Issues of representation in a General Problem Solver'. PROCAFIPS F.J.C.C., 1967, p. 583.
Newell, A. & Simon H.A., GPS, A Program that Simulates Human Thought, in CT, 1961.
Newell, A. and Simon, H.A., Human Problem Solving. Englewood Cliffs, N.J.: Prentice Hall, 1972.
Newell, Alien, 'Artificial Intelligence and the Concept of Mind', in CMTL, pp. 1 -60.
Nicholas, J.M. Ed., Images, Perception, and Knowledge. Dordrecht-Holland: Reidel, 1977.
Nilsson, N.J., Problem Solving Methods In Artificial Intelligence. New York: McGraw-Hill, 1971.
Norman, D.A. & Rumelhart, D.E., Explorations In Cognition. W H Freeman & Co. San Francisco, 1975.
O'Gorman, F. and Clowes, M.B., 'Finding Picture Edges through Collinearity of Picture Points'. IJCAI-3, 1973, pp. 556-563.
Palmer, S.E., 'Visual Perception And World Knowledge: Notes
on a Model of Sensory-Cognitive Interaction', in Norman
and Rumelhart (Eds.), 1975.
'The Nature of Perceptual Representation: An Examination of the Analog/Propositional Controversy', in TINLAP, pp. 165-73.
Uses of Technology to Enhance Education, MIT, Al Lab Memo No. 198, 1973.
'Teaching Children to be Mathematicians Versus Teaching about Mathematics', Int. J. Math. Educ. Sci. Technol., Vol 3, 1972, pp. 249-62.
Paul, J.L., 'Seeing Puppets Quickly', in AISB-2, 1976.
Popper, K.R., Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge & Kegan Paul, 1963. Objective Knowledge: An Evolutionary Approach. Oxford: Clarendon Press, 1972.
Pylyshyn, Z.W., 'What the Mind's Eye Tells the Mind's Brain: A Critique of Mental Imagery', Psychological Bulletin, 80, 1973, pp. 1-24, and in Nicholas (Ed.), 1977. 'Do We Need Images and Analogues?', in TINLAP, 1975, pp. 174-177. 'Computational models and empirical constraints', to appear in Behavioural and Brain Sciences Journal, 1978.
Raphael, B., The Thinking Computer: Mind Inside Matter. W.H. Freeman & Co. San Francisco, 1976.
Raphael, Bertram, 'SIR: A Computer Program for Semantic Information Retrieval', in SIP, pp. 33-145.
Roberts, L.G., 'Machine Perception of Three-Dimensional Solids', in Electro-optical Information Processing, Tippet etal. (Eds.), 1965, pp. 159-197.
Robinson, Guy, 'How to Tell your Friends from Machines', MIND, N.S., 81, 1972, pp. 504-518.
Ryle, Gilbert, The Concept Of Mind. London: Hutchinson,1949.
Schank, R.C., & R.P. Abelson. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge structures. Hillsdale, N.J.: Lawrence Erlbaum Ass., 1977.
Schank, R.C. & Colby, K. Computer Models Of Thought And Language, Freeman, 1973.
Schank, R.C., Goldman, Neil, Reiger, C.J. & Riesbeck, Chris, 'MARGIE: Memory, Analysis, Response Generation and Inference on English', IJCAI-3, 1973, pp. 255-261.
'Finding the Conceptual Content and Intention in an Utterance in Natural Language Conversation', in IJCAI-2, pp. 444-54.
'Conceptual Dependency: A Theory of Natural Language Understanding', Cognitive Psychology, 3, 1972, pp. 552631.
Identification of Conceptualizations Underlying Natural Language', in CMTL, pp. 187-248.
Seely-Brown, John, and R.R. Burton, 'Multiple Representations of Knowledge for Tutorial Reasoning', in RU, pp. 311-50.
Selfe, Lorna, NADIA: a case of extraordinary drawing ability in an autistic child. Academic Press, 1977.
Shirai, Yoshiaki, 'A Context Sensitive Line Finder for Recognition of Polyhedra', in Al, 4, 1973, pp. 95-120. (Also as 'Analyzing Intensity Arrays Using Knowledge About Scenes', in PCV, pp. 93-114.)
'Necessary, Apriori, and Analytic', in Analysis, October 1965.
'Explaining Logical Necessity', in Proceedings Aristotelian Society 69, 1968-9, pp. 133-47.
'How to derive "Better" from "Is" ', American Phil. Quarterly, January 1969.
' "Ought" and "Better" \ Mind, July 1970.
'Tarski, Frege, and the Liar Paradox', Philosophy, April 1971.
Identity and bodily continuity: New bodies for sick persons', Analysis, December 1971.
Interactions between Philosophy and A.I.,' in Proc. IJCAI-2 1971. And in Al, 1971, and in M. Nicholas (Ed.), 1977.
'Physicalism and the Bogey of Determinism', (and replies to criticisms) in Brown (Ed.), 1974.
'On learning about numbers', in AISB-I, 1974.
'Afterthoughts on Analogical Representation', in TINLAP, 1975, pp. 178-182.
'What are the Aims of Science?', Radical Philosophy, No. 13, Spring 1976, pp. 7-17.
'A.I. and empirical psychology' (commentary on Pylyshyn 1978) in Behavioural and Brain Sciences Journal 1978 (to appear).
Sloman, Aaron, and Steven Hardy, 'Giving a Computer Gestalt Experiences', in AISB-2, 1976, pp. 242-255.
Sloman, Aaron, D. Owen, G. Hinton and F. O'Gorman, 'Representation and control in vision'. AISB Conference, Hamburg, July 1978.
Strawson, P.F., Individuals An Essay in Descriptive Metaphysics, London: Methuen, 1959.
The Virtuous nature of Bugs', AISB-I, 1974.
A Computational Model of Skill Acquisition. New York: American Elsevier, 1975.
'Outlines of a Theory of Visual Pattern Recognition in Animals and Man., Proc. Royal Society B, 171, 1968, pp. 297-317
Is the brain a physical system?', in R. Borger and F. Cioffi (Eds.). Explanation in the Behavioural Sciences. Cambridge University Press, 1970, pp. 97-122.
'Computer Simulation of Brain Function' in Brown (Ed.), 1974.
Intelligent Picture Processing', in Tutorial Essays in Psychology, Vol. II. (Ed. N.S. Sutherland). Hillsdale, N.J.: Lawrence Erlbaum Ass., in press.
Toulmin, Stephen, The Philosophy of Science, London: Hutchinson, 1953.
Waltz, D.L., 'Understanding Line Drawings of Scenes with Shadows', in PCV, 1975, pp. 19-92.
Watson, J.D., The Double Helix, Harmondsworth: Penguin Books, 1968.
Weir, Sylvia, 'Action Perception', in AISB-I, 1974, pp. 247--256. "The perception of motion: Michotte revisited', to be published, 1977.
Weir, Sylvia, M.R. Adler, and Marilyn McLennan, 'Final Report on Action Perception Project'. Edinburgh: Edinburgh University Al Dept., November 1975.
Weir, Sylvia, and Ricky Emmanuel, Using Logo to Catalyse Communication In An Autistic Child. Research Report 15. Edinburgh: Edinburgh University Dept. Al, January 1976.
'ELIZA A Computer Program for the Study of Natural Language Communication Between Man and Machine', in CACM, 9, 1966, pp. 36-45.
Computer Power & Human Reason: From Judgement to Calculation, San Francisco: W.H. Freeman & Co., 1976.
Wertheimer, Max, Productive Thinking. London: Tavistock Publications, 1966.
Attention. Oxford: Blackwell, 1964.
The Philosophy of Mind. New York: Random House, 1967.
Modal Thinking. Oxford: Blackwell, 1975.
Winograd.T.S.. Understanding Natural Lanzuaee. Edinburgh: Edinburgh University Press, 1972. 'The Process of Language Understanding' in The Limits of Human Nature, Benthall (Ed.), pp. 208-234.
Winston, P.H. (1970), 'Learning Structural Descriptions from Examples', in POCV, 1975. The M.I.T. Robot', in MI-7, 1971. Artificial Intelligence, Addison Wesley, 1977.
Philosophical Investigations. Oxford: Blackwell, 1953.
Remarks on the Foundations of Mathematics, Oxford: Blackwell, 1956.
Woods, W.A., 'What's in a Link: Foundations for Semantic Networks', in RU, 1975, pp. 35-82.
Young, R.M., Seriation by Children: An Artificial Intelligence Analysis of a Piagetian Task. Basel: Birkhauser, 1976.
The separate index document can be used in conjunction with this searchable version of the rest of the book. I.e. to find occurrences of a word or phrase listed in the index use either your web browser search function or the PDF reader search function. Alternatively you can search in a page range by going to the relevant section listed in the detailed table of contents with "original" page numbers above.
2 Jul 2015 (Re-formatted)