.nf
[For Evolving Knowledge Conference. Reading University Sept 1989]
IN:
Evolving Knowledge in Natural Science and Artificial Intelligence,
eds J.E.Tiles, G.T.McKee, G.C.Dean,
London: Pitman, 1990
Comments to A.Sloman@cs.bham.ac.uk
There is a neatly printable version of this paper here
http://www.cs.bham.ac.uk/research/cogaff/sloman.scruffy.ai.ps
http://www.cs.bham.ac.uk/research/cogaff/sloman.scruffy.ai.pdf
.fi
.fo'Scruffy''Page %'
.ig
CONTENTS
-- Introduction: Neats vs Scruffies
-- The scope of AI
-- Bow to the inevitable: why scruffiness is unavoidable
-- Non-explosive domains
-- The physical (biological, social) world is even harder to deal with
-- Limits of consistency in intelligent systems
-- Scruffy semantics
-- So various kinds of scruffiness are inevitable
-- What should AI do about this?
-- Conclusion
..
.mh
Must Intelligent Systems Be Scruffy?
Aaron Sloman
School Of Cognitive and Computing Sciences
University of Sussex
Brighton
[Now School of Computer Science, the University of Birmingham]
.hm
.sh
Introduction: Neats vs Scruffies
--------------------------------
.hs
You would probably be very unhappy about air travel if you were told
that landing was always under the control of programs that were so
complex and messy that nobody really understood how they worked, and
nobody had ever proved that they would do something sensible in all the
situations that could possibly arise. Would you be equally unhappy to
learn that your life was in the hands (excuse the metaphor) of a human
brain?
There has been a long-standing opposition within AI between "neats" and
"scruffies" (I think the terms were first invented in the late 70s by
Roger Schank and/or Bob Abelson at Yale University).
The neats regard it as a disgrace that many AI programs are complex,
ill-structured, and so hard to understand that it is not possible to
explain or predict their behaviour, let alone prove that they do what
they are intended to do. John McCarthy once complained, in a televised
debate in 1972, about the "Look Ma no hands!" approach. Similarly, Carl
Hewitt complained around the same time, in seminars, about the "Hairy
kludge (pronounced klooge) a month" approach to software development.
(His "actor" system was going to be a partial solution to this.)
The scruffies regard messy complexity as inevitable in intelligent
systems and point to the failure so far of all attempts to find workable
clear and general mechanisms, or mathematical solutions to any important
AI problems. There are nice ideas in the General Problem Solver, logical
theorem provers, and suchlike but when confronted with non-toy problems
they normally get bogged down in combinatorial explosions. Messy
complexity, according to scruffies, lies in the nature of problem
domains (e.g. our physical environment) and only by using large numbers
of ad-hoc special-purpose rules or heuristics, and specially tailored
representational devices can problems be solved in a reasonable time.
Moreover, the scruffies tend to complain that the neats spend so much
time worrying about form (e.g. defining formal syntax and semantics for
their notations) that they ignore content (e.g. what information about
the world an intelligent system really needs). They sometimes even liken
the "theorem envy" of those who hanker after mathematical rigour to the
kind of arid statistics-based research in psychology and the social
sciences that uses sophisticated mathematical techniques (and much
computer power) with little or no increase in understanding of any
interesting phenomenon.
Recently the conflict has taken on a new twist, with the rise of
connectionism. Here we have a branch of AI (yes, it is part of AI, not a
new rival discipline), that is heavily mathematics-based, yet, although
the general principles on which a particular network learns during its
training period may be well understood, the operation of the final
system when applied to real tasks generally depends on a totally opaque
network of connections between processing units and weights that modify
their influence on one another. Only in relatively simple cases is it
possible to interrogate the "hidden layers" of a neural net and find out
what they are doing or why it works. So it looks increasingly as if
systems based on neural nets will, after training, be even more
inscrutable than AI programs: we'll just have to trust that it is safe
to extrapolate from the test cases where they appear to perform well --
exactly as we do with people to whom we give responsibility. Alas!
This is not unlike the situation in weather forecasting, where the
general physical principles underlying the weather are well understood
but the boundary conditions are impossible to measure and represent with
complete accuracy and their large-scale effects are impossible to
compute within acceptable margins of error: small errors in initial
inputs to equations can produce unpredictably large errors in their
solutions.
Anyhow, quite apart from the implications of connectionist models of
neural nets there are more general things to be said about the
neat/scruffy conflict. I shall try to separate different strands in the
conflict and will conclude that there are good reasons why intelligent
systems performing in real time in a real world, or even in some purely
formal problem domains, will have to be scruffy. Still, wherever
possible we should use neat theories to guide our designs and
explanations. In particular, scruffy systems should be based on neat
designs.
We can contrast the theories about the world and how to relate
to it that are used by working intelligent systems with theories about
the design of such systems. I'll call the former object-level-theories
and the latter meta-level-theories. My point is that we should strive to
make meta-level-theories as neat as possible, including the theories
that explain why object-level-theories are bound to be scruffy.
.sh
The scope of AI
---------------
.hs
One source of optimism about the possibility of neat designs is an
excessively narrow view of the scope of AI. For example there are some
who tend to think of AI as synonymous with the field of expert systems,
and it is true that for many (though not all) expert systems the problem
domain is sufficiently restricted that complete analysis is possible. I
prefer a definition that reflects the range of work reported at AI
conferences, in AI journals, and the interests and activities of some of
the leading practitioners, including founders of the subject. From this
viewpoint AI is a very general investigation of the nature of
intelligence and the principles and mechanisms required for
understanding or replicating it, and this includes topics like vision
and motor control which are important for complete intelligent systems
embodied in the physical world.
Like all scientific disciplines AI thus construed has three main types
of goal, empirical, theoretical, and practical, encompassing:
.in +4
.ti -4
(a) empirical study and modelling of existing intelligent systems
(including human beings and other animals);
.ti -4
(b) theoretical analysis and exploration of possible intelligent systems
and possible mechanisms, architectures or representations usable by
such systems, and using the results to explain the empirical phenomena
observed under (a);
.ti -4
(c) solving practical problems in the light of (a) and (b), namely:
.ti -4
(c.1) using the understanding gained in (b) to deal with problems of
existing intelligent systems (e.g. problems of human learning or
emotional difficulties)
.ti -4
(c.2) using the theoretical understanding to design new useful
intelligent machines
.ti -4
(c.3) using the theoretical understanding to design new
machines that may not in themselves be intelligent but nevertheless
usefully complement human intelligence.
.in -4
In the course of these activities AI generates new sub-problems, and
these lead to new concepts, new formalisms, and new techniques. On this
general definition there is little difference between AI and Cognitive
Science, though in fact many of the people who call themselves cognitive
scientists are ignorant of the contents of most AI research and its
relevance to their own work.
Much of the theoretical work that falls under this general view of
AI includes understanding design trade-offs relevant to intelligent
systems. For example, there are trade-offs between efficiency and
generality, between complexity of design and the ability to degrade
gracefully when things go wrong. The rest of this paper is concerned
with trade-offs that confront an intelligent system that has to solve
problems within constraints of time and memory.
.sh
Bow to the inevitable: why scruffiness is unavoidable
-----------------------------------------------------
.hs
The main reason why scruffiness is inevitable in intelligent systems
can be summarised thus: in complex domains, neatness is defeated by
combinatorial explosions, unless you have an infinitely fast processor
or all eternity to wait for solutions.
The argument runs as follows: except in limited cases (to be discussed
below) problem domains have a structure that requires solutions to be
found by exploration of branching paths in a search space. The number of
possible branches will be (at least) an exponential function of the
depth of path to solution. E.g. if N steps are required and the number
of possible branches per step is as low as 2, the number of possible
paths will be 2**N. So even if only 50 steps are required, and they can
be explored at the rate of a million a second they will need about 35
years for an exhaustive exploration. A mere 70 steps would require about
37 million years. That indicates how exponential functions explode:
adding a mere 20 to whatever the cost is an exponential function of
multiplies the time or space requirements by about a million for a
branching factor of 2, or by over 3,000,000,000 for a branching factor
of 3.
These functions grow so fast that they are not going to be helped much
by parallel implementations.
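The arithmetic behind these figures is easily checked; here is a minimal
sketch (the function name and interface are mine, for illustration only):

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_search(branching, depth, paths_per_second=1_000_000):
    """Years needed to enumerate every path of a uniform search tree
    with the given branching factor and depth, at the given rate."""
    return branching ** depth / paths_per_second / SECONDS_PER_YEAR

# Branching factor 2, 50 steps: roughly 35 years of exhaustive search.
# 70 steps: tens of millions of years. And 3**20 exceeds 3,000,000,000.
```

Adding 20 to the depth multiplies the result by 2**20 (about a million)
for branching factor 2, exactly as claimed above.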
For problem domains that have this character it is very often the case
that neat, general, sound and complete algorithms work only on small
problems: they do not scale up because they have to do so much detailed
analysis of cases that they explode combinatorially.
.sh
Non-explosive domains
---------------------
.hs
Of course, not all search problems are like that. If you have a very
large array of names alphabetically ordered and want to search for a
particular name, you can (given suitable addressing mechanisms) use the
"binary chop" method, i.e. look at the middle, then decide whether to go
left or right, then look in the middle of the remaining region, etc.
With this method the relation between problem size and solution time is
the reverse of exponential: it's logarithmic. I.e. the size of the
problem is an exponential function of the time it takes to solve it,
rather than the other way round. So even if the problem (the list of
names) is made a million times longer, you will typically need only
about 20 more steps to find a name. That's how telephone directories
work.
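The "binary chop" can be sketched in a few lines (a generic illustration,
not code from the paper; the step counter is added to make the
logarithmic growth visible):

```python
def binary_chop(sorted_items, target):
    """Halve the remaining region each step.
    Returns (index, steps taken), or (None, steps) if absent."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps
```

On a list of about a million items no search takes more than 21 steps;
multiplying the length by a further million adds only about 20 more.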
There are other problems that have tractable complexity. For example if
you are trying to find some combination of N values of variables that
solve a problem, instead of searching through all N factorial possible
combinations you may be able to structure the problem as a search in a
relatively homogeneous N-dimensional vector space, where there is at any
point a function that tells you in which direction to move towards the
solution point. Then you can home in on the solution fairly directly by
moving along the line of steepest ascent to the goal, e.g. using
relaxation techniques.
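Homing in along the line of steepest ascent can be sketched as follows
(a toy illustration under my own assumptions: a fixed step size and a
smooth function with a single peak):

```python
def hill_climb(grad, x, step=0.1, iters=200):
    """Repeatedly move along the direction of steepest ascent,
    as given by the gradient function grad."""
    for _ in range(iters):
        x = [xi + step * gi for xi, gi in zip(x, grad(x))]
    return x

# Maximise f(x, y) = -(x - 1)**2 - (y + 2)**2, whose gradient is
# (-2*(x - 1), -2*(y + 2)); the single peak is at (1, -2).
peak = hill_climb(lambda p: [-2 * (p[0] - 1), -2 * (p[1] + 2)], [0.0, 0.0])
```

Because there is one peak and no plateaus, the method converges directly;
the next paragraph describes what goes wrong when that assumption fails.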
For more complex problems that can be represented in an N dimensional
vector space, for instance pattern recognition problems, it is well
known that this simple hill-climbing strategy won't work, e.g. because
you can end up on a local hillock or get lost rambling aimlessly on a
plateau with no clear direction of steepest ascent. However, some kinds
of neural nets are capable of being trained on a particular class of
problems so that they transform such a space into a different space in
which solutions are found very quickly by going along the direction of
steepest ascent or descent, though there is still much to be learnt
about the classes of problems for which this works, and under what
conditions the training methods will produce complete and sound problem
solvers.
Unfortunately, it is far from obvious that all the problems that an
intelligent agent needs to deal with can be expressed in a form that is
amenable to these solution methods. For example for some problems (e.g.
searching for proofs in pure number theory) the structure of the domain
makes it impossible either to set bounds to the length of proofs or to
avoid branching searches for proofs. This makes it impossible to map the
search space into a finite dimensional vector space amenable to
relaxation or neural net techniques that home in on solutions without
combinatorial searching: we are again confronted by these horrendous
exponential functions.
This is why nobody tries to solve problems in arithmetic by simply
remembering Peano's five axioms and searching for a logical proof based
on those axioms. Instead we memorise all kinds of strictly redundant
facts derivable from those axioms, including multiplication tables and
lots of rules of thumb for solving particular problems (e.g. you can
tell if an integer is divisible by five by looking at its last digit in
the decimal representation, or whether it is divisible by three by
seeing whether the sum of its decimal digits is divisible by three,
etc.).
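Those two rules of thumb are trivially mechanised (a small sketch of my
own): each trades a general operation, division, for a cheap
special-purpose test on the decimal representation.

```python
def divisible_by_5(n):
    """Rule of thumb: look only at the last decimal digit."""
    return str(abs(n))[-1] in "05"

def divisible_by_3(n):
    """Rule of thumb: sum the decimal digits and test the
    (much smaller) sum instead."""
    return sum(int(d) for d in str(abs(n))) % 3 == 0
```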
These stored "lemmas" (previously proved results) are facts that are
strictly redundant but have been found to be useful for classes of
problems that occur frequently. For other problems combinatorial search
may still be required.
Of course, if you depend so much on stored facts you have a new problem
which is to find a way of organising them so that you can tell which
one, if any, is relevant to a particular task. There is no infallible
way of doing that: discovering relevance can be as hard a problem as
any.
So, for the full potentially infinite collection of arithmetical
problems, we are about as badly off with the stored intermediate results
as we are without them. Fortunately, however, it is an empirical fact
that for most people's lives there's only a comparatively small,
bounded, variety of arithmetical problems that they ever have to solve,
so it pays to learn the ad-hoc collection of facts and tricks that
enable such problems to be solved (with or without our understanding
why).
Those unfortunates who do research in pure number theory have no such
complete problem solving kit, and therefore have to spend their lifetime
groping around without such effective aids to rapid solutions, though
even they tend to specialise in sub-fields where they can build up
expertise in the form of an ad-hoc collection of useful heuristics, lemmas,
and patterns, for deciding what's worth trying next.
In chess, where, unlike the set of possible proofs in number theory, the
move tree is finite, it is nevertheless so large that as regards
strategies for deciding in a reasonable time where to move, the tree
might as well be infinite. Of course there are some important relatively
small subtrees for which winning algorithms have been discovered, and
good chess players learn patterns on the board to recognize when they
have entered such subtrees, or when they are close to entering them.
So unless someone invents an infinitely fast, infinitely large,
processing system, any intelligent agent will have to find short cuts to
solving problems, even when dealing with formally specified domains. In
fact, short cuts are the key to intelligence: a major component of
intelligence is
.ul
productive laziness.
Here's an example of a short cut using a special purpose representation.
Suppose you are asked to find all possible ways of selecting values
between 1 and 9 inclusive for the variables a to i, satisfying the
following conditions.
1. No two variables have the same value.
.ne 4
2. all the following are true:
a + b + c = 15
a + d + g = 15
a + e + i = 15
b + e + h = 15
c + f + i = 15
d + e + f = 15
g + e + c = 15
g + h + i = 15
You can search through all possible ways of assigning the 9 integers
to the 9 letters, i.e. 9 factorial or 362880 combinations. However, a
little representational and conceptual creativity can almost completely
eliminate the searching required. The trick requires the following
observations:
(a) The problem is equivalent to mapping the integers 1 to 9 onto
locations in a 3 by 3 array of squares labelled thus
.ce 3
a b c
d e f
g h i
such that all rows, columns and diagonals add up to 15.
(b) There are three distinct types of locations in the array, occurring
in different numbers of collinear triples:
the centre (e) occurring in four triples,
the corners (a, c, g, i) occurring in three triples,
and the middle edges (b, d, f, h) occurring in two triples.
(c) So there must be three different types of integers between 1 and 9
such that some occur in four, some in three, and some in two triples
adding up to 15.
It takes very little work to divide the nine integers into these three
categories. After that assigning them to locations in the array so as to
satisfy the conditions requires almost no backtrack searching. After one
solution is found all the others are generated quickly by using the
symmetries that are blindingly obvious in the geometrical representation
though not at all obvious in the original formulation of the problem.
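Observations (b) and (c) can themselves be verified mechanically; here
is a short sketch (my own, not part of the original argument) that
computes the three categories of integers directly:

```python
from itertools import combinations

# All unordered triples of distinct integers from 1..9 summing to 15,
# and how many such triples each integer belongs to.
triples = [t for t in combinations(range(1, 10), 3) if sum(t) == 15]
counts = {n: sum(n in t for t in triples) for n in range(1, 10)}

centre  = sorted(n for n, c in counts.items() if c == 4)  # fits the centre
corners = sorted(n for n, c in counts.items() if c == 3)  # fit the corners
edges   = sorted(n for n, c in counts.items() if c == 2)  # fit middle edges
```

There are exactly eight such triples, matching the eight lines of the
square: 5 must go in the centre, the even numbers in the corners, and
the remaining odd numbers on the middle edges, after which hardly any
search remains.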
This example of productive laziness used the ability to notice a
relationship between an arithmetical problem and a geometric structure,
as a result of which some additional arithmetical concepts were created
(concepts of different kinds of numbers) which can then be used to
constrain the assignment of numbers to letters in such a way as to
more or less eliminate searching.
This toy example is typical of creative problem solving with much harder
problems in mathematics and elsewhere. Often a problem looks either
unsolvable, or solvable only by very tedious exhaustive search, until a
relationship between domains, or between problems, is noticed that
provides a short cut to a solution.
Sometimes discovering such relationships is just a matter of luck.
Sometimes it results from a generalisation or modification of a similar
relationship that has already been shown to have heuristic power.
However, for many kinds of problems short cuts work if you are lucky but
they are not perfect. Moreover, most good short cuts cannot be worked
out in advance on the basis of prior analysis of the domain: they have
to be discovered by exploration and analysis, like the way in which a
great chess player (or mathematician) learns patterns by observing and
generalising what turns up in actual experience instead of trying to do
everything in advance by applying general inference procedures to the
rules of chess (or the axioms of a branch of mathematics).
So even in formal and mathematical domains there are not going to be
neat generally applicable powerful methods that can be used to solve
all problems in a reasonable time.
.sh
The physical (biological, social) world is even harder to deal with
-------------------------------------------------------------------
.hs
The requirement to discover rules of thumb and short-cuts by exploration
and experience is even more pressing in connection with problem solving
in the real world (e.g. finding good ways to build houses that are
resistant to wind, rain, cold, etc.). There are several reasons why
things are harder in the real world, including the following:
a. Unlike a formal system like chess or number theory, no complete body
of information is available as a starting point: knowledge is always
limited. (In fact we are generally both ignorant about some things and
misinformed about others.)
b. The range of possible things to do, and therefore the branching
factor in search spaces will be considerably higher than in chess or
arithmetic: you can scratch your ear, shout for help, look for a
library, try to climb a tree, think of a number, try to remember whether
you have been here before, etc., etc. So search spaces are unmanageably
large, especially if the task is to find all the solutions, or to find
the best one.
c. Things happen at speed in the world, and very often decisions and
actions are required by a deadline in the immediate future. This rules
out extensive combinatorial searches.
For all these reasons rapidly accessible, rapidly executable rules of
thumb have to be derived or learnt by trial and error. Because the
world is treacherous and we are not gods, such rules are highly
fallible.
In both the real world and formal domains mistakes are possible. A
heuristic induced from actual experience can be erroneous because the
wrong generalisation was made, even in a mathematical problem domain.
Additional problems in the physical world are poor observation or
measurements, or a poor set of concepts for dealing with the phenomena -
like trying to think about movement of masses without the concept of
acceleration. There are no general, provably correct, algorithms
for finding powerful new notations and using them to define new
concepts. Scientific and cultural developments are slow, erratic,
and often painfully wrong. And this is inevitable.
.sh
Limits of consistency in intelligent systems
--------------------------------------------
.hs
Given that some of your rules, generalizations and facts may be wrong
one possible (though not infallible) way of detecting that errors exist
might be to check whether they are all mutually consistent. If not,
something must be wrong, though consistency does not imply that
everything is fine.
Alas, consistency checking is itself subject to combinatorial
explosions. Even if all complexity in the knowledge base is due to the
use of propositional connectives, and no quantifiers are involved, if
there are N atomic propositions a consistency check can take up to
2**N tests, which will be an impossibly large number for a realistic
database. Of course, particular cases can be dealt with more quickly,
but in general checking consistency in a large knowledge base is an
intractable problem.
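For the purely propositional case the brute-force check looks like this
(a sketch; the clause representation, as lists of (variable, polarity)
literals read disjunctively, is my own):

```python
from itertools import product

def consistent(clauses, n_vars):
    """Try all 2**n_vars truth assignments; return True as soon as one
    satisfies every clause, False if none does."""
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[v] == polarity for v, polarity in clause)
               for clause in clauses):
            return True
    return False

# (p or q) and (not p) and (not q) -- an inconsistent set of beliefs:
inconsistent_set = [[(0, True), (1, True)], [(0, False)], [(1, False)]]
```

With N atomic propositions the loop may visit all 2**N assignments,
which is precisely the blow-up that makes the general problem
intractable for a realistic database.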
Moreover, even if you've managed to achieve internal consistency,
maintaining it is a new problem if your beliefs are error prone, and you
frequently have to modify beliefs. Because the knowledge base has to be
redundant for reasons mentioned previously, retracting a belief raises
the problem of finding and dealing with other redundant beliefs, plans,
rules, etc. that were derived from the one that has been rejected.
You could in principle embed your knowledge base in a reason maintenance
system that kept records of all such dependencies, telling you what else
must be retracted as a consequence of correcting errors, but again in
practice this strategy will be defeated by the sheer volume of
dependency relations, in any realistic knowledge base.
So the only
.ul
practical
.ne
strategy on revising a belief is to use whatever heuristics are
available for finding closely related beliefs that also need to be
revised, and then hope that if there are any other beliefs that were
derived from the erroneous one you will later independently detect their
wrongness and correct them before they lead you into trouble. In
practice we don't always avoid such trouble, and the arguments presented
here suggest that that is not just because human beings are badly
designed but because the problem will inhere in any resource-limited
system.
.sh
Scruffy semantics
-----------------
.hs
There is an additional kind of scruffiness that I think is inevitable
in intelligent systems inhabiting a rich, and only partly knowable,
world: namely scruffy semantics.
Logicians like to define formalisms whose syntax and semantics are
specified precisely.
A definition of the syntax of a formalism starts by assuming some class
of structures (e.g. all possible sequences of characters in some
alphabet, or all possible networks made of certain kinds of nodes and
links) and then giving a rigorous definition of a subset of such
structures that are taken to be syntactically well formed (in the
formalism being defined). Any structure will either be definitely well
formed or not.
Human beings often deploy representational formalisms that are
open-ended in ways that are incompatible with this conception of syntax.
For example, maps are widely used, but there is no set of rules
specifying exactly what is and what is not a legal map. It is up to
someone creating a map to decide what will be an effective device for
the intended purpose, assuming that the eventual user will be creative
and intelligent. As a result there is considerable variation in road
maps published by different firms. Some of them use considerable
ingenuity in inventing new encodings of information, for example whether
a link between a motorway and a lesser road allows both entry to and
exit from the motorway, or only one of these -- a problem that used not
to arise for more primitive road systems.
Sometimes the conventions used in a map are explicitly specified in a
key, but sometimes it is up to the reader to infer them, using general
knowledge of the domain and the purposes of the map.
This kind of open-endedness applies to all kinds of pictures, diagrams,
charts, tables, models, etc used in human communication. The prior
existence of some agreed definition precisely specifying a syntax is the
exception rather than the rule.
Moreover, if we were not allowed this syntactic creativity in our
communicative devices we would be impoverished not only culturally but
also scientifically and technically. Even mathematicians often create
new diagrams and notations both when searching for proofs and when
communicating concepts and proofs to others.
Of course, natural languages are equally prone to creative extension.
When a toddler once said to his parents "Today might be much more hotter
than it usually be's", in a discussion of whether to go on a picnic,
they understood him perfectly. More generally, the syntax of natural
languages is constantly evolving, and evolving simultaneously in
different directions in different sub-cultures.
Semantic scruffiness is an even deeper issue than syntactic scruffiness.
A formal semantics of the kind defined by Tarski (Tarski 1956) maps well
formed expressions in a formalism onto elements or subsets of a
previously specified set of objects in such a way that whether
something is or is not so mapped is determined by a collection of
recursive rules.
In real life we are not in a position to specify precisely which sets
of objects we are dealing with nor the mapping rules.
Instead we muddle on with a collection of relatively ill-specified
notations, and employ structures using such notations with a collection
of ill-specified semantic rules. In particular it is not generally the
case that for human language the question whether a situation is or is
not correctly described by a sentence has a determinate answer. "That is
a giraffe" might be indeterminate when a new kind of animal is
discovered that has some of the features previously regarded as
essential to being a giraffe, but not all of them. Many scientific
advances depend on the introduction of new concepts that are later
found to have semantic indeterminateness, for instance the concept of
mass in Newtonian mechanics. When the indeterminateness is discovered,
e.g. because an unanticipated situation arises in which defining
criteria generate conflicts, this often catalyses conceptual
changes that enable us to extend our concepts and theories to provide
us with a deeper more powerful understanding of the world.
A full discussion of this issue is impossible in the space available
here. I'll simply baldly assert that for creative intelligent agents
groping towards ever increasing knowledge and understanding of an
indefinitely rich and complex world it is inevitable that the notations
will have both syntactic and semantic indeterminacy of a kind that both
limits the applicability of logical and mathematical methods and
provides a stimulus to creative advance.
.sh
So various kinds of scruffiness are inevitable
----------------------------------------------
.hs
If all this is correct, then over the years any intelligent learner in
the real world, as opposed to some ideal world in the mind of wishful
logicians, will build up an ever increasing store of ill-defined
information, much of it inevitably redundant, much of it inevitably
wrong, and with no hope of keeping it all internally consistent. I.e.
real intelligent agents will be irretrievably scruffy.
I have argued elsewhere (Sloman 1987) that there will be additional
kinds of scruffiness to do with the requirements of control systems in
an agent that has multiple independent sources of motivation that can be
triggered by new events in the world, and has limited information and
limited time for dealing with most of the problems. A conclusion of that
argument is that the kinds of mechanisms that sometimes generate
emotional states are not design defects but inevitable consequences of
optimising the design, subject to all the problems and constraints of
real life.
.sh
What should AI do about this?
-----------------------------
.hs
One reaction to all this is horrified rejection and a frenzied search
for some general way of using formal, mathematical methods to avoid the
problems. An alternative reaction is to recognize that a major task for
AI is to find ways of designing intelligent agents that reduce, even
though they cannot eliminate, the bad consequences of all this
scruffiness. This is compatible with trying to identify classes of
sub-problems that are completely amenable to rigorous mathematical
treatment.
Unfortunately, because we don't know enough about the world to prove
that any particular way of minimizing the consequences is optimal, and I
suspect that there is NO practically implementable strategy that will be
optimal for ALL situations, it seems to me that our best hope is to see
what has emerged as a result of millions of years of empirical
exploration of the space of possible designs. I.e. we should study
existing animals, including especially human beings, to find out how
they cope reasonably well in a rich, partly knowable, partly friendly,
fast changing environment.
It is unlikely that they do this by proving theorems, even theorems in
non-monotonic logics.
Of course, I am not arguing that mathematics and logic are irrelevant to
AI. The whole history of science has shown that on the contrary a
mathematical approach is often essential for understanding a complex
system, and for many aspects of AI it is clear that mathematical
analyses of problems and techniques play an essential role both in
understanding the nature of the problem to be solved and in designing
and evaluating representations and algorithms. My point is only that
precisely defined representations and rigorous fully analysed methods
have to be restricted to those sub-problems where they are applicable
and useful (e.g. aspects of low level vision, design and analysis of
various forms of neural computation, some aspects of motor control,
complexity analysis, and so on) and that for many features of the design
of intelligent systems the
.ul
working
.ne
system will have to violate canons of mathematical and logical
acceptability, but for good, principled, reasons.
.sh
Conclusion
----------
.hs
Although it may be impossible to build truly intelligent systems that
never become scruffy as they struggle to understand the world and how to
deal with it, at least we have a neat theory as to why it's impossible.