Prisoners of Reason

Manfred Kerber

School of Computer Science, The University of Birmingham
Edgbaston, Birmingham B15 2TT, England
http://www.cs.bham.ac.uk/~mmk

ABSTRACT

In this contribution, the prisoner's dilemma is revisited and taken as an example par excellence of the problems of reductionist reasoning. It is related to recent findings in evolutionary psychology and also placed in the broad context of critiques of rationalism as such. It will be argued that rationality goes well beyond traditional logic and the reductionist reasoning based on it. Complex decision making in a complex world is very much a matter of learning and adaptation, but also of understanding things in their context, taking their interconnections into account. Reasoning has to reflect this in order to avoid severe problems. Since everyday reasoning is not independent of the context in which it takes place, rules of reasoning can be valid only with respect to a particular domain and scenario.
The paper takes first steps towards a logic that is based on evolutionary methods. A good solution to the prisoner's dilemma is one that can be reused in the context of the iterated prisoner's dilemma, that is, one that an agent would choose again if faced with the same situation again.

1.  Introduction

One of the aims of artificial intelligence is to build rational agents, both for the purpose of acting in a rational way and to shed light on our own rationality. In this context, rational behaviour means maximising the outcome of one's actions in a given situation by making the best possible decisions. A fundamental problem in making the best possible decisions is captured by the so-called prisoner's dilemma, which describes a situation in which rational behaviour seems to result in a sub-optimal outcome, while "irrational" behaviour seems to produce better results.

Taking a more fundamental viewpoint and looking at the global self-generated problems that the human race faces (hunger, the greenhouse effect, unemployment, erosion, overpopulation, and many more), critics have expressed a fundamental critique of rationality altogether, in which rational behaviour is held responsible for these problems. One of the most prominent books against the rational view of the world is Capra's "The Turning Point" [Capra82], in which Capra argues that rational behaviour is responsible for these problems. Capra argues that in a new ecological way of thinking, rational knowledge has to be supplemented by intuitive wisdom. I agree with a large part of Capra's analysis of the problems we face and also with part of his analysis of why we have these problems. However, Capra comes to very radical conclusions, namely that rationality as such is the culprit and that it has to be replaced (or supplemented) by something else, namely intuition. For many years I was very puzzled by this conclusion, not only because Capra does not offer a clear view of what intuitive wisdom should be and how it should be balanced against rational thought, but in particular because I think that rationality is one of the great achievements of civilisation. Going back to a pre-rational way of dealing with serious matters can be extremely dangerous and can seriously aggravate rather than solve our problems.

Rational behaviour can be rather complex, and it is not clear how to look at it in full generality. The usual scientific approach in such a situation is to study a much simplified scenario. The prisoner's dilemma is of this kind: it can be viewed as a simple situation in which the phenomena can be studied (in a rational way). Of course, this methodology can be attacked by people who do not believe in the traditional scientific method. On the other hand, following [Kuhn62], a new scientific paradigm is not necessary as long as the phenomena can be explained in the old one.

2.  The prisoner's dilemma

An excellent and fascinating description of the prisoner's dilemma, its history and its implications is given in [Poundstone92]. The following version of the dilemma is taken from this book (p.118):

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don't have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. If he testifies against his partner, he will go free while the partner will get three years in prison on the main charge. Oh, yes, there is a catch ... If both prisoners testify against each other, both will be sentenced to two years in jail.

Let us assume two agents A and B, and let "Coop" denote that an agent cooperates with the other agent (and not with the police) and "Defect" that he/she/it (he for short in the following) testifies against the other. If a pair (-a,-b) means that agent A has to go to prison for a years and B for b years, the situation can be summarised as shown in Figure 1:

            B: Coop    B: Defect
A: Coop     (-1,-1)    (-3,0)
A: Defect   (0,-3)     (-2,-2)

Figure 1. Payoff table for the prisoner's dilemma

There is a problem with this setting insofar as the agents might reason as follows (let's do the reasoning for agent A):
"There are two possibilities of what agent B might do.

First case, B cooperates. If I cooperate as well, I have to go to prison for one year; if I defect, I go free. Hence in this case I am better off defecting.

Second case, B defects. If I cooperate, I have to go to prison for three years; if I defect, however, I have to go to prison for two years only. That is, in this case, too, I am better off defecting.

Hence, in either case I am better off defecting, so I defect."

Of course, B can make exactly the same reasoning. As rational agents they would both defect and end up with two years in prison each. Wouldn't it have been better to cooperate and to go to prison for one year only? What's wrong with the reasoning above?

Capra might answer: "The problem is with the rational kind of reasoning. Intuition would tell the agents they should cooperate rather than defect." But how would intuition tell them?

In game theory as developed by [Neumann44], situations like the prisoner's dilemma are described by matrices of the kind above. An agent acts rationally if he tries to find an equilibrium point, that is, a minimax point in the matrix: each agent tries to maximise his own outcome under the assumption that his opponent tries to do the same for his own outcome. As described by [Rapoport66], the different two-person games with two choices per player can be classified into 78 classes, each of which can be represented by a single case consisting of the smallest possible positive (or, alike, the smallest possible non-negative) numbers in that class.

In normalised form, the prisoner's dilemma can be represented as shown in Figure 2:

            B: Coop    B: Defect
A: Coop     (2,2)      (0,3)
A: Defect   (3,0)      (1,1)

Figure 2. Normalised payoff table for the prisoner's dilemma

The same case analysis as above again selects the "defect" strategy.
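The case analysis can be checked mechanically. The following Python sketch encodes Figure 2 (the function and variable names are illustrative and do not come from the paper), confirms that Defect strictly dominates Coop for A (and, by the symmetry of the game, for B), and finds the pure-strategy equilibria by checking that neither agent can gain by changing only his own move. The only equilibrium is mutual defection, even though mutual cooperation pays both agents more.

```python
# Normalised prisoner's dilemma of Figure 2.
# PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B); "C" = Coop, "D" = Defect.
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")


def strictly_dominates(mine, other):
    """True if A's move `mine` pays A strictly more than `other` whatever B does."""
    return all(PAYOFF[(mine, b)][0] > PAYOFF[(other, b)][0] for b in MOVES)


def pure_equilibria():
    """Move pairs from which neither agent gains by changing only his own move."""
    result = []
    for a in MOVES:
        for b in MOVES:
            best_for_a = max(PAYOFF[(x, b)][0] for x in MOVES)
            best_for_b = max(PAYOFF[(a, y)][1] for y in MOVES)
            if PAYOFF[(a, b)] == (best_for_a, best_for_b):
                result.append((a, b))
    return result


print(strictly_dominates("D", "C"))             # True
print(pure_equilibria())                        # [('D', 'D')]
print(PAYOFF[("C", "C")], PAYOFF[("D", "D")])   # (2, 2) (1, 1)
```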

At first sight, the prisoner's dilemma and related games seem to be of very restricted relevance only, since only a small proportion of the population belongs to criminal gangs and only a very small proportion of those will ever be faced with such a decision. On closer inspection, however, the range of application of the dilemma is much wider, since many questions are directly linked to it. For instance, should a particular individual wish taxes to be raised in order to provide other people's children with a better education? Or, more generally, should the rich and the powerful wish for a functioning welfare system in society that supports the poor and the weak? The simple answer is no: since the reasoning above tells them to "defect", being rational means being selfish and following only their own interests (while emotionally the answer may be to show sympathy). However, thinking twice, or three times, may lead to a different rational answer: living together with educated people seems to be a better option than living together with uneducated people, and living in a just and stable society seems more attractive, even for the rich, than living in a bipartite society of extremely rich people and people who are so poor that they have nothing to lose. The answer to the prisoner's dilemma therefore has immediate consequences for our views of how we wish our society to develop. The general slogan of most of the major parties in Western countries, "the rich must become richer so that the poor have enough bones that fall under the table", is not necessarily the best possible way forward.

A \ B   cccc   cccd   ccdc   cdcc   dccc   ccdd   cdcd   dccd   cddc   dcdc   ddcc   dddc   ddcd   dcdd   cddd   dddd
Coop    (2,2)  (2,2)  (2,2)  (2,2)  (0,3)  (2,2)  (2,2)  (0,3)  (2,2)  (0,3)  (0,3)  (0,3)  (0,3)  (0,3)  (2,2)  (0,3)
as B    (2,2)  (2,2)  (2,2)  (1,1)  (2,2)  (2,2)  (1,1)  (2,2)  (1,1)  (2,2)  (1,1)  (1,1)  (1,1)  (2,2)  (1,1)  (1,1)
opp. B  (3,0)  (3,0)  (0,3)  (3,0)  (3,0)  (0,3)  (3,0)  (3,0)  (0,3)  (0,3)  (3,0)  (0,3)  (3,0)  (0,3)  (0,3)  (0,3)
Defect  (3,0)  (1,1)  (3,0)  (3,0)  (3,0)  (1,1)  (1,1)  (1,1)  (3,0)  (3,0)  (3,0)  (3,0)  (1,1)  (1,1)  (1,1)  (1,1)

Figure 3. Meta-game constructed from the prisoner's dilemma by Howard and Rapoport (rows: A's strategy; columns: B's strategy, giving B's move against A's strategies in the row order Coop, as B, opp. B, Defect)

The problem of sub-optimality in the case described above has been well studied in game theory, and Howard has developed a method by which rational reasoning can lead to cooperation rather than defection [Howard66a,Howard66]. In the abstract, Howard's argument can be summarised as "think twice", or better still "think three times", before you make a serious decision. Howard introduces so-called meta-games, in which the decision is not made by a case analysis of what the opponent does, but depending on the strategy the opponent follows (Anatol Rapoport gives an easily understandable summary of the results in [Rapoport67], which is summarised in the next paragraph).

If B either cooperates or defects, A can, if he thinks twice, follow four different strategies: firstly, to cooperate (whatever B does); secondly, to defect (whatever B does); thirdly, to cooperate if and only if B does so; and fourthly, to defect if and only if B cooperates. This view as such does not solve the dilemma. If B, however, reconsidering his decision, looks at his possibilities depending on the four strategies A can follow, there are 16 possible conditional strategies which B can follow: cooperate regardless of what A does (indicated as cccc), defect in any case (dddd), defect if and only if A tries to match his choice (cdcc), and so on. The matrix resulting from this meta-game is presented in Figure 3. There are three equilibria in this matrix, describing three solutions: the old one, in which both defect whatever the other does, and two in which A chooses to do the same as B while B chooses either ccdd or dcdd. The strategy dcdd is the preferable one of all three, since it gives a higher payoff than dddd and is better than ccdd because of its higher payoff should A select to cooperate whatever B does. That is, cooperating if and only if the opponent tries to match one's own choice is the rational thing to do, and it reconciles individual and collective rationality.
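Figure 3 and its equilibria can be recomputed from Figure 2. The Python sketch below works under stated assumptions: A's four strategies are the rows of Figure 3, and each of B's 16 strategies is read as a four-letter string giving B's move against A's strategies in the row order Coop, as B, opp. B, Defect (the reading consistent with the entries of Figure 3). An equilibrium is taken to be a pair in which each agent's payoff is maximal given the other's strategy, with ties allowed. All names are illustrative.

```python
from itertools import product

# Figure 2 payoffs: PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B).
PAYOFF = {("c", "c"): (2, 2), ("c", "d"): (0, 3),
          ("d", "c"): (3, 0), ("d", "d"): (1, 1)}

# A's four strategies (rows of Figure 3), each mapping B's move to A's move.
A_STRATEGIES = {
    "Coop": lambda b: "c",
    "as B": lambda b: b,
    "opp. B": lambda b: "d" if b == "c" else "c",
    "Defect": lambda b: "d",
}
A_NAMES = list(A_STRATEGIES)

# B's 16 strategies: one move per A strategy, in the order of A_NAMES;
# e.g. "dcdd" = cooperate if and only if A plays "as B".
B_STRATEGIES = ["".join(s) for s in product("cd", repeat=4)]


def outcome(a_name, b_code):
    """Payoff pair when A plays strategy a_name and B plays strategy b_code."""
    b_move = b_code[A_NAMES.index(a_name)]   # B reacts to A's strategy
    a_move = A_STRATEGIES[a_name](b_move)    # A reacts to B's move
    return PAYOFF[(a_move, b_move)]


def equilibria():
    """Pure equilibria of the 4 x 16 meta-game (weak best responses allowed)."""
    found = []
    for a in A_NAMES:
        for b in B_STRATEGIES:
            pa, pb = outcome(a, b)
            best_a = max(outcome(x, b)[0] for x in A_NAMES)
            best_b = max(outcome(a, y)[1] for y in B_STRATEGIES)
            if (pa, pb) == (best_a, best_b):
                found.append((a, b))
    return found


print(equilibria())
# [('as B', 'ccdd'), ('as B', 'dcdd'), ('Defect', 'dddd')]
```

The three pairs are the three solutions discussed above; comparing the `ccdd` and `dcdd` columns against A's Coop row shows why B prefers `dcdd`.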

As convincing as this solution is, it has a problem as well. What about the original reasoning by case analysis, which resulted in mutual defection? The solution by this meta-game is of course valid only if agent A thinks twice and agent B three times (or B twice and A three times). Is it not still an option for a rational agent simply to defect, in particular for agents who do not know Howard's solution or lack the sophistication for such a difficult reasoning process? Suppose that an agent A who reasons in the simpler way and decides to defect is confronted with an agent B who follows Howard's solution dcdd. Since A and B cannot communicate, B has to make assumptions about A's behaviour. In Howard's scenario the assumption would be that A tries to match B's behaviour; in reality, however, A defects. That is, in the end A defects and B cooperates. Rationality seems not to dictate a particular behaviour.

That there is no ultimate answer to the problem also becomes apparent when we look at real human behaviour in such a situation: all forms of behaviour occur. There are cases where both agents defect, where both cooperate, and where one cooperates and the other defects.

3.  The iterated prisoner's dilemma

A problem related to the prisoner's dilemma is the iterated prisoner's dilemma, in which two agents A and B meet each other for a sequence of rounds and in every round face the decision whether to cooperate or defect. Each agent can base his decision on previous experience. What matters is no longer the payoff of a single decision but the total payoff over the sequence of rounds.

Although the iterated prisoner's dilemma is a different game, it can be connected to the original game by the following line of argument. If we assume a situation in which the iterated game consists of a fixed number n of rounds, no player can make use of the very last round for future rounds. (There is a variant of the iterated prisoner's dilemma in which the number of rounds is not determined in advance. This adds further uncertainty to the situation, "you never know when you need me in the future ...". In such a scenario the argument would not hold, since no round is known in advance to be the last one.) That is, in the n-th round rational agents behave just as in the basic version of the game. If we assume the first, simple-minded line of reasoning for the basic game, this means that both players defect in the n-th round. Since they defect in the n-th round whatever has happened before, the last round in which a real decision has to be made is the (n-1)-st round. Since nothing is learned from this round for future rounds either, simple-minded rational agents behave in this round just as in the basic version of the game as well, that is, they defect. By induction, they always defect. Of course, this argument requires a certain level of sophistication in their reasoning.

This is, however, not the best result the agents can achieve. With the rewards displayed in Figure 2, each agent gains 1·n = n in n rounds if they always defect, compared with 2·n = 2n if they always cooperate. That is, the reasoning results in gross under-performance.
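The backward-induction argument and the resulting totals can be sketched as follows, assuming a fixed, commonly known number of rounds n and the payoffs of Figure 2. Because the plan for later rounds is already fixed when an earlier round is considered, each round reduces to the one-shot game, in which Defect is dominant. The helper names are hypothetical, and n = 10 is an arbitrary choice.

```python
# Figure 2 payoffs: PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def one_shot_choice():
    """A's strictly dominant move in the one-shot game (Defect for Figure 2)."""
    for mine, other in (("C", "D"), ("D", "C")):
        if all(PAYOFF[(mine, b)][0] > PAYOFF[(other, b)][0] for b in "CD"):
            return mine
    raise ValueError("no strictly dominant move")


def backward_induction(n):
    """Plan for rounds 1..n, built from round n backwards.

    When round k is considered, the moves for rounds k+1..n are already fixed
    and do not depend on what happens in round k, so round k is decided
    exactly as the one-shot game.
    """
    plan = []
    for _ in range(n, 0, -1):              # rounds n, n-1, ..., 1
        plan.insert(0, one_shot_choice())
    return plan


def totals(moves_a, moves_b):
    """Total payoffs of A and B over a sequence of rounds."""
    return (sum(PAYOFF[(a, b)][0] for a, b in zip(moves_a, moves_b)),
            sum(PAYOFF[(a, b)][1] for a, b in zip(moves_a, moves_b)))


n = 10
plan = backward_induction(n)               # by symmetry, B's plan is the same
print("".join(plan))                       # DDDDDDDDDD
print(totals(plan, plan))                  # (10, 10): 1*n each
print(totals("C" * n, "C" * n))            # (20, 20): 2*n each
```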

Axelrod [Axelrod84] organised a couple of tournaments to which different algorithms could be submitted and in which they played against each other. The highest score was reached by Rapoport's submission, tit-for-tat. Tit-for-tat is defined as follows: cooperate in the first round; in every following round, do whatever the other player did in the previous round.
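A minimal Python sketch of tit-for-tat and of an iterated match, using the payoffs of Figure 2 and a hypothetical match length of 10 rounds. It is not a reconstruction of Axelrod's tournaments, which had many entrants; it only illustrates how tit-for-tat scores against itself and against a strategy that always defects. The function names are illustrative.

```python
# Figure 2 payoffs: PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then repeat the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game; each strategy sees only the opponent's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strategy_a(moves_b), strategy_b(moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))     # (20, 20)
print(play(tit_for_tat, always_defect, 10))   # (9, 12)
```

Against always-defect, tit-for-tat loses only the first round and then protects itself. In a single match it never scores more than its opponent, so its success comes from eliciting mutual cooperation rather than from beating anyone.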

Tit-for-tat also scored very well in later tournaments. It seems hard to improve on, and its only apparent weakness is an echo effect when it meets an "almost tit-for-tat" strategy that behaves like tit-for-tat but starts with defection rather than cooperation: the two strategies then keep alternating defection and cooperation.
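The echo effect can be reproduced with the same kind of sketch. The variant that opens with defection is sometimes called suspicious tit-for-tat. Here it is represented by a hypothetical function `suspicious_tit_for_tat`, and each strategy sees only the opponent's history.

```python
# Figure 2 payoffs: PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def tit_for_tat(opponent_history):
    """Cooperate first, then repeat the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def suspicious_tit_for_tat(opponent_history):
    """Hypothetical 'almost tit-for-tat': defect first, then repeat the opponent's previous move."""
    return "D" if not opponent_history else opponent_history[-1]


def play(strategy_a, strategy_b, rounds):
    """Return the move sequences of both players as strings."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a, b = strategy_a(moves_b), strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return "".join(moves_a), "".join(moves_b)


moves_a, moves_b = play(tit_for_tat, suspicious_tit_for_tat, 6)
print(moves_a, moves_b)        # CDCDCD DCDCDC
average_a = sum(PAYOFF[(a, b)][0] for a, b in zip(moves_a, moves_b)) / len(moves_a)
average_b = sum(PAYOFF[(a, b)][1] for a, b in zip(moves_a, moves_b)) / len(moves_b)
print(average_a, average_b)    # 1.5 1.5
```

Each agent averages 1.5 points per round, compared with 2 under mutual cooperation in Figure 2.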

Experiments on the evolution of such strategies in an evolutionary computation environment [Delahaye98] are briefly discussed below.

4.  "Your cheatin' heart"

In a recent article, Robin Dunbar investigates monogamy and infidelity in animals and humans [Dunbar98]. Dunbar writes: "Humans are caught in the same bind as any other monogamous species. The male wants to monopolise his mate's future reproductive output, but he has to tread a careful line. Mating is ultimately a game of cooperation rather than coercion: too aggressive a policing strategy may well drive the female away ... females spurn their attentions in favour of socially more skillful males. By the same token, the male's response to suspicions of cuckoldry should not necessarily be outrage. Although a male risks rearing children unrelated to him, he should continue to treat all his partner's children as his own so long as doing so allows him to ... gain access to most of her future reproduction." Of course, as always, the story may be much more complicated in a real-world scenario. The motivations of animals (and above all of human beings) are much more complex and can probably not be reduced to a single source such as producing as many offspring as possible. Dunbar goes on to describe the benefits males and females can expect from adultery. For males the benefit is easy to see: adultery is a cheap way of producing additional offspring without having to care for them. For a female the situation is a bit more tricky. She needs a partner who supports her in bringing up the brood ("a man with a bulging wallet, perhaps, or a robin with a large breeding territory."), "but she also wants a mate with good genes, a quality which she might assess by looking at his tail if she is a peahen, or by the symmetry of his features if she is a woman. But females usually have to trade one component off against another because ... few males come with high ratings on all dimensions."

Only modern genetic analysis has made it possible to find out to what extent birds are faithful to their partners. It turned out that a fifth of the eggs produced by monogamous female birds had not been sired by their regular partners. Likewise, 15 per cent of children are fathered by a male who is not their registered father.

If we look at these different forms of behaviour, we can, as Dunbar did, read reasons into them, but one can of course strongly doubt that birds (or even humans) engage in any explicit reasoning of the kind described in the previous paragraphs when deciding on their actions (there are people who would say "It's all chemistry."). To borrow a phrase from [Brooks91], there may well be "intelligence without reason" in this behaviour. Again, in this scenario it is very difficult to say what the best possible behaviour is, in particular in view of all the uncertainties about the consequences of a particular behaviour; and again, different behaviours do actually occur.

5.  Complex decisions

Traditionally, logic has been developed with two main goals: firstly, to formalise mathematical reasoning and, secondly, to formalise everyday reasoning. Until recent years, when applications of logic in artificial intelligence led to a dramatic increase in the number of logical formalisms, there was hope that reasoning as such could be captured by a single formalism. In 1889 Peano, for instance, wrote [Heijenoort67, p.86]: "I think that the propositions of any science can be expressed by these signs of logic alone, provided we add signs representing the objects of that science." The rapid development of knowledge representation formalisms and logical formalisms raises doubts, however, whether this is indeed possible.

[Doerner89] describes in great detail different cases in which reductionism leads to unwanted consequences. One of the key examples in his book "The Logic of Failure" is the sad fate of "Tanaland", a fictitious East African country [Doerner89, p.22-32]. The inhabitants of Tanaland make their living from cattle and sheep. In the computer model, wild animals and limited amounts of water and farmland (for planting crops and fruit) are represented. Most of the region is steppe. Dörner describes an experiment in which the fate of the (computer-simulated) population is handed over to test subjects who have dictatorial powers: they can control hunting, introduce farming, build dams, electrify the region, invest in the medical system, buy tractors ... Many test subjects start by addressing the poor medical system, with its high infant mortality and low life expectancy. Most of them follow the rule "If we put money into the medical system, life expectancy can be improved", and indeed this seems to be true, initially at least. The improvement of the medical system leads to an increase in the population; as a consequence, more food needs to be produced. This can be solved by increasing the number of cattle. However, in most test sessions there comes a point at which there are so many cattle that the animals eat not just the grass but also the roots. As a consequence, the steppe turns into desert: first most of the cattle die, and then most of the population. As Dörner discusses, the catastrophe occurs because the test subjects concentrate narrowly on single aspects (like the medical system), lose sight of the whole, and do not build up a model of the dynamic system as such. In one of the real-world examples, an analysis of why the fatal nuclear accident at Chernobyl happened, Dörner describes the need for a system view that goes beyond reductionism.
Note that, unlike Capra, Dörner does not attack rationality as such, but narrow-mindedness in the form of unjustified reductionism to single causes.

Traditional logic seems to be much better suited to local reasoning than to describing and reasoning about complex systems. It focuses on the question whether or not a particular formula follows from a set of formulae ($\Gamma \models A$). For instance, the payoff matrix in Figure 2 can be formalised by formulae like value(A,defects,defects,1). Rationality can then be expressed by a formula like

$$\forall a{:}\mathit{agent}.\ \forall \xi{:}\mathit{ACTION}.\ \forall \zeta{:}\mathit{ACTION}.\ \forall \alpha{:}\mathit{ACTION}.\ \forall x{:}\mathbb{R}.\ \forall y{:}\mathbb{R}.\ \mathit{selects}(a,\xi) \land \mathit{value}(a,\xi,\alpha,x) \land \mathit{value}(a,\zeta,\alpha,y) \rightarrow x \geq y$$

that is, agents take their best possible actions: whatever action $\alpha$ the other agent takes, the selected action $\xi$ is worth at least as much as any alternative $\zeta$. On this level, reasoning is local and the prisoner's dilemma can be reproduced. Of course, one could try to solve the problem by adding different axioms of rationality, for instance by replacing the axiom above with a variant of Kant's categorical imperative ("Act following the maxim by which you can wish that it will be general law."). This, however, is firstly a difficult axiom to deal with in a knowledge representation scheme (representing it requires some kind of higher-order logic). Secondly, it is not at all clear how it would interact with other axioms. Thirdly, it goes beyond rationality, since it has a moral aspect as well.
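The rationality axiom can be checked mechanically on the finite model of Figure 2. This is a minimal Python sketch, assuming that value is functional (each agent/action/action triple has exactly one value), so the quantifiers over x and y reduce to table lookups. The names are illustrative and do not come from the paper. Enumerating all possible selections confirms that the axiom admits only mutual defection, which reproduces the dilemma.

```python
from itertools import product

AGENTS = ("A", "B")
ACTIONS = ("coop", "defect")
# Facts value(agent, own_action, other_action, x) for Figure 2. The game is
# symmetric, so both agents share one table. Assumption: value is functional,
# i.e. each (agent, own, other) triple has exactly one x.
TABLE = {("coop", "coop"): 2, ("coop", "defect"): 0,
         ("defect", "coop"): 3, ("defect", "defect"): 1}


def value(agent, own, other):
    return TABLE[(own, other)]


def satisfies_axiom(selects):
    """Check selects(a, xi) & value(a, xi, alpha, x) & value(a, zeta, alpha, y) -> x >= y
    for all agents a and actions zeta, alpha; `selects` maps each agent to its xi."""
    return all(value(a, selects[a], alpha) >= value(a, zeta, alpha)
               for a in AGENTS for zeta in ACTIONS for alpha in ACTIONS)


# Enumerate every possible selection and keep those that satisfy the axiom.
selections = [dict(zip(AGENTS, choice)) for choice in product(ACTIONS, repeat=len(AGENTS))]
print([s for s in selections if satisfies_axiom(s)])   # [{'A': 'defect', 'B': 'defect'}]
```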

An alternative principle on which to base rationality could be: "Act in a way that is evolutionarily competitive." An evolutionarily competitive strategy is one that scores well in competition with other strategies in a society. For instance, in the context of the prisoner's dilemma, tit-for-tat is evolutionarily competitive (at least in coexistence with many standard strategies), since it performs well in the iterated prisoner's dilemma. Rationality would then mean that, faced with the same situation again in the future, the strategy would be used again: you would have no reason to change it out of regret for your previous decision. In other words, put in an iterated framework, a single application of a strategy is rational if the strategy can be applied again in the next round. Howard's solution to the prisoner's dilemma, in which A follows the strategy of doing exactly what B does and B cooperates if and only if A does exactly what B does, is in its manifestation (both cooperate) an instance of this principle. A and B would mutually cooperate, but each only if the other does (as in tit-for-tat: if you don't cooperate, neither do I); hence their behaviour in the one-step dilemma is rational. The advantage of cooperation is an evolutionary one. Rational action may well include a socially responsible way of acting. However, rationality does not a priori presuppose morality. Rational behaviour can be defined as behaviour that tries to maximise the global reward according to a certain reward schema. How this schema is set up, and what is most important for a particular person, is not a matter of rationality, however. Furthermore, it should be noted that rationality may well involve different incompatible value schemas, and that dilemmas can be more difficult than the prisoner's dilemma (like the dilemmas in classical Greek tragedy). This is not considered further in this paper. The approach taken seems, however, general enough that it may be possible to adapt it accordingly.

Note that the evolutionary pressure can be considerably different for the generic situation of the prisoner's dilemma given in Figure 2 and for the situation given in Figure 4, in which there is a much stronger motivation to defect. If the rewards for the defection/cooperation and cooperation/defection pairs are much higher than for mutual cooperation, it would be sensible in the iterated scenario to alternate defection and cooperation, so that the reward of 1000 alternates with that of 0. For the non-iterated scenario this would mean that a random choice (with a small bias towards cooperation) is the reasonable thing to do.

            B: Coop    B: Defect
A: Coop     (2,2)      (0,1000)
A: Defect   (1000,0)   (1,1)

Figure 4. Payoff table for the prisoner's dilemma with strong bias towards defection
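For Figure 4 the advantage of alternating can be quantified. This Python sketch assumes that both agents keep to the alternation in opposite phase over an even number of rounds (10 here, chosen arbitrarily) and compares the result with permanent mutual cooperation and permanent mutual defection.

```python
# Figure 4 payoffs: PAYOFF[(a_move, b_move)] = (payoff to A, payoff to B).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 1000),
          ("D", "C"): (1000, 0), ("D", "D"): (1, 1)}


def average(moves_a, moves_b):
    """Average per-round payoffs of A and B for two fixed move sequences."""
    rounds = len(moves_a)
    total_a = sum(PAYOFF[(a, b)][0] for a, b in zip(moves_a, moves_b))
    total_b = sum(PAYOFF[(a, b)][1] for a, b in zip(moves_a, moves_b))
    return total_a / rounds, total_b / rounds


n = 10  # an even number of rounds, so the alternation is balanced
print(average("C" * n, "C" * n))                   # (2.0, 2.0)  mutual cooperation
print(average("DC" * (n // 2), "CD" * (n // 2)))   # (500.0, 500.0)  alternation in opposite phase
print(average("D" * n, "D" * n))                   # (1.0, 1.0)  mutual defection
```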

Compared with the case described in Figure 4, Figure 5 gives the other extreme, in which there is a much stronger motivation to cooperate, since the reward the agents get from mutual cooperation is significantly higher than the reward in the unfavourable defect/defect situation. In an iterated scenario the agents can gain much more from the coop/coop outcome than they have to lose in a defect/defect outcome. Compared with always going for coop/coop, an agent would need to obtain the most favourable defect/coop outcome roughly 1000 times (each such exploitation gains only 1, while a single defect/defect outcome costs 999) before he can afford to end up in the defect/defect outcome a single time. Only with an extremely stupid opponent strategy can one hope for such an additional gain.

            B: Coop       B: Defect
A: Coop     (1000,1000)   (0,1001)
A: Defect   (1001,0)      (1,1)

Figure 5. Payoff table for the prisoner's dilemma with strong bias towards cooperation
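The break-even count for Figure 5 follows directly from the table. This Python sketch only makes the arithmetic explicit, from A's point of view (B's is symmetric).

```python
import math

# Figure 5 payoffs for A: PAYOFF_A[(a_move, b_move)].
PAYOFF_A = {("C", "C"): 1000, ("C", "D"): 0, ("D", "C"): 1001, ("D", "D"): 1}

# Extra payoff from one successful exploitation (D against C) instead of mutual cooperation.
gain = PAYOFF_A[("D", "C")] - PAYOFF_A[("C", "C")]    # 1001 - 1000 = 1
# Payoff lost by one mutual defection instead of mutual cooperation.
loss = PAYOFF_A[("C", "C")] - PAYOFF_A[("D", "D")]    # 1000 - 1 = 999
# Number of exploitations needed to make up for a single mutual defection.
break_even = math.ceil(loss / gain)
print(gain, loss, break_even)   # 1 999 999
```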

Tit-for-tat is successful in an evolutionary scenario as well. As [Delahaye98] describe, however, the picture becomes blurred when the coexistence of more than two strategies is considered. Such a society may converge to stable situations, may show oscillations (damped or undamped oscillations as well as resonance catastrophes), or may present no regular structure at all. The concrete payoff matrices can, of course, strongly influence what is best to do. The same is true of the composition of the population: if, in the prisoner's dilemma, the society consists almost exclusively of strategies that defect, tit-for-tat is worse off than a strategy that always defects too. In such a society, defecting is the best (the rational) thing to do.
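The effect of the composition of the population can be illustrated with a simple expected-score calculation. This is not a simulation of the evolutionary dynamics studied in [Delahaye98]. The Python sketch assumes the payoffs of Figure 2, matches of a fixed length of 10 rounds, and a population containing only tit-for-tat and always-defect, with partners drawn at random. It uses exact fractions to avoid rounding.

```python
from fractions import Fraction

# Figure 2 payoffs: PAYOFF[(my_move, their_move)] = my payoff.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    return "D"


def match_score(me, other, rounds):
    """Total score of strategy `me` in an iterated match against `other`."""
    my_moves, their_moves, score = [], [], 0
    for _ in range(rounds):
        mine, theirs = me(their_moves), other(my_moves)
        score += PAYOFF[(mine, theirs)]
        my_moves.append(mine)
        their_moves.append(theirs)
    return score


def expected_score(me, tft_share, rounds):
    """Expected score of `me` against a random partner drawn from a population of
    tit-for-tat (share tft_share) and always-defect (the rest)."""
    return (tft_share * match_score(me, tit_for_tat, rounds)
            + (1 - tft_share) * match_score(me, always_defect, rounds))


rounds = 10
for share in (Fraction(1, 20), Fraction(1, 2)):
    tft = expected_score(tit_for_tat, share, rounds)
    alld = expected_score(always_defect, share, rounds)
    winner = "tit-for-tat" if tft > alld else "always-defect"
    print(share, tft, alld, winner)
# 1/20 191/20 101/10 always-defect
# 1/2 29/2 11 tit-for-tat
```

With n rounds and a tit-for-tat share p, tit-for-tat expects 2np + (n-1)(1-p) and always-defect expects n + 2p. Tit-for-tat therefore does better exactly when p(n-1) > 1, which for n = 10 means a share above 1/9.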

The standard way of using traditional logic, namely looking at local arguments, seems not to be adequate for reasoning in complex domains. An evolutionary approach to reasoning, as exemplified above, seems much more adequate. It is, however, by no means claimed in this paper that traditional logic is inadequate for reasoning in complex domains altogether. Of course it is possible to build up a mathematical description of complex domains, with all their interconnections and dependencies, on top of classical first-order logic, and then to describe formally what is to be maximised and what constitutes a rational decision. Doing so, however, requires a lot of sophistication in order to design a model that adequately describes the scenario. Such a model was designed, for instance, by Howard in the transition from the simple-minded description of the prisoner's dilemma in Figures 1 and 2 to the sophisticated one in Figure 3. It can, however, be seriously doubted whether such a sophisticated analysis of the problem and representation of the possibilities can be carried out by animals that are faced with a similar kind of dilemma and that do astonishingly well.

A serious problem with Howard's solution is that the reasoning based on meta-games can be carried out in the same way for all the payoff matrices in Figures 2, 4, and 5. While it seems to be a rational choice in the case of Figures 2 and 5, humans would normally reason differently in the case of Figure 4 (as detailed above): the temptation to defect is much higher, but a strategy of always defecting is not very promising either. An evolutionary approach to reasoning seems much more adaptable, and hence more adequate, than a pre-compiled line of reasoning.
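Why the meta-game analysis cannot distinguish Figures 2, 4 and 5 can be made concrete. A pure-strategy equilibrium is defined only through comparisons of each player's own payoffs, so two matrices with the same ranking of outcomes have the same pure equilibria. This holds for the basic game and, since every meta-game entry is an entry of the basic game, for Howard's meta-game as well. The Python sketch below uses the conventional labels T (temptation), R (reward), P (punishment) and S (sucker's payoff) from the later game-theory literature; these labels are not used in this paper. Because the games are symmetric, B's ranking is the same as A's.

```python
# Payoffs of Figures 2, 4 and 5 from A's point of view:
# T = D against C, R = C against C, P = D against D, S = C against D.
FIGURES = {
    "Figure 2": {"T": 3, "R": 2, "P": 1, "S": 0},
    "Figure 4": {"T": 1000, "R": 2, "P": 1, "S": 0},
    "Figure 5": {"T": 1001, "R": 1000, "P": 1, "S": 0},
}


def ordering(payoffs):
    """Outcome labels sorted from best to worst for the player."""
    return tuple(sorted(payoffs, key=payoffs.get, reverse=True))


for name, payoffs in FIGURES.items():
    print(name, ordering(payoffs))   # ('T', 'R', 'P', 'S') in every case
```

Only the magnitudes differ, and those are exactly what the reasoning about alternation in Figure 4 and break-even counts in Figure 5 depends on.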

6.  Conclusion

Reasoning is more complicated than the local reasoning typically studied in classical logic. Neither the world we live in nor our motivations and goals are simple and reducible to single causes and effects. We have to deal with highly interconnected and complicated scenarios with highly complex motivations and goals. Nevertheless, we are able to make rational decisions within our environment. Although we often do not reach the level of sophistication that would be adequate, our choices can be rationalised. Modelling human reasoning in its full complexity requires significant research. This paper claims that this research can benefit from research in machine learning, artificial life, and evolutionary programming. The human reasoning capability has certainly co-evolved with the human race. The prospects of extending mathematical reasoning by different aspects (like temporal and spatial ones) in order to model human reasoning seem rather limited. A logic of learning and evolution built on top of recent research in artificial intelligence can make a substantial contribution to our understanding of the formation of rational thought.

References


© 2000, Workshop-Proceedings of the AISB-2000 Workshop: Artificial Intelligence, Ethics and (Quasi-) Human Rights. Manfred Kerber
The URL of this page is http://www.cs.bham.ac.uk/~mmk/papers/00-AISB.html.