The prisoner's dilemma is a non-zero-sum game that illustrates a conflict between what seems to be rational individual behavior
and the benefits of cooperation, in situations where short-term gains lead to worse outcomes later. The essence of the problem is
that each player tries to maximise his own advantage without concern for the well-being of others (i.e., he is an 'egoist'),
in a situation where nothing is known of the other player except that they are likely to be doing the same. The dilemma
illustrates the paradox that if everybody behaves in this way, they risk ending up with less than they might have obtained
had they chosen to cooperate with each other.
The key problem arising from the prisoner's dilemma, then, is whether cooperation can evolve among egoists: that is,
can self-interested individuals come to learn, over time, that their interests are better served by cooperating with those around
them than by solely pursuing their own advantage without concern for others?
The classical prisoner's dilemma
The classical prisoner's dilemma (PD) is as follows:
- Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and having separated them,
visit each of them and offer the same deal: If you confess and your accomplice remains silent, he gets the full 10-year sentence
and you go free. If he confesses and you remain silent, you get the full 10-year sentence and he goes free. If you both stay
silent, all we can do is give you both 6 months for a minor charge. If you both confess, you each get 5 years.
which can be summarized as:
| | You Deny | You Confess |
| He Denies | Both serve six months | He serves ten years; you go free |
| He Confesses | He goes free; you serve ten years | Both serve five years |
Let's assume both prisoners are completely selfish and their only goal is to minimize their own jail term. As a prisoner you
have two options: to cooperate with your accomplice and stay quiet, or to defect and betray your accomplice and confess. The
outcome of each choice depends on the choice of your accomplice; unfortunately, however, you don't know the choice of your
accomplice. Even if you were able to talk to him, you couldn't be sure whether to trust him.
If you expect your accomplice to cooperate and stay quiet, the optimal choice for you is to defect, as this means you go free
immediately while your accomplice spends 10 years in jail. If you expect your accomplice to defect, your best choice is to
defect as well, since then you are at least spared the full 10-year sentence and serve only 5 years, as does your accomplice.
If, however, you both decide to cooperate and stay quiet, you would both be out in 6 months.
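The case-by-case reasoning above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: the names YEARS and best_response are illustrative, and the labels "C" (cooperate, stay silent) and "D" (defect, confess) are shorthand introduced here. It encodes the jail terms from the deal and picks, for each possible choice by your accomplice, the move that minimizes your own sentence.

```python
# Jail terms in years, indexed by (your move, accomplice's move).
# "C" = cooperate (stay silent), "D" = defect (confess).
# The labels and names are illustrative assumptions, not from the source.
YEARS = {
    ("C", "C"): 0.5,  # both stay silent: six months each
    ("C", "D"): 10,   # you stay silent, he confesses: you serve ten years
    ("D", "C"): 0,    # you confess, he stays silent: you go free
    ("D", "D"): 5,    # both confess: five years each
}


def best_response(accomplice_move):
    """Return the move that minimizes your own jail term."""
    return min("CD", key=lambda mine: YEARS[(mine, accomplice_move)])


for theirs in "CD":
    print(f"accomplice plays {theirs} -> your best reply is {best_response(theirs)}")
```

Under these numbers the best reply is to defect in both cases, which is why purely selfish reasoning leads both prisoners to confess.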
The naively selfish reasoning that it is better for you to defect and confess is flawed: if both prisoners follow it, both end
up in jail for 5 years. Even sophisticated selfish reasoning does not get you out of this dilemma, however. If you count on your
accomplice to cooperate, it would be optimal for you to defect. However, if your accomplice knows this and thus defects as well,
it would have been best if you had both cooperated. And so on. This is the core of the dilemma.
If reasoned from the perspective of the optimal interest of the group (of two prisoners), the correct outcome would be for
both prisoners to cooperate, as this would reduce the total jail time served by the group to one year. Any other outcome
would be worse. In a situation with a payoff matrix like the prisoner's dilemma, individually selfish decisions are not
automatically best when viewed from the perspective of the group as a whole.
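As a worked check of the group totals, using the sentences from the deal described earlier (six months written as 0.5 years):

```latex
\begin{align*}
\text{both stay silent:} \quad & 0.5 + 0.5 = 1 \text{ year} \\
\text{one confesses, the other stays silent:} \quad & 0 + 10 = 10 \text{ years} \\
\text{both confess:} \quad & 5 + 5 = 10 \text{ years}
\end{align*}
```

Mutual cooperation is therefore the only outcome in which the pair serves one year in total; every other outcome costs the pair ten.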
Another explanation
The cognitive scientist Douglas Hofstadter (see References, below) once suggested that people often find the PD problem
easier to understand when it is illustrated in the form of a simple
game, or trade-off. The example he used was two people meeting and exchanging closed bags, with the understanding that one of
them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his
bag what he agreed, or he can defect by handing over an empty bag. This exemplifies the PD: for each participant individually, it
is better to defect, though both would be better off if both cooperated.
In the same article, Hofstadter also observed that the PD payoff matrix can, in fact, be written in a variety of ways, as long
as it conforms to the following principle:
T > R > P > S
where T is the temptation to defect (i.e., what you get when you defect and the other player cooperates); R is the reward for
mutual cooperation; P is the punishment for mutual defection; and S is the sucker's payoff (i.e., what you get when you
cooperate and the other player defects). The above formula, then, ensures that, whatever the precise numbers in each part of the
payoff matrix, it is always 'better' for each player to defect regardless of what the other does. Following this principle, and
simplifying the PD to the above 'bag switching' scenario (or an Axelrod-type two-player game, see below), we get the following
'canonical' PD payoff matrix, that is, the one that is normally shown in literature on the subject:
| | Cooperate | Defect |
| Cooperate | 3, 3 | 0, 5 |
| Defect | 5, 0 | 1, 1 |
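To make the role of the four payoffs concrete, here is a small Python sketch. It assumes the ordering T > R > P > S usually given for the prisoner's dilemma and reads the canonical values T=5, R=3, P=1, S=0 off the matrix; the function names are illustrative and not taken from any source.

```python
def is_prisoners_dilemma(T, R, P, S):
    """True if the payoffs satisfy the ordering T > R > P > S."""
    return T > R > P > S


def defection_dominates(T, R, P, S):
    """True if defecting pays more than cooperating against either move."""
    beats_cooperator = T > R  # opponent cooperates: defect (T) vs cooperate (R)
    beats_defector = P > S    # opponent defects: defect (P) vs cooperate (S)
    return beats_cooperator and beats_defector


# Canonical values read off the matrix (an assumption of this sketch).
canonical = dict(T=5, R=3, P=1, S=0)
print(is_prisoners_dilemma(**canonical))
print(defection_dominates(**canonical))
```

Both checks succeed for the canonical values. Note that dominance only needs T > R and P > S; the middle inequality R > P is what makes mutual cooperation better for both players than mutual defection.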
Real-life examples
These particular examples, involving prisoners and bag switching and so forth, may seem contrived, but there are in fact many
examples in human interaction as well as interactions in nature which have the same payoff matrix. The prisoner's dilemma is
therefore of interest to the social sciences such as economics, politics and sociology, as well as to the biological sciences
such as ethology and evolutionary biology.
In political science, for instance, the PD scenario is often used to illustrate the problem of two states engaged in an arms
race. Both will reason that they have two options, either to increase military expenditure or to make an agreement to reduce
weapons. Neither state can be certain that the other one will keep to such an agreement; therefore, they both incline towards
military expansion. The irony is that both states seem to act rationally, but the result is completely irrational.
Another example would be hoarding supplies of an essential item during a shortage. Let's say that all our tap water gets
poisoned, somehow, and everyone has to rely on bottled water from supermarkets. Rationally, each person knows that they should
limit their purchases of bottled water for the period of the shortage (i.e., they should 'cooperate'), because if everybody
rushes to the supermarket and stocks up on water (i.e., if they 'defect') supplies will quickly run out, so that, in the long
run, there will be nothing left for anyone. However, each person also fears that hoarding is precisely what everybody else will
be doing; therefore, rationally, they know that if they are to be sure of securing any supply of water at all they had better go
and stock up too.
This scenario fits the PD payoff matrix outlined above: defection when others cooperate (T) means you can keep on
getting a generous supply of drinking water, because others are restricting their consumption; mutual
cooperation (R) would bring the reward of a moderate amount of drinking water for everyone, over an extended period;
mutual defection (P) would mean that everyone gets a lot to start off with but they probably all die of thirst soon
enough thereafter; cooperation when others defect (S) means you end up with hardly any water, because it has all been
snapped up by other people.
The iterated prisoner's dilemma
In his book The Evolution of Cooperation (1984), Robert Axelrod explored an extension to the classical PD scenario, which he
called the iterated prisoner's dilemma (IPD).
In this, participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod
invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that
were entered varied widely in algorithmic complexity; initial hostility; capacity for forgiveness; and so forth.
Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different
strategies, "greedy" strategies tended to do very poorly in the long run while more "altruistic" strategies did better, as judged
purely by self-interest. He used this to show a possible mechanism to explain what had previously been a difficult gap in
Darwinian theory: how can seemingly altruistic behavior evolve from the purely selfish mechanisms of natural selection?
The best deterministic strategy was found to be "Tit for Tat", which Anatol Rapoport developed and entered into the
tournament. It was the simplest of any program entered, containing only four lines of BASIC, and won the
contest. The strategy is simply to cooperate on the first iteration of the game; after that, do what your opponent did on the
previous move. A slightly better strategy is "Tit for Tat with forgiveness". When your opponent defects, on the next move you
sometimes cooperate anyway with small probability (around 1%-5%). This allows for occasional recovery from getting trapped in a
cycle of defections. The exact probability depends on the lineup of opponents. "Tit for Tat with forgiveness" is best when
miscommunication is introduced to the game. That means that sometimes your move is incorrectly reported to your opponent: you
cooperate but your opponent hears that you defected.
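The two strategies just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not Rapoport's BASIC program: the function names, the history-list interface, the always_defect opponent, and the use of the canonical 3/0/5/1 payoffs are all assumptions made for this example.

```python
import random

# Canonical payoffs, indexed by (move of A, move of B) -> (score A, score B).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(my_history, their_history):
    """Cooperate first; afterwards copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def make_forgiving_tit_for_tat(forgiveness=0.05, rng=None):
    """Tit for Tat that answers a defection with cooperation
    with probability `forgiveness` (hypothetical parameter name)."""
    if rng is None:
        rng = random.Random()

    def strategy(my_history, their_history):
        if their_history and their_history[-1] == "D":
            return "C" if rng.random() < forgiveness else "D"
        return "C"

    return strategy


def always_defect(my_history, their_history):
    """A 'greedy' opponent used only for comparison."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))    # mutual cooperation throughout
print(play(tit_for_tat, always_defect, 10))  # exploited once, then retaliates
```

This sketch does not model miscommunication; adding it would mean occasionally flipping a move before it is appended to the opponent's view of the history.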
Tit for Tat was successful, Axelrod argued, for two main reasons. Firstly, it is 'nice': that is, it starts off cooperating
and only defects in response to another player's defection, so it is never responsible for initiating a cycle of mutual
defections. Secondly, it is provocable, always responding to what the other player does; it punishes another player immediately
when they defect, but equally immediately it responds in kind if they start to cooperate again. Such clear, straightforward
behaviour means that the other player can easily understand the logic behind Tit for Tat's actions, and can therefore figure out
how to work alongside it productively. It is no coincidence, incidentally, that most of the worst-performing strategies in
Axelrod's tournament were ones that were not designed to be responsive to the other player's choices; against such a player, the
best strategy is simply to defect every time, because you can never be sure of establishing reliable mutual cooperation.
For the iterated PD, it is not always correct to say that any given strategy is the best. For example, consider a population
where everyone defects every time, except for a single individual following the Tit-for-Tat strategy. That individual is at a
slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to
defect every time. In a population with a certain percentage of always-defectors and the rest being Tit-for-Tat players, the
optimal strategy for an individual depends on the percentage, and on the length of the game. Simulations of populations have been
done, where individuals with low scores die off, and those with high scores reproduce. The mix of algorithms in the final
population generally depends on the mix in the initial population.
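The dependence on the population mix and on the game length can be illustrated with a short calculation. The Python sketch below assumes the canonical payoffs (T=5, R=3, P=1, S=0), a population containing only Tit-for-Tat players and always-defectors, and an opponent drawn at random from that population; the function and parameter names are hypothetical.

```python
def expected_scores(p_tft, rounds, T=5, R=3, P=1, S=0):
    """Expected match score of each strategy against a random opponent.

    p_tft  -- fraction of Tit-for-Tat players in the population
    rounds -- number of rounds in one match
    Returns (score for Tit for Tat, score for always-defect).
    """
    tft_vs_tft = rounds * R               # cooperate every round
    tft_vs_alld = S + (rounds - 1) * P    # suckered once, then mutual defection
    alld_vs_tft = T + (rounds - 1) * P    # exploits once, then mutual defection
    alld_vs_alld = rounds * P             # mutual defection every round

    tft = p_tft * tft_vs_tft + (1 - p_tft) * tft_vs_alld
    alld = p_tft * alld_vs_tft + (1 - p_tft) * alld_vs_alld
    return tft, alld


for p in (0.0, 0.5, 1.0):
    print(p, expected_scores(p, rounds=10))
```

With no other Tit-for-Tat players around, the lone Tit-for-Tat player trails the defectors by the first-round loss, while in a mostly Tit-for-Tat population the ordering reverses for a ten-round match. With rounds=1 always-defect comes out ahead at every mix, reflecting the one-shot game.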
If an iterated PD is going to be iterated exactly N times, for some known constant N, then there is another interesting fact.
The Nash equilibrium is to defect every time. That is easily proved
by induction. You might as well defect on the last turn, since your opponent will not have a chance to punish you. Therefore, you
will both defect on the last turn. Then, you might as well defect on the second-to-last turn, since your opponent will defect on
the last no matter what you do. And so on. For cooperation to remain appealing, then, the future must be indeterminate for both
players. One solution is to make the total number of turns N random.
Another odd case is "play forever" prisoner's dilemma. The game is repeated infinitely many times, and your score is the
average (suitably computed).
The prisoner's dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that
transactions between two people requiring trust can be modelled by the PD, cooperative behavior in populations may be modelled by
a multi-player, iterated version of the game. It has, consequently, fascinated many scholars over the years. A not
entirely up-to-date estimate (Grofman and Pool, 1975) puts the count of scholarly articles devoted to it at over 2,000.
Variants
There are also some variants of the game, with subtle but important differences in the payoff matrices, which are listed
below.
Chicken
Another important non-zero-sum game type is called "Chicken". In Chicken, if your opponent cooperates, you are better off
defecting: this is your best possible outcome. If your opponent defects, you are better off cooperating. Mutual defection is the
worst possible outcome (hence an unstable equilibrium), whereas in the Prisoner's Dilemma the worst possible outcome is
cooperating while the other person defects (so both defecting is a stable equilibrium). In both games, "both cooperate" is an
unstable equilibrium.
A typical payoff matrix would read:
- If both players cooperate, each gets +5.
- If one cooperates and the other defects, the first gets +1 and the other gets +10.
- If both defect, each gets -20.
Chicken is named after the car racing game. Two cars drive towards each other on course for a head-on collision; the first to
swerve out of the way is "chicken". Both players can swerve to avoid the crash (cooperate) or keep going (defect). Another
example often given is that of two farmers who use the same irrigation system for their fields. The system can be adequately
maintained by one person, but both farmers gain equal benefit from it. If one farmer does not do his share of maintenance, it is
still in the other farmer's interests to do so, because he will benefit whatever the other one does. Therefore, if one
farmer can establish himself as the dominant defector (i.e., if the habit becomes ingrained that the other one does all the
maintenance work), that arrangement is likely to continue.
Assurance game
An Assurance Game has a similar structure to the prisoner's dilemma, except that the reward for mutual cooperation is
higher than the payoff for defecting against a cooperator. A typical payoff matrix would read:
- If both players cooperate, each gets +10.
- If you cooperate and the other player defects, you get +1 and she gets +5.
- If both defect, each gets +3.
The Assurance Game is potentially very stable because it always gives the highest rewards to players who establish a habit of
mutual co-operation. However, there is still the problem that the players might not realise that it is in their interests to
co-operate. They might, for example, mistakenly believe that they are playing a Prisoner's Dilemma or Chicken game, and arrange
their strategies accordingly.
Friend or foe
Friend or Foe is a game show airing currently on the Game Show Network. It is an example of the prisoner's dilemma game tested
by real people, but in an artificial setting. On the
game show, three pairs of people compete. As each pair is eliminated, they play a game of Prisoner's Dilemma to determine how
their winnings are split. If they both cooperate ("Friend"), they share the winnings 50-50. If one cooperates and the other
defects ("Foe"), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing.
Notice that the payoff matrix is slightly different from the standard one given above, as the payouts for the "both defect" and
the "I cooperate and opponent defects" cases are identical. This makes the "both defect" a neutral equilibrium, compared with
being a stable equilibrium in standard prisoner's dilemma. If you know your opponent is going to vote "Foe", then your choice
does not affect your winnings. In a certain sense, "Friend or Foe" is between "Prisoner's Dilemma" and "Chicken".
The payoff matrix is:
- If both players cooperate, each gets +1.
- If you cooperate and the other person defects, you get 0 and she gets +2.
- If both defect, each gets 0.
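The differences between these variants show up clearly when the stable outcomes are computed from the payoffs. The following Python sketch enumerates the pure-strategy Nash equilibria (outcomes where neither player gains by switching alone) for the canonical prisoner's dilemma and for the typical Chicken, Assurance, and Friend or Foe payoffs quoted in this article. The function name and the T/R/P/S labelling of the variant payoffs are assumptions of this example, and ties are counted as "no gain".

```python
from itertools import product


def pure_nash(T, R, P, S):
    """Return the pure-strategy Nash equilibria of a symmetric 2x2 game.

    T: defect against a cooperator, R: mutual cooperation,
    P: mutual defection, S: cooperate against a defector.
    """
    # payoff[(mine, theirs)] is the payoff to the player choosing `mine`.
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    switch = {"C": "D", "D": "C"}
    equilibria = []
    for a, b in product("CD", repeat=2):
        # Neither player can strictly improve by changing only their own move.
        a_stays = payoff[(a, b)] >= payoff[(switch[a], b)]
        b_stays = payoff[(b, a)] >= payoff[(switch[b], a)]
        if a_stays and b_stays:
            equilibria.append((a, b))
    return equilibria


games = {
    "Prisoner's Dilemma": dict(T=5, R=3, P=1, S=0),
    "Chicken": dict(T=10, R=5, P=-20, S=1),
    "Assurance": dict(T=5, R=10, P=3, S=1),
    "Friend or Foe": dict(T=2, R=1, P=0, S=0),
}
for name, payoffs in games.items():
    print(name, pure_nash(**payoffs))
```

Under these numbers the prisoner's dilemma has mutual defection as its only equilibrium, Chicken has the two outcomes where exactly one player defects, and the Assurance Game has both mutual cooperation and mutual defection. Friend or Foe includes the equilibria of both the prisoner's dilemma and Chicken, because a player facing a defector is indifferent between the two moves.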
Friend or Foe would be useful for someone who wanted to do a real-life analysis of prisoner's dilemma. Notice that you only
get to play once, so all the issues involving repeated playing are not present and a "tit for tat" strategy cannot develop.
In Friend or Foe, each player is allowed to make a statement to convince the other of his friendliness before both make the
secret decision to cooperate or defect. One possible way to 'beat the system' would be for a player to tell his rival, "I am
going to choose foe. If you trust me to split the winnings with you later, choose friend. Otherwise, if you choose foe, we both
walk away with nothing." A greedier version of this would be "I am going to choose foe. I am going to give you X%, and I'll take
(100-X)% of the total prize package. So, take it or leave it, we both get something or we both get nothing." Now, the trick is to
minimize X such that the other contestant will still choose friend. Basically, you have to know the threshold at which the
utility he gets from watching you get nothing exceeds the utility he gets from the money he stands to win if he just went
along.
This approach has not yet been tried in the game; it's possible that the judges might not allow it.
References
- Axelrod, Robert and Hamilton, William D. (1981). "The Evolution of Cooperation". Science, 211:1390–1396.
- Axelrod, Robert (1984). The Evolution of Cooperation.
- Grofman and Pool (1975). "Bayesian Models for Iterated Prisoner's Dilemma Games". General Systems, 20:185–94.
- Hofstadter, Douglas R. (1985). "The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation". Ch. 29 of
Metamagical Themas: Questing for the Essence of Mind and Pattern. ISBN 0465045669.
- Poundstone, William (1992). Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb. Doubleday.
ISBN 0385415672. A wide-ranging popular introduction, as the title indicates.
External link
- Play the iterated prisoner's dilemma online