\documentclass[12pt]{article}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\reversemarginpar
\topmargin -1in
\oddsidemargin .25in \textheight 9.4in \textwidth 6.4in
\begin{document}
\parindent 24pt \parskip 10pt
\setcounter{page}{138}
\begin{LARGE} \begin{center}
{\bf 5 Reputation and Repeated Games with Symmetric Information}
\end{center} \end{LARGE}
23 November 2005. Eric Rasmusen, Erasmuse@indiana.edu.
http://www.rasmusen.org.
\bigskip
\noindent
{\bf 5.1 Finitely Repeated Games and the Chainstore Paradox}
\noindent
Chapter 4 showed how to refine the concept of Nash equilibrium to find sensible
equilibria in games with moves in sequence over time, so-called dynamic games.
An important class of dynamic games is repeated games, in which players
repeatedly make the same decision in the same environment. Chapter 5 will look
at such games, in which the rules of the game remain unchanged with each
repetition and all that changes is the ``history'' which grows as time passes,
and, if the number of repetitions is finite, the approach of the end of the
game. It is also possible for asymmetry of information to change over time
in a repeated game since players' moves may convey their private
information, but Chapter 5 will confine itself to games of symmetric
information.
Section 5.1 will show the perverse unimportance of repetition for the games of
{ Entry Deterrence} and The { Prisoner's Dilemma}, a phenomenon known as
the Chainstore Paradox. None of discounting, probabilistic end dates,
infinite repetitions, or precommitment is a satisfactory escape from the
Chainstore Paradox. This is summarized in the Folk Theorem of Section
5.2. Section 5.2 will also discuss strategies which punish players who fail to
cooperate in a repeated game--- strategies such as the Grim Strategy,
Tit-for-Tat, and Minimax. Section 5.3 builds a framework for reputation models
based
on The { Prisoner's Dilemma}, and Section 5.4 presents one particular
reputation model, the Klein-Leffler model of product quality. Section 5.5
concludes the chapter with an overlapping generations model of consumer
switching costs which uses the idea of Markov strategies to narrow down the
number of equilibria.
\noindent
{\bf The Chainstore Paradox}
\noindent
Suppose that we repeat { Entry Deterrence I} 20 times in the context of a
chainstore that is trying to deter entry into 20 markets where it has outlets.
We have seen that entry into just one market would not be deterred, but perhaps
with 20 markets the outcome is different because the chainstore would fight the
first entrant to deter the next 19.
The repeated game is much more complicated than the {\bf one-shot game}, as
the unrepeated version is called. A player's action is still to {\it Enter } or
{\it Stay Out}, to {\it Fight} or {\it Collude}, but his strategy is a potentially
very complicated rule telling him what action to choose depending on what
actions both players took in each of the previous periods. Even the five-round
repeated Prisoner's Dilemma has a strategy set for each player with over two
billion strategies, and the number of strategy profiles is even greater (Sugden
[1986], p. 108).
The obvious way to solve the game is from the beginning, where there is the
least past history on which to condition a strategy, but that is not the easy
way. We have to follow Kierkegaard, who said, ``Life can only be understood
backwards, but it must be lived forwards'' (Kierkegaard 1938, p. 465). In
picking his first action, a player looks ahead to its implications for all the
future periods, so it is easiest to start by understanding the end of a
multi-period game, where the future is shortest.
Consider the situation in which 19 markets have already been invaded (and
maybe the chainstore fought, or maybe not). In the last market, the subgame in
which the two players find themselves is identical to the one-shot { Entry
Deterrence I}, so the entrant will $Enter$ and the chainstore will {\it
Collude}, regardless of the past history of the game. Next, consider the
next-to-last market. The chainstore can gain nothing from building a reputation for
ferocity, because it is common knowledge that he will {\it Collude} with the
last entrant anyway. So he might as well {\it Collude} in the 19th market. But
we can say the same of the 18th market and--- by continuing backward
induction--- of every market, including the first. This result is called the
{\bf Chainstore Paradox} after Selten (1978).
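The backward induction can be sketched computationally. The payoff numbers below are purely illustrative (the actual payoffs of { Entry Deterrence I} appear in Chapter 4, not here); all that is taken from the text is the ordinal ranking in which the chainstore prefers {\it Collude} to {\it Fight} once entry has occurred, and the entrant prefers entering against collusion to staying out.

```python
# Hypothetical stage payoffs: the incumbent prefers Collude (2) to Fight (0)
# once entry has occurred, and the entrant prefers entering against
# collusion (1) to staying out (0). Only these rankings come from the text.

def stage_equilibrium():
    """Solve the one-shot entry subgame by backward induction."""
    # The incumbent's best reply once entry has occurred:
    incumbent = max([("Fight", 0), ("Collude", 2)], key=lambda a: a[1])[0]
    # The entrant anticipates that reply:
    entrant = "Enter" if incumbent == "Collude" else "Stay Out"
    return entrant, incumbent

# All 20 markets present identical subgames, so the same actions are
# played in every one, regardless of history: the Chainstore Paradox.
path = [stage_equilibrium() for _ in range(20)]
print(path[0], len(path))
```

Because the last market's subgame is solved without reference to history, the same solution propagates back through all twenty markets.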
Backward induction ensures that the strategy profile is a subgame perfect
equilibrium. There are other Nash equilibria--- ({\it Always Fight, Never
Enter}), for example--- but because of the Chainstore Paradox they are not
perfect.
\bigskip
\noindent
{\bf The Repeated { Prisoner's Dilemma} }
\noindent
The { Prisoner's Dilemma} is similar to { Entry Deterrence I}. Here
the prisoners would like to commit themselves to {\it Silence}, but, in the
absence of commitment, they {\it Blame}. The Chainstore Paradox can be applied
to show that repetition does not induce cooperative behavior. Both prisoners
know that in the last repetition, both will {\it Blame}. After 18 repetitions,
they know that no matter what happens in the 19th, both will {\it Blame} in the
20th, so they might as well {\it Blame} in the 19th too. Building a reputation
is pointless, because in the 20th period it is not going to matter. Proceeding
inductively, both players {\it Blame} in every period, the unique perfect
equilibrium outcome.
In fact, because the one-shot { Prisoner's Dilemma}
has a dominant-strategy equilibrium, blaming is the only Nash outcome for the
repeated { Prisoner's Dilemma}, not just the only perfect outcome. The
argument of the previous paragraph did not show that blaming was the unique Nash
outcome. To show subgame perfectness, we worked back from the end using longer
and longer subgames. To show that blaming is the only Nash outcome, we do not
look at subgames, but instead rule out successive classes of strategies from
being Nash. Consider the portions of the strategy which apply to the equilibrium
path (that is, the portions directly relevant to the payoffs). No strategy in
the class that calls for $Silence$ in the last period can be a Nash strategy,
because the same strategy with $Blame$ replacing $Silence$ would dominate it.
But if both players have strategies calling for blaming in the last period, then
no strategy that does not call for blaming in the next-to-last period is Nash,
because a player should deviate by replacing $Silence$ with $Blame$ in the
next-to-last period. The argument can be carried back to the first period, ruling
out any class of strategies that does not call for blaming everywhere along the
equilibrium path.
The strategy of always blaming is not a dominant strategy, as it is in the
one-shot game, because it is not the best response to various suboptimal
strategies such as ({\it Silence until the other player Blames, then Silence
for the rest of the game}). Moreover, the uniqueness is only on the equilibrium
path. Nonperfect Nash strategies could call for cooperation at nodes far away
from the equilibrium path, since that action would never have to be taken. If
Row has chosen ({\it Always Blame}), one of Column's best responses is ({\it
Always Blame unless Row has chosen Silence ten times; then always Silence}).
\vspace{1in}
\noindent
{ \bf 5.2 Infinitely Repeated Games, Minimax Punishments, and the Folk Theorem}
\noindent The contradiction between the Chainstore Paradox and what many people
think of as real world behavior has been most successfully resolved by adding
incomplete information to the model, as will be seen in Section 6.4. Before we
turn to incomplete information, however, we will explore certain other
modifications. One idea is to repeat the { Prisoner's Dilemma} an infinite
number of times instead of a finite number (after all, few economies have a
known end date). Without a last period, the inductive argument in the
Chainstore Paradox fails.
In fact, we can find a simple perfect equilibrium for the infinitely repeated
{ Prisoner's Dilemma} in which both players cooperate: the equilibrium in
which both players adopt the Grim Strategy.
\noindent
{\bf Grim Strategy}\\
{\it 1 Start by choosing {\it Silence}.\\
2 Continue to choose {\it Silence} unless some player has chosen $Blame$, in
which case choose $Blame$ forever.}
Notice that the Grim Strategy says that even if a player is the first to
deviate and choose {\it Blame}, he continues to choose {\it Blame} thereafter.
If Column uses the Grim Strategy, the Grim Strategy is weakly Row's best
response. If Row cooperates, he will continue to receive the high $(Silence,
Silence)$ payoff forever. If he blames, he will receive the higher $(Blame,
Silence)$ payoff once, but the best he can hope for thereafter is the $(Blame,
Blame)$ payoff.
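This comparison is easy to check with a minimal simulation, using the payoffs of Table 2a from Section 5.3 (5 for mutual {\it Silence}, 10 and $-5$ for one-sided {\it Blame}, 0 for mutual {\it Blame}); the ten-period horizon is arbitrary.

```python
# Payoffs from Table 2a later in the chapter. "S" = Silence, "B" = Blame.
PAYOFF = {("S", "S"): 5, ("B", "S"): 10, ("S", "B"): -5, ("B", "B"): 0}

def grim(history):
    """Silence until anyone has ever blamed, then Blame forever."""
    return "B" if any("B" in pair for pair in history) else "S"

def always_blame(history):
    # Against Grim, a single deviation triggers mutual Blame forever,
    # so a one-shot deviation is equivalent to blaming in every period.
    return "B"

def row_total(row_strategy, col_strategy, periods=10):
    """Row's undiscounted payoff when both strategies observe the joint history."""
    history, total = [], 0
    for _ in range(periods):
        r, c = row_strategy(history), col_strategy(history)
        history.append((r, c))
        total += PAYOFF[(r, c)]
    return total

print(row_total(grim, grim))          # 50: mutual Silence every period
print(row_total(always_blame, grim))  # 10: a one-time gain of 10, then 0 forever
```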
Even in the infinitely repeated game, cooperation is not inevitable, and not
every strategy that punishes blaming is perfect. A notable example is the
strategy of Tit-for-Tat.
\noindent
{\bf Tit-for-Tat}\\
{\it 1 Start by choosing {\it Silence}.\\
2 Thereafter, in period $n$ choose the action that the other player chose in
period $(n-1)$.}
If Column uses Tit-for-Tat, Row does not have an incentive to $Blame$ first,
because if Row cooperates he will continue to receive the high $(Silence,
Silence)$ payoff, but if he blames and then returns to Tit-for-Tat, the players
alternate $(Blame, Silence)$ with $(Silence, Blame)$ forever. Row's average
payoff from this alternation would be lower than if he had stuck to $(Silence,
Silence)$, and the loss would swamp the one-time gain. But Tit-for-Tat is almost never
perfect in the infinitely repeated { Prisoner's Dilemma} without discounting,
because it is not rational for Column to punish Row's initial $Blame$. Adhering
to Tit-for-Tat's punishments results in a miserable alternation of $Blame$ and
$Silence$, so Column would rather ignore Row's first $Blame$. The deviation is
not from the equilibrium path action of $Silence$, but from the off-equilibrium
action rule of {\it Blame in response to a Blame}. Thus, Tit-for-Tat, unlike
the Grim
Strategy, is not subgame perfect. (See Kalai, Samet \& Stanford (1988)
and Problem 5.5 for more on this point.)
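The alternation that makes Tit-for-Tat's punishment phase so unattractive can be seen directly by letting Row blame once and then return to Tit-for-Tat against a Tit-for-Tat Column, again with the Table 2a payoffs.

```python
def tit_for_tat(own_history, other_history):
    """Silence in period 0; thereafter copy the other player's last action."""
    return "S" if not other_history else other_history[-1]

def blame_then_tft(own_history, other_history):
    """Blame once in period 0, then return to Tit-for-Tat."""
    return "B" if not own_history else tit_for_tat(own_history, other_history)

row_hist, col_hist = [], []
for _ in range(8):
    r = blame_then_tft(row_hist, col_hist)
    c = tit_for_tat(col_hist, row_hist)
    row_hist.append(r)
    col_hist.append(c)

profile = list(zip(row_hist, col_hist))
print(profile)  # (B, S), (S, B), (B, S), (S, B), ... forever

# Row's average payoff in the alternating phase is (10 - 5)/2 = 2.5,
# well below the 5 per period from sticking to mutual Silence.
payoff = {("S", "S"): 5, ("B", "S"): 10, ("S", "B"): -5, ("B", "B"): 0}
print(sum(payoff[p] for p in profile) / len(profile))  # 2.5
```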
Unfortunately, although eternal cooperation is a perfect equilibrium outcome
in the infinite game under at least one strategy, so is practically anything
else, including eternal blaming. The multiplicity of equilibria is summarized
by the Folk Theorem, so called because its origins are hazy.\footnote{There is
a multiplicity of Folk Theorems too, since the idea can be formalized in many
ways and in many settings--- infinitely or finitely repeated games, complete or
incomplete information, overlapping generations, epsilon-equilibria, and so
forth. Benoit \& Krishna (2000) attempt a synthesis, using the basic principle
that many things can happen if there is not an end-game that can pin down the
equilibrium.}
\noindent
{\bf Theorem 1 (the Folk Theorem)}\\
{\it In an infinitely repeated n-person game with finite action sets at each
repetition, any profile of actions observed in any finite number of repetitions
is the unique outcome of some subgame perfect equilibrium given \\ {\bf
Condition 1:} The rate of time preference is zero, or positive and sufficiently
small; \\ {\bf Condition 2:} The probability that the game ends at any
repetition is zero, or positive and sufficiently small; and\\
{\bf Condition 3:} The set of payoff profiles that strictly Pareto dominate the
minimax payoff profiles in the mixed extension of the one-shot game is
n-dimensional.}
What the Folk Theorem tells us is that claiming that particular behavior arises
in a perfect equilibrium is meaningless in an infinitely repeated game. This
applies to any game that meets conditions 1 to 3, not just to the {
Prisoner's Dilemma}. If an infinite amount of time always remains in the game, a
way can always be found to make one player willing to punish some other player
for the sake of a better future, even if the punishment currently hurts the
punisher as well as the punished. Any finite interval of time is insignificant
compared to eternity, so the threat of future reprisal makes the players willing
to carry out the punishments needed.
\noindent
We will next discuss conditions 1 to 3.
\noindent
{\bf Condition 1: Discounting}
\noindent
The Folk Theorem helps answer the question of whether discounting future
payments lessens the influence of the troublesome Last Period. Quite to the
contrary, with discounting, the present gain from blaming is weighted more
heavily and future gains from cooperation more lightly. If the discount rate is
very high the game almost returns to being one-shot. When the real interest rate
is 1,000 percent, a payment next year is little better than a payment a hundred
years hence, so next year is practically irrelevant. Any model that relies on a
large number of repetitions also assumes that the discount rate is not too high.
Allowing a little discounting is nonetheless important for showing that there
is no discontinuity at a discount rate of zero. If we come across an undiscounted,
infinitely repeated game with many equilibria, the Folk Theorem tells us that
adding a low discount rate will not reduce the number of equilibria. This
contrasts with the effect of changing the model by having a large but finite
number of repetitions, a change which often eliminates all but one outcome by
inducing the Chainstore Paradox.
A discount rate of zero supports many perfect equilibria, but if the rate is
high enough, the only equilibrium outcome is eternal blaming. We can calculate
the critical value for given parameters. The Grim Strategy imposes the heaviest
possible punishment for deviant behavior. Using the payoffs for the {
Prisoner's Dilemma} from Table 2a in the next section, the equilibrium payoff
from the Grim Strategy is the current payoff of $5$ plus the value of the rest
of the game, which from Table 2 of Chapter 4 is $\frac{5}{r}$. If Row deviated
by blaming, he would receive a current payoff of 10, but the value of the rest
of the game would fall to $0$. The critical value of the discount rate is found
by solving the equation $5 + \frac{5}{r} = 10 + 0$, which yields $r = 1$, a
discount rate of 100 percent or a discount factor of $\delta = 0.5$. Unless the
players are extremely impatient, blaming is not much of a temptation.
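The indifference condition $5 + \frac{5}{r} = 10$ can also be checked numerically; the bisection below recovers the critical rate without any algebra.

```python
# Cooperation under the Grim Strategy is worth the current 5 plus a
# perpetuity of 5 discounted at rate r; deviating is worth 10 once and
# nothing thereafter (the Table 2a payoffs used in the text).
def cooperation_value(r):
    return 5 + 5 / r

DEVIATION_VALUE = 10  # 10 now, then the (Blame, Blame) payoff of 0 forever

# Bisect for the rate at which Row is exactly indifferent.
lo, hi = 0.01, 100.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cooperation_value(mid) > DEVIATION_VALUE:
        lo = mid  # cooperation still strictly better: raise the rate
    else:
        hi = mid

critical_r = (lo + hi) / 2
print(round(critical_r, 4))            # 1.0, i.e. a 100 percent discount rate
print(round(1 / (1 + critical_r), 4))  # discount factor delta = 0.5
```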
\noindent
{\bf Condition 2: A probability of the game ending}
\noindent Time preference is fairly straightforward, but what is surprising is
that assuming that the game ends in each period with probability $\theta$ does
not make a drastic difference. In fact, we could even allow $\theta$ to vary
over time, so long as it never became too large. If $\theta>0$, the game ends
in finite time with probability one; or, put less dramatically, the expected
number of repetitions is finite, but it still behaves like a discounted infinite
game, because the expected number of future repetitions is always large, no
matter how many have already occurred. The game still has no Last Period, and
it is still true that imposing one, no matter how far beyond the expected number
of repetitions, would radically change the results.
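The stationarity being claimed is the memorylessness of a constant ending probability, which a small Monte Carlo check makes concrete. The value $\theta = 0.1$ below is purely illustrative: with it, the expected number of further repetitions is $(1-\theta)/\theta = 9$, whether measured at the start of the game or after ten periods have already been survived.

```python
import random

random.seed(0)
THETA = 0.1  # illustrative probability that the game ends at each repetition

def mean_remaining(after, trials=200_000):
    """Average further repetitions among games surviving at least `after` periods."""
    total, survivors = 0, 0
    for _ in range(trials):
        length = 0
        while random.random() > THETA:  # the game continues another period
            length += 1
        if length >= after:
            total += length - after
            survivors += 1
    return total / survivors

m0 = mean_remaining(0)
m10 = mean_remaining(10)
print(m0, m10)  # both near (1 - THETA)/THETA = 9: no Last Period ever gets closer
```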
The following two situations are different from each other. \\ ``1 The game
will end at some uncertain date before $T$.''\\ ``2 There is a constant
probability of the game ending.'' \\
In situation (1), the game is like a finite game, because, as time passes,
the maximum length of time still to run shrinks to zero. In situation (2), even
if the game will end by $T$ with high probability, if it actually lasts until
$T$ the game looks exactly the same as at time zero. The fourth verse of the
hymn ``Amazing Grace'' puts this stationarity very nicely (though I expect it
is supposed to apply to a game with $\theta =0$).
\begin{quotation} \noindent {\it When we've been there ten thousand years,\\
Bright shining as the sun,\\ We've no less days to sing God's praise \\
Than when we'd first begun.} \end{quotation}
\noindent
{\bf Condition 3: Dimensionality }
\noindent
The ``minimax payoff'' mentioned in Theorem 1 is the payoff that results if
all the other players pick strategies solely to punish player $i$, and he
protects himself as best he can.
{\it The set of strategies $s_{-i}^{i*}$ is a set of $(n-1)$ {\bf minimax
strategies} chosen by all the players except $i$ to keep $i$'s payoff as low as
possible, no matter how he responds. $s_{-i}^{i*}$ solves} \begin{equation}
\label{e5.1} \stackrel{Minimize}{s_{-i}}\;\; \stackrel{Maximum}{s_{i}}
\pi_i(s_i, s_{- i}). \end{equation} {\it Player $i$'s {\bf minimax payoff},
{\bf minimax value}, or {\bf security value} is his payoff from the solution of
(\ref{e5.1}). }
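Restricting attention to pure strategies for illustration, expression (\ref{e5.1}) is easy to compute for the two-player { Prisoner's Dilemma} of Table 2a, where the opponent's minimax strategy against Row turns out to be simply {\it Blame}.

```python
# Row's payoffs in Table 2a; "S" = Silence, "B" = Blame.
row_payoff = {("S", "S"): 5, ("S", "B"): -5, ("B", "S"): 10, ("B", "B"): 0}
ACTIONS = ("S", "B")

# Column minimizes the maximum that Row can secure (pure strategies only):
minimax_row = min(
    max(row_payoff[(r, c)] for r in ACTIONS)  # Row protects himself
    for c in ACTIONS                          # Column punishes
)
print(minimax_row)  # 0: Column's Blame holds Row to the (Blame, Blame) payoff
```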
The dimensionality condition is needed only for games with three or more
players. It is satisfied if there is some payoff profile for each player in
which his payoff is greater than his minimax payoff but still different from the
payoff of every other player. Figure 1 shows how this condition is satisfied
for the two-person { Prisoner's Dilemma} of Table 2a a few pages beyond
this paragraph, but not for the two-person Ranked Coordination game. It is also
satisfied by the $n$-person { Prisoner's Dilemma} in which a solitary blameer
gets a higher payoff than his cooperating fellow-prisoners, but not by the $n$-
person Ranked Coordination game, in which all the players have the same payoff.
The condition is necessary because establishing the desired behavior requires
some way for the other players to punish a deviator without punishing
themselves.
\includegraphics[width=150mm]{fig05-01.jpg}
\begin{center}
{\bf Figure 1: The Dimensionality Condition}
\end{center}
An alternative to the dimensionality condition in the Folk Theorem is
\noindent
{\bf Condition 3$'$:} {\it The repeated game has a ``desirable'' subgame-perfect
equilibrium in which the strategy profile $\overline{s}$ played each period
gives player $i$ a payoff that exceeds his payoff from some other ``punishment''
subgame-perfect equilibrium in which the strategy profile $\underline{s}^i$ is
played each period:} $$ \exists \overline{s} : \forall i, \; \exists
\underline{s}^i : \pi_i(\underline{s}^i) <
\pi_i(\overline{s}). $$
Condition $3'$ is useful because sometimes it is easy to find a few perfect
equilibria. To enforce the desired pattern of behavior, use the ``desirable''
equilibrium as a carrot and the ``punishment'' equilibrium as a self-enforcing
stick (see Rasmusen [1992a]).
\bigskip
\noindent
{\bf Minimax and Maximin}
\noindent
In discussions of strategies which enforce cooperation, the question of
the maximum severity of punishment strategies frequently arises.
Thus, the idea of the minimax strategy---the most severe sanction possible if
the offender does not cooperate
in his own punishment--- entered into the statement of the Folk Theorem. The
corresponding strategy for an offender trying to
protect himself from punishment, is the maximin strategy.
{\it The strategy $s_i^*$ is a {\bf maximin strategy} for player $i$ if, given
that the other players pick strategies to make $i$'s payoff as low as possible,
$s_i^*$ gives $i$ the highest possible payoff. In our notation, $s_i^*$ solves}
\begin{equation} \label{e5.2}
\stackrel{Maximize}{s_i}\;\; \stackrel{Minimum}{s_{-i}} \pi_i(s_i,s_{-i}).
\end{equation}
The following expressions show how to calculate the minimax and maximin
strategies for a two-player game with Player 1 as $i$:
$$ \mbox{Maximin:} \;\;\; \stackrel{Maximum}{s_{1}}\;\; \stackrel{Minimum}{s_{2}}\; \pi_1 $$
$$ \mbox{Minimax:} \;\;\; \stackrel{Minimum}{s_{2}}\;\; \stackrel{Maximum}{s_{1}}\; \pi_1 $$
In the { Prisoner's Dilemma}, the minimax and maximin strategies are both
{\it Blame}. Although the Welfare Game (Chapter 3's Table 1) has only a
mixed strategy Nash equilibrium, if we restrict ourselves to the pure strategies
(just for illustration here) the Pauper's maximin strategy is {\it Try to
Work}, which guarantees him at least 1, and his strategy for minimaxing the
Government is {\it Be Idle}, which prevents the Government from getting more
than zero.
Under minimax, Player 2 is purely malicious but must move first (at least in
choosing a mixing probability) in his attempt to cause player 1 the maximum
pain. Under maximin, Player 1 moves first, in the belief that Player 2 is out to
get him. In variable-sum games, minimax is for sadists and maximin for
paranoids. In zero-sum games, the players are merely neurotic. Minimax is for
optimists, and maximin is for pessimists.
The maximin strategy need not be unique, and it can be in mixed strategies.
Since maximin behavior can also be viewed as minimizing the maximum loss that
might be suffered, decision theorists refer to such a policy as a {\bf minimax
criterion,} a catchier phrase (Luce \& Raiffa [1957], p. 279).
It is tempting to use maximin strategies as the basis of an equilibrium
concept. A {\bf maximin equilibrium} is made up of a maximin strategy for each
player. Such a strategy might seem reasonable because each player then has
protected himself from the worst harm possible. Maximin strategies have very
little justification, however, for a rational player. They are not simply the
optimal strategies for risk-averse players, because risk aversion is accounted
for in the utility payoffs. The players' implicit beliefs can be inconsistent in
a maximin equilibrium, and a player must believe that his opponent would choose
the most harmful strategy out of spite rather than self-interest if maximin
behavior is to be rational.
The usefulness of minimax and maximin strategies is not in directly predicting
the best strategies of the players, but in setting the bounds of how their
strategies affect their payoffs, as in condition 3 of Theorem 1.
It is important to remember that minimax and maximin strategies are not always
pure strategies. In the Minimax Illustration Game of Table 1, which I take from
Fudenberg \& Tirole (1991a, p. 150), Row can guarantee himself a payoff of 0 by
choosing $Down$, so that is his maximin strategy. Column cannot hold Row's
payoff down to 0, however, by using a pure minimax strategy. If Column chooses
$Left$, Row can choose $Middle$ and get a payoff of 1; if Column chooses
$Right$, Row can choose $Up$ and get a payoff of 1. Column can, however, hold
Row's payoff down to 0 by choosing a mixed minimax strategy of {\it (Probability
0.5 of Left, Probability 0.5 of Right)}. Row would then respond with $Down$,
for a minimax payoff of 0, since either $Up$, $Middle$, or a mixture of the two
would give him a payoff of $-0.5$ $(= 0.5(-2) + 0.5(1))$.
\begin{center}
{\bf Table 1:
The Minimax Illustration Game}
\begin{tabular}{lllccc} & & &\multicolumn{3}{c}{\bf Column}\\
& & & $Left$ & & $Right$ \\ & & $ Up $ &
$-2, \fbox{2} $ & & $\fbox{1},-2$ \\ & {\bf Row:} & $Middle$& $ \fbox{1}, -2$
& & $-2, \fbox{2}$ \\ & & $Down$ & $ 0, \fbox{1} $ & &
$0,\fbox{1}$ \\
\end{tabular}
\end{center}
\vspace{-12pt}
{\it Payoffs to: (Row, Column). Best-response payoffs are boxed. }
Column's
maximin and minimax strategies can also be computed. Row's strategy for
minimaxing Column is {\it (Probability 0.5 of Up, Probability 0.5 of Middle)},
Column's maximin strategy is {\it (Probability 0.5 of Left, Probability 0.5 of
Right)}, and Column's minimax payoff is 0.
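These mixed-strategy claims are easy to verify from Table 1's payoffs. The sketch below checks Row's side; the same arithmetic, transposed, gives Column's.

```python
# Row's payoffs in the Minimax Illustration Game of Table 1.
row_payoff = {
    ("Up", "Left"): -2, ("Up", "Right"): 1,
    ("Middle", "Left"): 1, ("Middle", "Right"): -2,
    ("Down", "Left"): 0, ("Down", "Right"): 0,
}

P_LEFT = 0.5  # Column's mixed minimax strategy: 0.5 Left, 0.5 Right
expected = {
    r: P_LEFT * row_payoff[(r, "Left")] + (1 - P_LEFT) * row_payoff[(r, "Right")]
    for r in ("Up", "Middle", "Down")
}
print(expected)                # Up: -0.5, Middle: -0.5, Down: 0.0
print(max(expected.values()))  # Row's minimax payoff: 0, from choosing Down
```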
In two-person zero-sum games, minimax and maximin strategies are more
directly useful, because when Player 1 reduces Player 2's payoff, he increases
his own payoff. Punishing the other player is equivalent to rewarding yourself.
This is the origin of the celebrated {\bf Minimax Theorem} (von Neumann [1928]),
which says that a minimax equilibrium exists in pure or mixed strategies for
every two-person zero-sum game and is identical to the maximin equilibrium.
Unfortunately, the games that come up in applications are usually not
zero-sum games, so the Minimax Theorem usually cannot be applied.
\bigskip
\noindent
{\bf Precommitment}
\noindent What if we use metastrategies, abandoning the idea of perfectness by
allowing players to commit at the start to a strategy for the rest of the game?
We would still want to keep the game noncooperative by disallowing binding
promises, but we could model it as a game with simultaneous choices by both
players, or with one move each in sequence.
If precommitted strategies are chosen simultaneously, the equilibrium outcome
of the finitely repeated { Prisoner's Dilemma} calls for always blaming,
because allowing commitment is the same as allowing equilibria to be nonperfect,
in which case, as was shown earlier, the unique Nash outcome is always blaming.
A different result is achieved if the players precommit to strategies in
sequence. The outcome depends on the particular values of the parameters, but
one possible equilibrium is the following: Row moves first and chooses the
strategy ({\it Silence} until Column $Blame$s; thereafter always $Blame$), and
Column chooses ({\it Silence} until the last period; then $Blame$). The
observed outcome would be for both players to choose {\it Silence} until the
last period, and
then for Row to again choose {\it Silence}, but for Column to choose {\it
Blame}. Row would submit to
this because if he chose a strategy that initiated blaming earlier, Column would
choose a strategy of starting to blame earlier too. The game has a second-mover
advantage.
\bigskip
\noindent
{\bf 5.3 Reputation: The One-Sided { Prisoner's Dilemma} }
\noindent
Part II of this book will analyze moral hazard and adverse selection. Under
moral hazard, a player wants to commit to high effort, but he cannot credibly do
so. Under adverse selection, a player wants to prove he is high ability, but he
cannot. In both, the problem is that the penalties for lying are insufficient.
Reputation seems to offer a way out of the problem. If the relationship is
repeated, perhaps a player is willing to be honest in early periods in order to
establish a reputation for honesty which will be valuable to himself later.
Reputation seems to play a similar role in making threats to punish credible.
Usually punishment is costly to the punisher as well as the punished, and it is
not clear why the punisher should not let bygones be bygones. Yet in 1988 the
Soviet Union paid off 70-year-old debt to dissuade the Swiss authorities from
blocking a mutually beneficial new bond issue (``Soviets Agree to Pay Off
Czarist Debt to Switzerland,'' {\it Wall Street Journal}, January 19, 1988, p.
60). Why were the Swiss so vindictive towards Lenin?
The questions of why players do punish and do not cheat are really the same
questions that arise in the repeated { Prisoner's Dilemma}, where the fact of
an infinite number of repetitions allows cooperation. That is the great problem
of reputation. Since everyone knows that a player will {\it Blame}, choose low
effort, or default on debt in the last period, why do they suppose he will
bother to build up a reputation in the present? Why should past behavior be any
guide to future behavior?
Not all reputation problems are quite the same as the { Prisoner's
Dilemma}, but they have much the same flavor. Some games, like duopoly or the
original { Prisoner's Dilemma}, are {\bf two-sided} in the sense that each
player has the same strategy set and the payoffs are symmetric. Others, such as
the game of Product Quality (see below), are what we might call {\bf one-sided
Prisoner's Dilemmas}, which have properties similar to the Prisoner's Dilemma,
but do not fit the usual definition because they are asymmetric. Table 2 shows
the normal forms for both the original { Prisoner's Dilemma} and the
one-sided version.\footnote{The exact numbers are different from the { Prisoner's
Dilemma} in Table 1 in Chapter 1, but the ordinal rankings are the same.
Numbers such as those in Table 2 of the present chapter are more commonly used,
because it is convenient to normalize the ({\it Blame, Blame}) payoffs to (0,0)
and to make most of the numbers positive rather than negative.} The important
difference is that in the one-sided { Prisoner's Dilemma} at least one
player really does prefer the outcome equivalent to $(Silence, Silence)$, which
is ({\it High Quality, Buy)} in Table 2b, to anything else. He blames
defensively, rather than both offensively and defensively. The payoff (0,0) can
often be interpreted as the refusal of one player to interact with the other,
for example, the motorist who refuses to buy cars from Chrysler because he knows
they once falsified odometers. Table 3 lists examples of both one-sided and
two-sided games.
\begin{center} {\bf Table 2: Prisoner's Dilemmas }
\end{center}
\noindent (a) Two-Sided (conventional)
\begin{tabular}{lllccc} & & &\multicolumn{3}{c}{\bf Column}\\
& & & {\it Silence} & & {\it Blame } \\ & & {\it
Silence } & 5,5 & $\rightarrow$ & -5,10 \\ & {\bf Row:}
&&$\downarrow$& & $\downarrow$ \\ & & {\it Blame } & 10,-5
& $\rightarrow$ & {\bf 0,0} \\
\end{tabular}
{\it Payoffs to: (Row, Column). Arrows show how a player can increase his
payoff. }
\bigskip
\noindent (b) One-Sided
\begin{tabular}{lllccc} & & &\multicolumn{3}{c}{\bf Consumer
(Column)}\\ & & & {\it Buy} & & {\it Boycott } \\ &
& {\it High Quality } & 5,5 & $\leftarrow$ & 0,0 \\ & {\bf Seller
(Row):} &&$\downarrow$& & $\updownarrow$ \\ & & {\it Low Quality }
& 10, -5 & $\rightarrow$ & {\bf 0,0} \\
\end{tabular}
{\it Payoffs to: (Seller, Consumer). Arrows show how a player can increase
his payoff. }
\bigskip
\begin{small} \begin{center} {\bf Table 3 Some Repeated Games in which
Reputation Is Important }
\begin{tabular}{ llll }
\hline {\bf Application} & {\bf Sidedness} & {\bf Players} & {\bf Actions} \\
& & & \\ \hline & & &\\ { Prisoner's Dilemma} & two-sided & Row & {\it
Silence/Blame} \\ & & Column & {\it Silence/Blame}\\ & & &\\ Duopoly& two-sided
& Firm & {\it High price/Low price} \\ & & Firm & {\it High price/Low price}
\\ & & &\\ Employment& two-sided & Employer & {\it Bonus/No bonus} \\ & &
Employee & {\it Work/Shirk} \\ & & &\\ Product Quality & one-sided & Consumer &
{\it Buy/Boycott} \\ & & Seller & {\it High quality/low quality} \\ & & &\\
Entry Deterrence& one-sided & Incumbent & {\it Low price/High price} \\ & &
Entrant & {\it Enter/Stay out} \\ & & &\\ Financial Disclosure & one-sided &
Corporation & {\it Truth/Lies} \\ & & Investor & {\it Invest/Refrain} \\ &
& &\\ Borrowing & one-sided & Lender & {\it Lend/Refuse } \\ & & Borrower &
{\it Repay/Default} \\ & & &\\ \hline \end{tabular} \end{center} \end{small}
The Nash and iterated dominance equilibria in the one-sided { Prisoner's
Dilemma} are still ({\it Blame, Blame}), but it is not a dominant-strategy
equilibrium. Column does not have a dominant strategy, because if Row were to
choose $Silence$, Column would also choose $Silence,$ to obtain the payoff of 5;
but if Row chooses $Blame$, Column would choose $Blame$, for a payoff of zero.
$Blame$ is, however, weakly dominant for Row, which makes $(Blame, Blame)$ the
iterated dominance equilibrium. In both games, the players would like
to persuade each other that they will cooperate, and devices that induce
cooperation in the one-sided game will usually obtain the same result in the
two-sided game.
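The dominance claims for Table 2b can be verified mechanically from its payoffs:

```python
# Table 2b payoffs: (seller_action, consumer_action) -> (seller, consumer).
PAYOFFS = {
    ("High", "Buy"): (5, 5),  ("High", "Boycott"): (0, 0),
    ("Low", "Buy"): (10, -5), ("Low", "Boycott"): (0, 0),
}

def seller_prefers(a, b):
    """Does seller action a weakly dominate seller action b?"""
    diffs = [PAYOFFS[(a, c)][0] - PAYOFFS[(b, c)][0] for c in ("Buy", "Boycott")]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

def consumer_prefers(a, b):
    """Does consumer action a weakly dominate consumer action b?"""
    diffs = [PAYOFFS[(s, a)][1] - PAYOFFS[(s, b)][1] for s in ("High", "Low")]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

print(seller_prefers("Low", "High"))       # True: Low Quality weakly dominates
print(consumer_prefers("Buy", "Boycott"))  # False: neither of the Consumer's
print(consumer_prefers("Boycott", "Buy"))  # False: strategies is dominant
```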
\bigskip \noindent
{\bf 5.4 Product Quality in an Infinitely Repeated Game}
\noindent
The Folk Theorem tells us that some perfect equilibrium of an infinitely
repeated game--- sometimes called an {\bf infinite horizon model}--- can
generate any pattern of behavior observed over a finite number of periods. But
since the Folk Theorem is no more than a mathematical result, the strategies
that generate particular patterns of behavior may be unreasonable. The
theorem's value is in provoking close scrutiny of infinite horizon models so
that the modeller must show why his equilibrium is better than a host of others.
He must go beyond satisfaction of the technical criterion of perfectness and
justify the strategies on other grounds.
In the simplest model of product quality, a seller can choose between producing
costly high quality or costless low quality, and the buyer cannot determine
quality before he purchases. If the seller would produce high quality under
symmetric information, we have a one-sided { Prisoner's Dilemma}, as in Table
2b. Both players are better off when the seller produces high quality and the
buyer purchases the product, but the seller's weakly dominant strategy is to
produce low quality, so the buyer will not purchase. This is also an example of
moral hazard, the topic of chapter 7.
A potential solution is to repeat the game, allowing the firm to choose
quality at each repetition. If the number of repetitions is finite, however, the
outcome stays the same because of the Chainstore Paradox. In the last
repetition, the subgame is identical to the one-shot game, so the firm chooses
low quality. In the next-to-last repetition, it is foreseen that the last
period's outcome is independent of current actions, so the firm also chooses low
quality, an argument that can be carried back to the first repetition.
If the game is repeated an infinite number of times, the Chainstore Paradox is
inapplicable and the Folk Theorem says that a wide range of outcomes can be
observed in equilibrium. Klein \& Leffler (1981) construct a plausible
equilibrium for an infinite period model. Their original article, in the
traditional verbal style of UCLA, does not phrase the result in terms of game
theory, but we will recast it here, as I did in Rasmusen (1989b). In
equilibrium, the firm is willing to produce a high quality product because it
can sell at a high price for many periods, but consumers refuse to ever buy
again from a firm that has once produced low quality. The equilibrium price is
high enough that the firm is unwilling to sacrifice its future profits for a
one-time windfall from deceitfully producing low quality and selling it at a
high price. Although this is only one of a large number of subgame perfect
equilibria, the consumers' behavior is simple and rational: no consumer can
benefit by deviating from the equilibrium.
\begin{center}
{\bf Product Quality}\\
\end{center}
{\bf Players}\\
An infinite number of potential firms and a continuum of consumers.
\noindent
{\bf The Order of Play}\\
1 An endogenous number $n$ of firms decide to enter the market at cost $F$.\\
2 A firm that has entered chooses its quality to be $High$ or $Low$, incurring
the constant marginal cost $c$ if it picks $High$ and zero if it picks $Low$.
The choice is unobserved by consumers. The firm also picks a price $p$.\\
3 Consumers decide which firms (if any) to buy from, choosing firms randomly
if they are indifferent. The amount bought from firm $i$ is denoted $q_i$.\\
4 All consumers observe the quality of all goods purchased in that period.\\
5 The game returns to (2) and repeats.
\noindent {\bf Payoffs}\\
The consumer benefit from a product of low quality is zero, but consumers are
willing to buy a total quantity $q(p) = \sum_{i=1}^n q_i$ of products believed
to be of high quality, where $\frac{dq}{dp} < 0$.\\
If a firm stays out of the market, its payoff is zero.\\
If firm $i$ enters, it receives $-F$ immediately. Its current end-of-period
payoff is $q_ip$ if it produces $Low$ quality and $q_i(p-c)$ if it produces
$High$ quality. The discount rate is $r \geq 0$.
That the firm can produce low quality items at zero marginal cost is
unrealistic, but it is only a simplifying assumption. By normalizing the cost
of producing low quality to zero, we avoid having to carry an extra variable
through the analysis, and the results are unaffected.
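To see how the per-period payoffs will aggregate in the analysis below, note
that a firm which enters and produces high quality in every period, selling
$q_i$ units at price $p$, earns a present value of
\begin{equation*}
-F + \sum_{t=1}^{\infty} \frac{q_i(p-c)}{(1+r)^t} = -F + \frac{q_i(p-c)}{r},
\end{equation*}
using the end-of-period timing of the payoffs specified above. This discounted
stream is the expression that will appear in the equilibrium conditions.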
The Folk Theorem tells us that this game has a wide range of perfect
equilibrium outcomes, including a large number with erratic quality patterns
like ({\it High, High, Low, High, Low, Low}$\ldots$). If we confine ourselves
to pure-strategy equilibria with the stationary outcome of constant quality and
identical behavior by all firms in the market, then the two outcomes are low
quality and high quality. Low quality is always an equilibrium outcome, since it
is an equilibrium of the one-shot game. If the discount rate is low enough,
high quality is also an equilibrium outcome, and this will be the focus of our
attention. Consider the following strategy profile:
\noindent {\bf Firms.} $\tilde{n}$ firms enter. Each produces high quality and
sells at price $\tilde{p}$. If a firm ever deviates from this, it thereafter
produces low quality (and sells at the same price $\tilde{p} $). The values of
$\tilde{p}$ and $\tilde{n}$ are given by equations (\ref{e5.4}) and (\ref{e5.8})
below.
\noindent {\bf Buyers.} Buyers start by choosing randomly among the firms
charging $\tilde{p}$. Thereafter, they remain with their initial firm unless it
changes its price or quality, in which case they switch randomly to a firm that
has not changed its price or quality.
\noindent This strategy profile is a perfect equilibrium. Each firm is willing
to produce high quality and refrain from price-cutting because otherwise it
would lose all its customers. If it has deviated, it is willing to produce low
quality because the quality is unimportant, given the absence of customers.
Buyers stay away from a firm that has produced low quality because they know it
will continue to do so, and they stay away from a firm that has cut the price
because they know it will produce low quality. For this story to work,
however, the equilibrium must satisfy three constraints that will be explained
in more depth in Section 7.3: incentive compatibility, competition, and market
clearing.
The {\bf incentive compatibility} constraint says that the individual firm must
be willing to produce high quality. Given the buyers' strategy, if the firm
ever produces low quality it receives a one-time windfall profit, but loses its
future profits. The tradeoff is represented by constraint (\ref{e5.3}), which
is satisfied if the discount rate is low enough.
\begin{equation}\label{e5.3}
\frac{q_i p}{1+r} \leq \frac{q_i(p-c)}{r} \;\;\;\;\;\;\;\;\; (incentive \;
compatibility).
\end{equation}
Inequality (\ref{e5.3}) determines a lower bound for the price, which must
satisfy
\begin{equation}\label{e5.4}
\tilde{p} \geq (1+r)c.
\end{equation}
Condition (\ref{e5.4}) will be satisfied as an equality, because any firm
trying to charge a price higher than the quality-guaranteeing $\tilde{p}$
would lose all its customers.
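The step from inequality (\ref{e5.3}) to condition (\ref{e5.4}) is simple
algebra: dividing (\ref{e5.3}) through by $q_i > 0$ and multiplying both sides
by $r(1+r)$ gives
\begin{equation*}
rp \leq (1+r)(p-c) = p + rp - (1+r)c,
\end{equation*}
and cancelling $rp$ from both sides leaves $0 \leq p - (1+r)c$, that is,
$p \geq (1+r)c$.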
The second constraint is that competition drives profits to zero, so firms
are indifferent between entering and staying out of the market.
\begin{equation}
\label{e5.5} \frac{q_i(p-c)}{r} = F \;\;\;\;\;\;\;\;\; (competition)
\end{equation}
Treating (\ref{e5.3}) as an equation and using it to replace $p$
in equation (\ref{e5.5}) gives
\begin{equation}\label{e5.6}
q_i = \frac{F}{c}.
\end{equation}
We have now determined $p$ and $q_i$, and only $n$ remains, which
is determined by the equality of supply and demand. The market does not always
clear in models of asymmetric information (see Stiglitz [1987]), and in this
model each firm would like to sell more than its equilibrium output at the
equilibrium price, but the market output must equal the quantity demanded by
the market.
\begin{equation}\label{e5.7}
nq_i = q(p) \;\;\;\;\;\;\;\;\; (market \; clearing)
\end{equation}
Combining equations (\ref{e5.3}), (\ref{e5.6}), and (\ref{e5.7}) yields
\begin{equation}\label{e5.8}
\tilde{n} = \frac{cq([1+r]c)}{F}.
\end{equation}
We have now determined the
equilibrium values, the only difficulty being the standard existence problem
caused by the requirement that the number of firms be an integer (see note
N5.4).
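To see how the pieces fit together, consider some purely illustrative numbers
(they are examples, not values from the text): $r = 0.1$, $c = 10$, $F = 50$,
and demand $q(p) = 100 - 2p$. Then
\begin{equation*}
\tilde{p} = (1+r)c = 11, \;\;\;\; q_i = \frac{F}{c} = 5, \;\;\;\;
\tilde{n} = \frac{c \, q(11)}{F} = \frac{10(78)}{50} = 15.6.
\end{equation*}
Incentive compatibility binds exactly, since $q_i \tilde{p}/(1+r) = 55/1.1 =
50$ equals $q_i(\tilde{p}-c)/r = 5(1)/0.1 = 50$, and that discounted profit
stream of 50 just covers the entry cost $F = 50$. That $\tilde{n} = 15.6$ is
not an integer illustrates the existence problem of note N5.4.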
The equilibrium price is fixed because $F$ is exogenous and demand is not
perfectly inelastic, which pins down the size of firms. If there were no entry
cost, but demand were still elastic, then the equilibrium price would still be
the unique $p$ that satisfied constraint (\ref{e5.3}), and the market quantity
would be determined by $q(p)$, but $F$ and $q_i$ would be undetermined. If
consumers believed that any firm which might possibly produce high quality paid
an exogenous dissipation cost $F$, the result would be a continuum of
equilibria. The firms' best response would be for $\tilde{n}$ of them to pay
$F$ and produce high quality at price $\tilde{p}$, where $\tilde{n}$ is
determined by the zero profit condition as a function of $F$. Klein \& Leffler
note this indeterminacy and suggest that the profits might be dissipated by some
sort of brand-specific capital. This is especially plausible when there is
asymmetric information, so firms might wish to use capital spending to signal
that they intend to be in the business for a long time; Rasmusen \& Perri (2001)
show a way to model this. Another explanation of which firms come to enjoy the
high profits of a good reputation is simply the history of the industry.
Schmalensee (1982) shows how a pioneering brand can retain a large market share
because consumers are unwilling to investigate the quality of new brands.
The repeated-game model of reputation for product quality can be used to
model many other kinds of reputation too. Even before Klein \& Leffler (1981),
Telser titled his 1980 article ``A Theory of Self-Enforcing Agreements,'' and
looked at a number of situations in which repeated play balanced the short-run
gain from cheating against the long-run gain from cooperation. We will see the
idea later in this book in Section 8.1 as part of the idea of the
``efficiency wage.''
Keep in mind, however, that
``reputation'' can be modelled in two distinct ways. In the model of Section
5.4, a firm with a good reputation is one which produces high quality to avoid
losing that reputation, a ``moral hazard'' model because the focus is on the player's
choice of actions. An alternative is a model in which the firm with a good
reputation is one which has shown that it would not produce low quality even if
there were no adverse consequences from doing so, an ``adverse selection'' model
because the focus is on the player's type. One kind of reputation is for
deciding to be good; the other is for Nature having chosen the player to be
good. As you will see, the Gang of Four model of Chapter 6 will mix the two.
\bigskip \noindent
{\bf *5.5 Markov Equilibria and Overlapping Generations:
Customer Switching Costs}
\noindent The next model demonstrates a general modelling technique, the {\bf
overlapping generations model}, in which different cohorts of otherwise
identical players enter and leave the game with overlapping ``lifetimes,'' and
a new equilibrium concept, ``Markov equilibrium.'' The best-known example of
an overlapping-generations model is the original consumption-loans model of
Samuelson (1958). The models are most often used in macroeconomics, but they
can also be useful in microeconomics. Klemperer (1987) has stimulated
considerable interest in customers who incur costs in moving from one seller to
another. The model used here will be that of Farrell \& Shapiro (1988).
\begin{center}
{\bf Customer Switching Costs}
\end{center}
{\bf Players}\\
Firms Apex and Brydox, and a series of customers, each of whom is first called
a youngster and then an oldster.
\noindent {\bf The Order of Play }\\
1a Brydox, the initial incumbent, picks the incumbent price $p_1^i$.\\
1b Apex, the initial entrant, picks the entrant price $p_1^e$.\\
1c The oldster picks a firm.\\
1d The youngster picks a firm.\\
1e Whichever firm attracted the youngster becomes the incumbent.\\
1f The oldster dies and the youngster becomes an oldster.\\
2a Return to (1a), possibly with new identities for entrant and incumbent.
\noindent {\bf Payoffs}\\ The discount factor is $\delta$. The customer
reservation price is $R$ and the switching cost is $c$. The per period payoffs
in period $t$ are, for $j \in \{i,e\}$,
\begin{tabular}{ll}
$ \pi_{firm \;j} =$ & $\left\{ \begin{tabular}{ll} 0 & if no customers are
attracted.\\
$p_t^j$ & if just oldsters or just youngsters are attracted. \\ $2p_t^j$ & if
both oldsters and youngsters are attracted. \\ \end{tabular} \right.$
\end{tabular}
\begin{tabular}{ll} $ \pi_{oldster} =$& $\left\{ \begin{tabular}{ll} $R -
p_t^i$ & if he buys from the incumbent.\\ $R - p_t^e - c$ & if he switches to
the entrant. \\ \end{tabular} \right.$ \end{tabular}
\begin{tabular}{ll} $ \pi_{youngster} =$& $\left\{ \begin{tabular}{ll} $R -
p_t^i $ & if he buys from the incumbent.\\ $R -p_{t}^e$ & if he buys from the
entrant.\\ \end{tabular} \right.$ \end{tabular}
\bigskip
Finding all the perfect equilibria of an infinite game like this one is
difficult, so we will follow Farrell and Shapiro in limiting ourselves to the
much easier task of finding the perfect Markov equilibrium, which is unique.
{\it A {\bf Markov strategy} is a strategy that, at each node, chooses the
action independently of the history of the game except for the immediately
preceding action (or actions, if they were simultaneous). }
Here, a firm's Markov strategy is its price as a function of whether the
particular firm is currently the incumbent or the entrant, not a function of
the entire past history of the game.
There are two ways to use Markov strategies: (1) just look for equilibria that
use Markov strategies, and (2) disallow non-Markov strategies and then look for
equilibria. Because the first way does not disallow non-Markov strategies, the
equilibrium must be such that no player wants to deviate by using any other
strategy, whether Markov or not. This is just a way of eliminating possible
multiple equilibria by discarding ones that use non-Markov strategies. The
second way is much more dubious, because it requires the players not to use
non-Markov strategies, even if they are best responses. A {\bf perfect Markov
equilibrium} uses the first approach: it is a perfect equilibrium that happens
to use only Markov strategies.
Brydox, the initial incumbent, moves first and chooses $p^i$ low enough that
Apex is not tempted to choose $p^e < p^i-c$ and steal away the oldsters. Apex's
profit is $p^i$ if it chooses $p^e = p^i$ and serves just youngsters, and
$2(p^i-c)$ if it chooses $p^e = p^i-c$ and serves both oldsters and youngsters.
Brydox chooses $p^i$ to make Apex indifferent between these alternatives, so
\begin{equation} \label{e5.9}
p^i=2(p^i-c), \end{equation} and \begin{equation} \label{e5.10} p^i =p^e= 2c.
\end{equation} In equilibrium, Apex and Brydox take turns being the incumbent
and charge the same price.
Because the game lasts forever and the equilibrium strategies are Markov, we
can use a trick from dynamic programming to calculate the payoffs from being the
entrant versus being the incumbent. The equilibrium payoff of the current
entrant is the immediate payment of $p^e$ plus the discounted value of being the
incumbent in the next period:
\begin{equation} \label{e5.11}
\pi_e^* = p^e + \delta \pi_i^*. \end{equation} The incumbent's payoff can be
similarly stated as the immediate payment of $p^i$ plus the discounted value of
being the entrant next period: \begin{equation} \label{e5.12} \pi_i^* = p^i +
\delta \pi_e^*.
\end{equation}
We could use equation (\ref{e5.10}) to substitute for $p^e$ and $p^i$, which
would leave us with the two equations (\ref{e5.11}) and (\ref{e5.12}) for the
two unknowns $\pi_i^*$ and $\pi_e^*$, but an easier way to compute the payoff is
to realize that in equilibrium the incumbent and the entrant sell the same
amount at the same price, so $\pi_i^*= \pi_e^*$ and equation (\ref{e5.12})
becomes \begin{equation} \label{e5.13}
\pi_i^* = 2c + \delta \pi_i^*. \end{equation} It follows that
\begin{equation} \label{e5.14} \pi_i^* = \pi_e^* = \frac{2c}{1 - \delta}.
\end{equation} Prices and total payoffs are increasing in the switching cost
$c$, because that is what gives the incumbent market power and prevents
competition of the ordinary Bertrand kind to be analyzed in Section 13.2. The
total payoffs are increasing in $\delta$ for the usual reason that future
payments increase in value as $\delta$ approaches one.
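A quick numeric check of the derivation, with illustrative values $c = 1$ and
$\delta = 0.9$: equation (\ref{e5.10}) gives $p^i = p^e = 2$, and equation
(\ref{e5.14}) gives
\begin{equation*}
\pi_i^* = \pi_e^* = \frac{2(1)}{1 - 0.9} = 20,
\end{equation*}
which indeed satisfies the recursion (\ref{e5.13}), since $2 + 0.9(20) = 20$.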
\bigskip \noindent {\bf *5.6 Evolutionary Equilibrium: { Hawk-Dove} }
\noindent For most of this book we have been using the Nash equilibrium concept
or refinements of it based on information and sequentiality, but in biology such
concepts are often inappropriate. The lower animals are less likely than humans
to think about the strategies of their opponents at each stage of a game. Their
strategies are more likely to be preprogrammed and their strategy sets more
restricted than the businessman's, if perhaps not more so than his customer's.
In addition, behavior evolves, and any equilibrium must take account of the
possibility of odd behavior caused by the occasional mutation. That the
equilibrium is common knowledge, or that players cannot precommit to strategies,
are not compelling assumptions. Thus, the ideas of Nash equilibrium and
sequential rationality are much less useful than when game theory is modelling
rational players.
Game theory has grown to some importance in biology, but the style is
different from that in economics. The goal is not to explain how players would
rationally pick actions in a given situation, but to explain how behavior
evolves or persists over time under exogenous shocks. Both approaches end up
defining equilibria to be strategy profiles that are best responses in some
sense, but biologists care much more about the stability of the equilibrium and
how strategies interact over time. In section 3.5, we touched briefly on the
stability of the Cournot equilibrium, but economists view stability as a
pleasing by-product of the equilibrium rather than its justification. For
biologists, stability is the point of the analysis.
Consider a game with identical players who engage in pairwise contests. In
this special context, it is useful to think of an equilibrium as a strategy
profile such that no player with a new strategy can enter the environment ({\bf
invade}) and receive a higher expected payoff than the old players. Moreover,
the invading strategy should continue to do well even if it plays itself with
finite probability, or its invasion could never grow to significance. In the
commonest model in biology, all the players adopt the same strategy in
equilibrium, called an evolutionarily stable strategy. John Maynard Smith
originated this idea, whose name is somewhat confusing because an
evolutionarily stable strategy really amounts to an equilibrium concept, which
involves a strategy profile, not just one player's strategy. For games with
pairwise interactions and identical players, however, the evolutionarily
stable strategy can be used to define an equilibrium concept.
\noindent
{\it A strategy $s^*$ is an {\bf evolutionarily stable strategy}, or {\bf ESS},
if, using the notation $\pi(s_i,s_{-i})$ for player $i$'s payoff when his
opponent uses strategy $s_{-i}$, for every other strategy $s'$ either}
\begin{equation}\label{e4.5} \pi( s^*,s^*) > \pi( s',s^*) \end{equation} {\it
or} \begin{equation}\label{e4.6}
\begin{array}{l} (a) \;\; \pi( s^*,s^*) = \pi( s',s^*)\\ {\rm and}\\ (b) \;\;
\pi( s^*,s') > \pi( s',s'). \\ \end{array} \end{equation}
\noindent If condition (\ref{e4.5}) holds, then a population of players using
$s^*$ cannot be invaded by a deviant using $s'$. If condition (\ref{e4.6})
holds, then $s'$ does well against $s^*$, but badly against itself, so that if
more than one player tried to use $s'$ to invade a population using $s^*$, the
invaders would fail.
We can interpret ESS in terms of Nash equilibrium. Condition (\ref{e4.5})
says that $s^*$ is a strong Nash equilibrium (although not every strong Nash
strategy is an ESS). Condition (\ref{e4.6}) says that if $s^*$ is only a weak
Nash strategy, the weak alternative $s'$ is not a best response to itself. ESS
is a refinement of Nash, narrowed by the requirement that ESS not only be a best
response, but that (a) it have the highest payoff of any strategy used in
equilibrium (which rules out equilibria with asymmetric payoffs), and (b) it
be a strictly best response to itself.
The motivations behind the two equilibrium concepts are quite different, but
the similarities are useful because even if the modeller prefers ESS to Nash, he
can start with the Nash strategies in his efforts to find an ESS.
As an example of (a), consider the Battle of the Sexes. In it, the mixed
strategy equilibrium is an ESS, because a player using it has as high a payoff
as any other player. The two pure strategy equilibria are not made up of
ESS's, though, because in each of them one player's payoff is higher than the
other's. Compare with Ranked Coordination, in which the two pure strategy
equilibria are made up of ESS's. (The dominated equilibrium strategy is
nonetheless an ESS, because given that the other players are using it, no
player could do as well by deviating.) The mixed strategy equilibrium,
however, is not an ESS: a pure-strategy invader does as well against the
mixers as they do, but strictly better against itself, violating condition
(\ref{e4.6}b).
As an example of (b), consider the Utopian Exchange Economy game in Table 4,
adapted from problem 7.5 of Gintis (2000). In Utopia, each citizen can
produce either one or two units of individualized output. He will then go
into the marketplace and meet another citizen. If either of them produced only
one unit, trade cannot increase their payoffs. If both of them produced two,
however, they can trade one unit for one unit, and both end up happier with
their increased variety of consumption.
\begin{center} {\bf Table 4 The Utopian Exchange Economy Game}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Jones}\\
& & & {\it Low Output} & & {\it High Output} \\
& & {\it Low Output} & {\bf 1,1} & $\leftrightarrow$ & { 1,1}\\
& {\bf Smith:} && $\updownarrow$ & & $\downarrow$ \\
& & {\it High Output} & { 1,1} & $\rightarrow$ & {\bf 2,2} \\
\end{tabular}
\end{center}
\vspace{-24pt}
{\it Payoffs to: (Smith, Jones). Arrows show how a player can increase his
payoff. }
\bigskip
This game has three Nash equilibria, one of which is in mixed strategies.
Since all strategies but {\it High Output} are weakly dominated, that alone is
an ESS. {\it Low Output} fails to meet condition (\ref{e4.6}b), because it is
not the strictly best response to itself. If the economy began with all
citizens choosing {\it Low Output}, then if Smith deviated to {\it High
Output} he would not do any better, but if {\it two} people deviated to {\it
High Output}, they would do better in expectation because they might meet each
other and receive the payoff of (2,2).
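These claims can be checked directly against the definition of an ESS. For
$s^* = High\; Output$ and $s' = Low\; Output$, condition (\ref{e4.5}) holds
strictly,
\begin{equation*}
\pi(High, High) = 2 > 1 = \pi(Low, High),
\end{equation*}
so {\it High Output} is an ESS by (\ref{e4.5}) alone. For $s^* = Low\; Output$
and $s' = High\; Output$, part (a) of condition (\ref{e4.6}) holds, since
$\pi(Low, Low) = 1 = \pi(High, Low)$, but part (b) fails, since
$\pi(Low, High) = 1 < 2 = \pi(High, High)$.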
\noindent {\bf An Example of ESS: { Hawk-Dove} }
\noindent The best-known illustration of the ESS is the game of { Hawk-Dove}
. Imagine that we have a population of birds, each of whom can behave as an
aggressive Hawk or a pacific Dove. We will focus on two randomly chosen birds,
Bird One and Bird Two. Each bird has a choice of what behavior to choose on
meeting another bird. A resource worth $V=2$ ``fitness units'' is at stake when
the two birds meet. If they both fight, the loser incurs a cost of $C=4$, which
means that the expected payoff when two Hawks meet is $-1$ ($=0.5[2] + 0.5[-4])$
for each of them. When two Doves meet, they split the resource, for a payoff of
1 apiece. When a Hawk meets a Dove, the Dove flees for a payoff of 0, leaving
the Hawk with a payoff of 2. Table 5 summarizes this.
\begin{center} {\bf Table 5 { Hawk-Dove}: Economics Notation}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Bird Two}\\
& & & {\it Hawk} & & {\it Dove} \\
& & {\it Hawk} & $-1,-1$ & $\rightarrow$ & {\bf 2,0}\\
& {\bf Bird One:} && $\downarrow$ & & $\uparrow$ \\
& & {\it Dove} & {\bf 0,2} & $\leftarrow$ & 1,1 \\
\end{tabular}
\end{center}
\vspace{-24pt}
{\it Payoffs to: (Bird One, Bird Two). Arrows show how a player can increase
his payoff. }
\bigskip
These payoffs are often depicted differently in biology games. Since the
two players are identical, one can depict the payoffs by using a table
showing the payoffs only of the row player. Applying this to { Hawk-Dove}
generates Table 6.
\begin{center} {\bf Table 6 { Hawk-Dove}: Biology Notation}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Bird Two}\\
& & & {\it Hawk} & & {\it Dove} \\
& & {\it Hawk} & $-1$ & & 2 \\
& {\bf Bird One:} && & & \\
& & {\it Dove} & 0 & & 1 \\
\multicolumn{6}{l}{\it Payoffs to: (Bird One)}
\end{tabular} \end{center}
{ Hawk-Dove} is Chicken with new feathers. The two games have the same
ordinal ranking of payoffs, as can be seen by comparing Table 5 with Chapter
3's Table 2, and their equilibria are the same except for the mixing parameters.
{ Hawk-Dove} has no symmetric pure-strategy Nash equilibrium, and hence no
pure-strategy ESS, since in the two asymmetric Nash equilibria, $Hawk$ gives a
bigger payoff than $Dove$, and the doves would disappear from the population.
In the ESS for this game, neither hawks nor doves completely take over the
environment. If the population consisted entirely of hawks, a dove could invade
and obtain a one-round payoff of 0 against a hawk, compared to the $-1$ that a
hawk obtains against itself. If the population consisted entirely of doves, a
hawk could invade and obtain a one-round payoff of 2 against a dove, compared to
the 1 that a dove obtains against a dove.
In the mixed-strategy ESS, the equilibrium strategy is to be a hawk with
probability 0.5 and a dove with probability 0.5, which can be interpreted as a
population 50 percent hawks and 50 percent doves. As in the mixed-strategy
equilibria in chapter 3, the players are indifferent as to their strategies.
The expected payoff from being a hawk is the 0.5(2) from meeting a dove plus the
$0.5(-1)$ from meeting another hawk, a sum of 0.5. The expected payoff from
being a dove is the 0.5(1) from meeting another dove plus the 0.5(0) from
meeting a hawk, also a sum of 0.5. Moreover, the equilibrium is stable in a
sense similar to the Cournot equilibrium. If 60 percent of the population were
hawks, a bird would have a higher fitness level as a dove. If ``higher
fitness'' means being able to reproduce faster, the number of doves increases
and the proportion returns to 50 percent over time.
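The 50-50 split is a special case of a general formula. If a fraction $h$ of
the population are hawks, a bird is indifferent between the two behaviors when
\begin{equation*}
h \left( \frac{V-C}{2} \right) + (1-h)V = (1-h)\frac{V}{2},
\end{equation*}
which solves to $h = V/C$. With $V = 2$ and $C = 4$ this gives $h = 0.5$, and
either side of the indifference condition then equals the 0.5 computed above.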
The ESS depends on the strategy sets allowed the players. If two birds can base
their behavior on commonly observed random events such as which bird arrives at
the resource first, strategies can be conditioned on those events and new
evolutionarily stable outcomes become possible.
\bigskip \noindent {\bf N5.1} {\bf Finitely Repeated Games and the Chainstore
Paradox}
\begin{itemize}
\item A player's total payoff in a repeated game can be specified in several
ways.\\
\noindent 1 Sum the one-shot payments, $\sum_{t=1}^T \tilde{\pi}_t$, a method
that is unsatisfactory for an infinite game if the sum does not converge.\\
\noindent 2 Specify that the discount rate is strictly positive, and use the
present value, $\sum_{t=1}^T \delta^t \tilde{\pi}_t$. Since payments in distant
periods count for less, the discounted value is finite unless the payments are
growing faster than the discount rate.\\
\noindent 3 Use the average payment per period, a tricky method since some sort
of limit needs to be taken as the number of periods averaged goes to infinity.
\\
Whatever the approach, game theorists assume that the payoff function is {\bf
additively separable} over time, which means that the total payoff is based on
the sum or average, possibly discounted, of the one-shot payoffs.
Macroeconomists worry about this assumption, which rules out, for example, a
player whose payoff is very low if any of his one-shot payoffs dips below a
certain subsistence level. The issue of separability will arise again in section
13.5 when we discuss durable monopoly.
\item Ending in finite time with probability one means that the probability
that the game has ended by date $t$ approaches one as $t$ tends to infinity;
the probability that the game lasts forever is zero. This is a weaker
requirement than a finite expected end date: a game can end in finite time with
probability one and yet have an infinite expected length, although if there
were a positive probability of an infinite length the expectation would
certainly be infinite.
\end{itemize}
\noindent {\bf N5.2} {\bf Infinitely Repeated Games, Minimax Punishments, and
the Folk Theorem}
\begin{itemize}
\item Aumann (1981), Fudenberg \& Maskin (1986), Fudenberg \& Tirole (1991a,
pp. 152-62), and Rasmusen (1992a) tell more about the Folk Theorem. The most
commonly cited version of the Folk Theorem says that if conditions 1 to 3 are
satisfied, then:
\noindent {\it Any payoff profile that strictly Pareto-dominates the minimax
payoff profile in the mixed extension of an $n$-person one-shot game with
finite action sets is the average payoff in some perfect equilibrium of the
infinitely repeated game.}
\item The evolutionary approach can also be applied to the repeated
{ Prisoner's Dilemma}. Boyd \& Lorberbaum (1987) show that no pure strategy,
including Tit-for-Tat, is evolutionarily stable in a population-interaction
version of the { Prisoner's Dilemma}. Hirshleifer \& Martinez-Coll (1988)
have found that Tit-for-Tat is no longer part of an ESS in an evolutionary
{ Prisoner's Dilemma} if (1) more complicated strategies have higher
computation costs; or (2) sometimes a {\it Silence} is observed to be a
{\it Blame} by the other player. Yet biologists have found animals playing
Tit-for-Tat--- notably the sticklebacks in Milinski (1987), which can choose
whether or not to shirk in investigating predator fish.
\item {\bf Trigger strategies} or {\it trigger-price strategies} are an
important kind of strategy for repeated games. Consider the oligopolist facing
uncertain demand (as in Stigler [1964]). He cannot tell whether the low demand
he observes facing him is due to Nature or to price cutting by his fellow
oligopolists. Two things that could trigger him to cut his own price in
retaliation are a series of periods with low demand or one period of especially
low demand. Finding an optimal trigger strategy is a difficult problem (see
Porter [1983a]). Trigger strategies are usually not subgame perfect unless the
game is infinitely repeated, in which case they are a subset of the equilibrium
strategies. Recent work has looked carefully at what trigger strategies are
possible and optimal for players in infinitely repeated games; see Abreu, Pearce
\& Stacchetti (1990). Many theorists have studied what happens when players can
imperfectly observe each other's actions. For a survey, see Kandori (2002).
$\;\;\;$ Empirical work on trigger strategies includes Porter (1983b), who
examines price wars between railroads in the 19th century, and Slade (1987), who
concluded that price wars among gas stations in Vancouver used small punishments
for small deviations rather than big punishments for big deviations.
\item A macroeconomist's technical note related to the similarity of infinite
games and games with a constant probability of ending is Blanchard (1979),
which discusses speculative bubbles.
\item In the repeated { Prisoner's Dilemma}, if the end date is infinite with
positive probability and only one player knows it, cooperation is possible by
reasoning similar to that of the Gang of Four theorem in Section 6.4.
\item Any Nash equilibrium of the one-shot game is also a perfect equilibrium of
the finitely or infinitely repeated game.
\end{itemize}
\noindent
{\bf N5.3} {\bf Reputation: The One-Sided { Prisoner's Dilemma} }
\begin{itemize} \item {\it A game that is repeated an infinite number of times
without discounting is called a {\bf supergame}}.
There is no connection between the terms ``supergame'' and ``subgame.''
\item The terms ``one-sided'' and ``two-sided'' { Prisoner's Dilemma} are
my inventions. Only the two-sided version is a true prisoner's dilemma
according to the definition of note N1.2.
\item Empirical work on reputation is scarce. One worthwhile effort is Jarrell
\& Peltzman (1985), which finds that product recalls inflict costs greatly in
excess of the measurable direct costs of the operations. Macaulay's (1963)
investigation into actual business practice is much cited and little
imitated. He notes that reputation seems to be more important than the written
details of business contracts.
\item {\bf Vengeance and Gratitude.} Most models have excluded these feelings
(although see Jack Hirshleifer [1987]), which can be modelled in two ways.
\noindent (1) A player's current utility from $Blame$ or $Silence$ depends on what
the other player has played in the past; or\\ (2) A player's current utility
depends on current actions and the other player's current utility in a way that
changes with past actions of the other player.
The two approaches are subtly different in interpretation. In (1), the joy of
revenge is in the action of blaming. In (2), the joy of revenge is in the
discomfiture of the other player. Especially if the players have different
payoff functions, these two approaches can lead to different results.
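A hypothetical formalization of the difference (the notation is mine, not from any particular model): let $h^{t-1}$ be the history of play and $\pi_i$ player $i$'s ordinary material payoff. Approach (1) writes player $i$'s period-$t$ utility as
\[
U_i^t = \pi_i(a_i^t, a_j^t) + v(a_i^t, h^{t-1}),
\]
so the extra term rewards the act of blaming a past defector, whatever its effect on player $j$. Approach (2) writes
\[
U_i^t = \pi_i(a_i^t, a_j^t) + \alpha(h^{t-1})\, \pi_j(a_i^t, a_j^t),
\]
so a past defection makes $\alpha$ negative and player $i$ enjoys player $j$'s low payoff directly.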
\end{itemize}
\noindent
{\bf N5.4} {\bf Product Quality in an Infinitely Repeated Game}
\begin{itemize}
\item The Product Quality Game may also be viewed as a principal-agent model
of moral hazard (see Chapter 7). The seller (the agent) takes an action,
choosing quality, which is unobserved by the buyer (the principal) but which
affects the principal's payoff, an interpretation used in much of the Stiglitz
(1987) survey of the links between quality and price.
$\;\;\;$ The intuition behind the Klein \& Leffler model is similar to the
explanation for high wages in the Shapiro \& Stiglitz (1984) model of
involuntary unemployment (section 8.1). Consumers, seeing a low price, realize
that with a price that low the firm cannot resist lowering quality to make
short-term profits. A large profit margin is needed to induce the firm to
continue producing high quality.
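$\;\;\;$ A stripped-down version of this incentive condition (a rough sketch that ignores the entry cost $F$ appearing in the full condition of Section 5.4): write $p$ for the price, $c$ for the unit cost of high quality, normalize the cost of low quality to zero, and let $r$ be the discount rate. A firm that cheats earns $p$ once and nothing thereafter; a firm that stays honest earns $p-c$ every period. Honesty is worthwhile if
\[
(p-c) + \frac{p-c}{r} \;\geq\; p, \quad \mbox{i.e., } \; p \geq (1+r)c,
\]
so the price must carry a premium of at least $rc$ over cost.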
\item A paper related to Klein \& Leffler (1981) is Shapiro (1983), which
reconciles a high price with free entry by requiring that firms price under cost
during the early periods to build up a reputation. If consumers believe, for
example, that any firm charging a high price for any of the first five periods
has produced a low quality product, but any firm charging a high price
thereafter has produced high quality, then firms behave accordingly and the
beliefs are confirmed. That the beliefs are self-confirming does not make them
irrational; it only means that many different beliefs are rational in the many
different equilibria.
\item An equilibrium exists in the Product Quality model only if the entry cost
$F$ is just the right size to make $n$ an integer in equation (\ref{e5.8}). Any
of the usual assumptions to get around the integer problem could be used:
allowing potential sellers to randomize between entering and staying out;
assuming that for historical reasons, $n$ firms have already entered; or
assuming that firms lie on a continuum and the fixed cost is a uniform density
across firms that have entered. \end{itemize}
\bigskip \noindent {\bf N5.5} {\bf Markov Equilibria and Overlapping
Generations in the Game of Customer Switching Costs} \begin{itemize} \item We
assumed that the incumbent chooses its price first, but the alternation of
incumbency remains even if we make the opposite assumption. The natural
assumption is that prices are chosen simultaneously, but because of the
discontinuity in the payoff function, that subgame has no equilibrium in pure
strategies. \end{itemize}
\bigskip
\noindent {\bf N5.6} {\bf Evolutionary Equilibrium: The { Hawk-Dove} Game}
\begin{itemize}
\item Dugatkin \& Reeve (1998) is an edited volume of survey articles on
different applications of game theory to biology. Dawkins (1989) is a good
verbal introduction to evolutionary conflict. See also Axelrod \& Hamilton
(1981) for a short article on biological applications of the { Prisoner's
Dilemma}, Hines (1987) for a survey, and Maynard Smith (1982) for a book.
Jack Hirshleifer (1982) compares
the approaches of economists and biologists. Boyd \& Richerson (1985) uses
evolutionary game theory to examine cultural transmission, which has important
differences from purely genetic transmission. \end{itemize}
\newpage
\noindent
{\bf Problems}
\bigskip
\noindent {\bf 5.1. Overlapping Generations (see Samuelson [1958]) } (medium)
\\ There
is a long sequence of players. One player is born in each period $t$, and he
lives for periods $t$ and $t+1$. Thus, two players are alive in any one period,
a youngster and an oldster. Each player is born with one unit of chocolate,
which cannot be stored. Utility is increasing in chocolate consumption, and a
player is very unhappy if he consumes less than 0.3 units of chocolate in a
period: the per-period utility functions are $U(C)=-1$ for $C < 0.3$ and $U(C)=
C$ for $C \geq 0.3$, where $C$ is consumption. Players can give away their
chocolate, but, since chocolate is the only good, they cannot sell it. A
player's action is to consume $X$ units of chocolate as a youngster and give
away $1-X$ to some oldster. Every person's actions in the previous period are
common knowledge, and so strategies can be conditioned on them.
\begin{enumerate}
\item[(a)] If there is a finite number of generations, what is the unique Nash
equilibrium?
\item[(b)] If there are an infinite number of generations, what are two
Pareto-ranked perfect equilibria?
\item[(c)] If there is a probability $\theta$ at the end of each period (after
consumption takes place) that barbarians will invade and steal all the chocolate
(leaving the civilized people with payoffs of -1 for any $X$), what is the
highest value of $\theta$ that still allows for an equilibrium with $X=0.5$?
\end{enumerate}
\bigskip
\noindent
{\bf 5.2. Product Quality with Lawsuits } (medium)\\
Modify the Product Quality game of section 5.4 by assuming that if the
seller misrepresents his quality he must, as a result of a class-action suit,
pay damages of $x$ per unit sold, where $x \in (0,c]$ and the seller becomes
liable for $x$ at the time of sale. \begin{enumerate}
\item[(a)] What is $\tilde{p}$ as a function of $x,F,c$, and $r$? Is $\tilde{p}
$ greater than when $x=0$?
\item[(b)] What is the equilibrium output per firm? Is it greater than when
$x=0$?
\item[(c)] What is the equilibrium number of firms? Show that a rise in $x$ has
an ambiguous effect on the number of firms.
\item[(d)] If, instead of $x$ per unit, the seller pays $X$ to a law firm to
successfully defend him, what is the incentive compatibility constraint?
\end{enumerate}
\bigskip \noindent
{\bf 5.3. Repeated Games (see Benoit \& Krishna [1985]) }
(hard)\\
Players Benoit and Krishna repeat the game in Table 7 three times, with
discounting:
\begin{center}
{\bf Table 7: A Benoit-Krishna Game }
\begin{tabular}{lllccccc}
& & &\multicolumn{5}{c}{\bf Krishna}\\
& & & {\it Silence} & & {\it Waffle} & & {\it Blame} \\
& && & & & \\
& & {\it Silence} & 10,10 & & $-1,-12$ & & $-1,15$\\
& && & & & \\
& {\bf Benoit:} & {\it Waffle} & $-12,-1$ & & 8,8 & & $-1,-1$
\\
& && & & & \\
& & {\it Blame} & $15,-1$ & & $-1,-1$ & & 0,0 \\
& && & & & \\ \multicolumn{8}{l}{\it Payoffs to: (Benoit, Krishna).}
\end{tabular} \end{center}
\begin{enumerate}
\item[(a)] Why is there no equilibrium in which the players play $Silence$ in
all three periods?
\item[(b)] Describe a perfect equilibrium in which both players pick $Silence$
in the first two periods.
\item[(c)] Adapt your equilibrium to the twice-repeated game.
\item[(d)] Adapt your equilibrium to the $T$-repeated game.
\item[(e)] What is the greatest discount rate for which your equilibrium
still works in the three-period game? \end{enumerate}
\bigskip
\noindent
{\bf 5.4. Repeated Entry Deterrence } (medium) \\
Assume that Entry Deterrence I
is repeated an infinite number of times, with a tiny discount rate and with
payoffs received at the start of each period. In each period, the entrant
chooses $Enter$ or $Stay\; out$, even if he entered previously.
\begin{enumerate} \item[(a)] What is a perfect equilibrium in which the entrant
enters each period?
\item[(b)] Why is ({\it Stay out, Fight}) not a perfect equilibrium?
\item[(c)] What is a perfect equilibrium in which the entrant never enters?
\item[(d)] What is the maximum discount rate for which your strategy profile in
part (c) is still an equilibrium? \end{enumerate}
%---------------------------------------------------------------
\bigskip \noindent {\bf 5.5. The Repeated Prisoner's Dilemma } (medium) \\
Set $P=0$ in the general { Prisoner's Dilemma} in Chapter 1's Table 9,
and assume that $2R > S+T$.
\begin{enumerate}
\item[(a)] Show that the Grim Strategy, when played by both players, is a
perfect equilibrium for the infinitely repeated game. What is the maximum
discount rate for which the Grim Strategy remains an equilibrium?
\item[(b)] Show that Tit-for-Tat is not a perfect equilibrium in the
infinitely repeated { Prisoner's Dilemma} with no discounting.
\end{enumerate}
\bigskip
%---------------------------------------------------------------
\noindent
{\bf 5.6. Evolutionarily Stable Strategies } (medium)\\
A population of scholars is playing the following coordination game over their
two possible lunchtime conversation topics, football and economics. Let $N_t(F)$
and $N_t(E)$ be the numbers who talk football and economics in period $t$, and
let $\theta$ be the fraction who talk football, so $\theta =
\frac{N_t(F)}{N_t(F)+N_t(E)}$. Government regulations requiring
lunchtime attendance and stipulating the topics of conversation have maintained
the values $\theta= 0.5$, $N_t(F)=50,000$ and $N_t(E)=50,000$ up to this year's
deregulatory reform. In the future, some people may decide to go home for lunch
instead, or change their conversation. Table 8 shows the payoffs.
\begin{center}
{\bf Table 8: Evolutionarily Stable Strategies}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Scholar 2 }\\ & & & {\it
Football } ($\theta$) & & $ Economics$ ($1- \theta$) \\ & &
$Football $ ($\theta$) & 1,1 & & $0,0$ \\ & {\bf Scholar 1} && &
& \\ & & {\it Economics } ($1-\theta$) & $0,0$ & & 5,5\\
\multicolumn{6}{l}{\it Payoffs to: (Scholar 1, Scholar 2) } \end{tabular}
\end{center}
\begin{enumerate}
\item[(a)] There are three Nash equilibria: {\it (Football, Football),
(Economics, Economics)}, and a mixed-strategy equilibrium. What are the
evolutionarily stable strategies?
\item[(b)]
Let $N_t(s)$ be the number of scholars playing a particular strategy in period
$t$ and let $\pi_t(s)$ be the payoff. Devise a Markov difference equation to
express the population dynamics from period to period: $N_{t+1}(s) = f(N_t(s),
\pi_t (s))$. Start the system with a population of 100,000, half the scholars
talking football and half talking economics. Use your dynamics to finish Table
9.
\begin{center}
{\bf Table 9: Conversation Dynamics }
\begin{tabular}{|l|l|l|l|l|l|} \hline $t$ & $N_t(F)$ & $N_t(E)$ & $\theta$&
$\pi_t(F)$ & $\pi_t(E)$\\ \hline $-1$ & 50,000 & 50,000 & 0.5 & 0.5 & 2.5 \\ \hline 0 &
& & & & \\ \hline 1 & & & & & \\ \hline 2 & & & & & \\ \hline
\end{tabular}
\end{center}
\item[(c)]
Repeat part (b), but specifying non-Markov dynamics, in which $N_{t+1}(s) =
f(N_t(s), \pi_t (s), \pi_{t-1} (s))$. \end{enumerate}
%---------------------------------------------------------------
\bigskip
\noindent
{\bf 5.7. Grab the Dollar } (medium)\\
Table 10 shows the payoffs for the
simultaneous-move game of Grab the Dollar. A silver dollar is put on the table
between Smith and Jones. If one grabs it, he keeps the dollar, for a payoff of 4
utils. If both grab, then neither gets the dollar, and both feel bitter. If
neither grabs, each gets to keep something.
\begin{center}
{\bf Table 10: Grab
the Dollar}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Jones}\\ & & & {\it Grab}
($\theta$) & & {\it Wait} ($1-\theta$) \\ & & $ Grab$ ($\theta$)
& $-1,-1$ & & $4,0$ \\ & {\bf Smith:} && & & \\ & & {\it Wait }
($1-\theta$) & $0,4$ & & 1,1 \\ \multicolumn{6}{l}{\it Payoffs to:
(Smith, Jones) } \end{tabular} \end{center} \begin{enumerate}
\item[(a)]
What are the evolutionarily stable strategies?
\item[(b)]
Suppose each player in the population is a point on a continuum, and that the
initial mass of players is 1, evenly divided between {\it Grab} and {\it
Wait}. Let $N_t(s)$ be the mass of players playing a particular strategy in
period $t$ and let $\pi_t(s)$ be the payoff. Let the population dynamics be
$N_{t+1}(i) = \left(2 N_t(i) \right)\left(\frac{ \pi_t(i)}{\sum_j \pi_t(j) }
\right) $. Find the missing entries in Table 11. \\
\begin{center}
{\bf Table 11: Grab the Dollar Dynamics }
\begin{tabular}{|l|l|l|l|l|l|l|} \hline $t$ & $N_t(G)$ & $N_t(W)$ & $N_t(total)$
& $\theta$& $\pi_t(G)$ & $\pi_t(W)$\\ \hline 0 & 0.5 & 0.5 & 1 & 0.5 & 1.5 & 0.5\\
\hline 1 & & & & & & \\ \hline 2 & & & & & & \\ \hline \end{tabular}
\end{center}
\item[(c)]
Repeat part (b), but with the dynamics $N_{t+1}(s) = \left[1 + \frac{ \pi_t(s)}
{\sum_j \pi_t(j) }\right]\left[2N_t(s)\right]$.
\item[(d)]
Which three games that have appeared so far in the book resemble Grab the
Dollar?
\end{enumerate}
%---------------------------------------------------------------
\newpage
\begin{center}
{\bf The Repeated Prisoner's Dilemma: A Classroom Game for Chapter 5}
\end{center}
Consider the following Prisoner's Dilemma, obtained by adding 8 to each payoff
in Table 2 from Chapter 1:
\begin{center}
{\bf Table 12: The Prisoner's Dilemma}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Column}\\
& & & {\it Silence} & & {\it Blame} \\
& & {\it Silence} & 7,7 & & $-2,8$ \\
& {\bf Row} && & & \\
& & {\it Blame} & $8,-2$ & & {\bf 0,0} \\
\multicolumn{6}{l}{\it Payoffs to: (Row,Column) }
\end{tabular}
\end{center}
Students will pair up to repeat this game 10 times in the same pair. The
objective is to get as high a summed, undiscounted, payoff as possible ({\it
not } just to get a higher summed payoff than any other person in the class).
\end{small}
\end{document}