\documentclass[12pt,epsf]{article}
\parskip 10pt
\begin{document}
\baselineskip 16pt
%\begin{LARGE}
\parindent 24pt
\parskip 10pt
\titlepage
\vspace*{12pt}
\begin{center}
{\bf Gang of Four Proof}
\end{center}
Suppose
the Prisoner's Dilemma in Table 1 is repeated $T$ times with no
discounting. Player Row is definitely rational. With probability $1-
\delta$, Column is also rational, but with probability $\delta$
he simply plays the Grim Strategy: $Deny$ unless some player has ever
chosen Confess, in which case choose $Confess$. Row cannot observe
whether Column is rational.
\begin{center}
{\bf Table 1: The Prisoner's Dilemma}
\begin{tabular}{lllccc}
& & &\multicolumn{3}{c}{\bf Column}\\
& & & {\it Deny} & & {\it Confess} \\
& & {\it Deny} & -1,-1 & $\rightarrow$ & -10, 0 \\
& {\bf Row} &&$\downarrow$& & $\downarrow$ \\
& & {\it Confess} & 0,-10 & $\rightarrow$ & {\bf -
8,-8} \\
\multicolumn{6}{l}{\it Payoffs to: (Row,Column) }
\end{tabular}
\end{center}
\noindent
Proposition: In any
perfect bayesian equilibrium, the number of stages in which either
player chooses {\rm Confess} is less than some number $M$ that depends
on
$\delta$ but not on $T$.
Lemma 1. If Row knows Column is irrational, there is
some $T$ great enough that Row will start by playing $Deny$ and will
only play $Confess$ in the last $N$ periods. Find the value of $N$.
\noindent
{\it Proof.}
Consider $Row$'s decision. Suppose nobody has yet chosen $Confess$.
He will definitely choose $Confess$ in period $T$ to get 0 instead of -8
or -8 instead of -10.
In period $T-1$, his continuation payoff if he chooses $Confess$ is
\begin{equation} \label{e1}
\pi (Confess) = 0 -8 = -8
\end{equation}
whereas if he chooses Deny it is
\begin{equation} \label{e2}
\pi (Deny) = -1 -0 = -1
\end{equation}
Thus, it is better to choose $Deny$ in period T-1. But this reasoning
just gets stronger as we do to earlier periods, because the payoff from
playing Deny is $-1 + -1 +... + -1 +0$ versus the payoff from Confess of
$0 + -8 + -8 + ... + -8 + -8$. Thus, $N=1$.
If you tried to answer this question by constructing Row's payoff viewed from the first period, instead of starting from the end and working backwards, you would have a much more difficult time finding the answer, and nobody did that successfully. Working back from the end, on the other hand, the problem is easy.
\bigskip
Lemma 2.
Suppose Row believes that Column is irrational with
probability $\delta$ and that if he is rational, he will start
choosing $Confess$ (and with probability one) in period $T-K$ but not
before. Row will choose $Deny$ in periods 0 through $T-K$ if
and only if $\delta$ equals some value $ \delta_0(K)$ or greater, and
that $\delta_0$ varies with $K$ but not with $T$.
\noindent
{\it Proof.} Suppose Row is deciding on his move for period $T-K$ and nobody has yet played $Confess$.
If Row picks $Deny$, then if Column is an irrational, Grim, player Column will choose $Deny$ for all of these last $K$ periods, and Row can look
forward to $K-1$ periods of payoffs of -1 and one period (at the end) of a payoff of
0; but if Column is rational, Column will choose $Confess$ in period $T-K$, for a payoff of -10 to Row in that period, and after that both will play $Confess$, for $K-1$ periods of payoffs of
-8.
Row's continuation payoff from $Deny$ is therefore
\begin{equation} \label{e3}
\pi (Deny) = \delta [ (K-1) (-1) + 0] + (1-\delta)[ -10+ (K-1) (-8)
]
\end{equation}
If Row picks $Confess$, then if Column is the irrational, Grim, player Column will pick $Deny$ in period $T-K$, for a payoff of 0 to Row, and $Confess$ thereafter, for $ K-1$ periods of payoffs
of -8; but if Column is rational, Column will also pick $Confess$ in period $T-K $ and thereafter, for a total of $K$ periods of -8 payoffs. Row's continuation payoff from $Confess$ is therefore
\begin{equation} \label{e4}
\pi (Confess) = \delta [ 0 + (K-1) (-8) ] + (1-\delta)[ (K ) (-8)
]
\end{equation}
The difference $[\pi (Deny) - \pi (Confess)]=0$ if
\begin{equation} \label{e5}
\begin{array}{l}
\delta [ (K-1) (-1) + 0] + (1-\delta)[ -10+ (K-1) (-8) ] =
\delta [ 0 + (K-1) (-8) ] + (1-\delta)[ (K ) (-8) ] \\
\\
- \delta (K-1) - (1-\delta) (10) - (K-1) (1-\delta) (8) = -8\delta (K-1) - 8 K (1-\delta) \\
\\
7\delta (K-1) - (1-\delta) (10) - K (1-\delta)(8) + (1)(1-\delta)(8) = - K (1-\delta)(8) \\
\\
7\delta (K-1) - (1-\delta) (10) + (1)(1-\delta)(8) = 0 \\
\\
\delta (7)(K-1) - 10+ \delta (10) +8 -\delta (8) = 0 \\
\\
\delta [(7)(K-1)+2] =2 \\
\\
\\
\delta = \frac{ 2}{ (7)(K-1)+2 }.
\end{array}
\end{equation}
Clearly this $\delta$ depends on $K$ but not on $T$.
If $\delta$ is big enough that $Row$ would not want to $Confess$ in period $T-K$, then he will not want to $Confess$ in period $T-K-1$ either, a fortiori, because that would merely extend the period of $-8$ payoffs replacing -1 payoff by 1 period. And a fortiori he would not choose $Confess$ before $T-K-1$.
\bigskip
Lemma 3. If Row believes
Column is irrational with probability $\delta$ and that if he is
rational, he will start choosing $Confess$ as part of a mixed
strategy with positive probability some time after $T-S$ but not
before. Row will choose $Deny$ in periods
0 through $T-S$ if and only if $\delta$ equals some value $
\delta_1(S)$ or greater, and that $\delta_1$ varies with $S$ but not
with $T$.
\bigskip
\noindent
Proposition: In any
perfect Bayesian equilibrium, the number of stages in which either
player chooses {\rm Confess} is less than some number $M$ that depends
on
$\delta$ but not on $T$.
\noindent
{\it Proof.} Lemma 3 says that Row will not start choosing
$Confess$ at least until the last $S$ periods, where $ S $
depends on $\delta$ but not on $T$, even if he thinks that a rational
Column has some probability of choosing $Confess$ starting in period
$T-S$.
It's actually easier, though to go back to Lemma 2 and use it instead, finessing the issue of mixed strategies. Suppose $\delta$ is big enough that that Row will never choose $Confess$ until at least period $T-K+1$ (he might then, if he thought a rational Column would start choosing $Confess$ in period $T-K+1$). In that case Column will not choose $Confess$ until period $T-K$ at the earliest, one period before Row might choose $Confess$. This is true for the reasons we saw in part (a): a player will not choose $Confess$ until one period before he thinks the other player will choose $Confess$.
Hence, for every $K$, there is a $\delta$ such that neither player will choose $Confess$ before $T-K$. Such a value of $\delta $ was found in part (b), and it depends on $K$ but not on $T$. Set $M=K$ and the proof is done.
Note, though, that this proof does not establish that some player will choose $Confess$ exactly at period $T-K$--- just some time {\it at or after} $T-K$. It is a weak bound.
MORE NOTES:
1. The paper says that it does not consider pareto-dominated sequential equilibria. There aren't any, though, as the paper also says, in the very last paragraph.
2. There is no scope for out-of-equilibrium beliefs in this model. The only types are RATIONAL and TIT-FOR-TAT. The TFT player has no possibility of choosing FINK first, so whenever Row picks FINK first, Column learns Row's type with certainty.
For most of the game, only COOPERATE is observed in equilibrium. If Column FINKS, that does not change beliefs, since Column's type is known anyway. If Row FINKS, that reveals his type perfectly, so there is no need to impose an out-of-equilibrium belief--bayes's rule does it.
3. Out of equilibrium beliefs become relevant if we instead modelled the game in the conventional way, in which Row might have standard payoffs or might have payoff that induce it to behave like a tit-for-tat player might-- $a<1$. In that case, if Row chooses FINK in move 1, contrary to the equilibrium, Column must form an out-of-equilibrium belief, because either the standard Row or the TFT row might have made that mistake.
Possibly there now exists an equilibrium without cooperation. But probably not.
\end{document}