javascript libraries: mathjax, prism.js

Last time I talked about a series of metrics for measuring how cooperative a prisoner's dilemma strategy is based on PageRank-like eigenvector methods. In this post we will attempt to infer the memory depth of various strategies.

Many strategies for the prisoner's dilemma use just the previous round of plays
to determine their next move. These strategies are Markovian and are often called
**memory-one** strategies. Let's define the memory depth of a strategy as how
many rounds of the history of play the strategy uses to determine the next action. Some simple
strategies use no memory at all, such as *Cooperator* (also known as ALLC),
Defector (also known as ALLD), and various *\(Random(q)\)* strategies that cooperate
with probability \(q\). These strategies have a memory depth of zero. In fact, Cooperator and
Defector are special cases of Random, \(Random(1)\) and \(Random(0)\) respectively.

Many well-known strategies are memory-one: *Tit-For-Tat* (TFT), *Win-Stay-Lose-Shift*
(also know as Pavlov), Generous TFT, Zero-Determinant strategies like *ZD-GTFT-2* and
*ZD-Extort-2*. These strategies are all defined by a set of four conditional
probabilities, conditioned on two actions from the last round of play:
\[Pr(C \,| \, CC), Pr(C \,| \, CD), Pr(C\,| \, DC), Pr(C\,| \, DD). \]
All the memory-zero strategies can be written as memory-one strategies. For example,
\(Random(q)\) has the four conditional probabilities \((q, q, q, q)\). When discussing
memory depth we typically mean the *least* memory depth.

There are some well-known strategies of memory depth two, such as
*Tit-For-Two-Tats* and *Two-Tits-For-Tat*. So it goes until we get to strategies
that have unbounded (or infinite) memory depth, such as *Grudger* (also known as Grim),
which cooperates until the opponent defects, and then defects permanently thereafter.
The definitions of all of these strategies are in the documentation for the Axelrod library.

We can visualize the way a given strategy plays easily from the images available in the AxelrodExamples repository. For example, for Grudger we can see that it tends to either cooperate or defect indefinitely after a small number of rounds depending on whether the opponent tries to defect or not:

Now that we know about memory depth, there are two natural questions:

- Can we use knowledge of the opponent's memory depth to formulate effective counter-strategies?
- Can we infer the memory depth of a player from its history of play?

To infer whether or not a strategy uses the history of play (and how much) is very closely related to testing if a process is Markovian. Essentially we want to see if the strategy's choices of cooperate \(C\) and defect \(D\) depend on the history or not. Let's focus on memory-one players for now, since that is where the most common and historically successful strategies reside, and since it captures some of the strategies that use more than one round of history as well.

For each possible last round of play \((C, C)\), \((C, D)\), \((D, C)\), and \((D, D)\), we want to determine if the overall pattern of cooperate and defect differs from the pattern conditioned on the last round. A good statistical tool here is Fisher's exact test, particularly useful since it can tolerate small counts.

Fisher's exact test is one of several tests for contingency tables to tell if the
distribution of observations depends on a particular characteristic. It's called
an *exact* test because the p-value can be computed exactly. For the following
contingency table:

Overall | Context | |

Cooperate | a | b |

Defect | c | d |

For the prisoner's dilemma strategies we'll use the Axelrod library and for the statistical calculations SciPy. For a given opponent, we will use a \(Random(0.5)\) player to probe the opponent's behavior in the four possible contexts, and then look at the output of the four Fisher tests to determine if the strategy uses context or not. For now this will only tell us if the memory depth is zero or one-or-greater, and we can easily adapt the method to test if the memory depth is greater than one or not.

Let's look at the code -- all the code for these calculations is available here in the AxelrodExamples repository. First let's write a generator to probe the opponent and yield a new round of play.

```
from itertools import islice
import numpy
from scipy import stats
import axelrod
C, D = axelrod.Actions.C, axelrod.Actions.D # The plays
def collect_data(opponent):
"""Generator to collect data from opponent."""
player = axelrod.Random(0.5) # The probe strategy
while True:
player.play(opponent)
yield (player.history[-1], opponent.history[-1])
```

Since "standard" Axelrod tournaments last 200 rounds, we'll generate that many rounds of plays for our statistical tests. This is an arbitrary choice but good enough for our purposes at the moment.

```
def infer_depth(opponent, test_rounds=200):
"""Collect data and report statistical tests."""
data_dict = {(C, C): [0, 0],
(C, D): [0, 0],
(D, C): [0, 0],
(D, D): [0, 0]}
# Save the history locally as we go
history = []
for turn in islice(collect_data(opponent), test_rounds + 1):
history.append(turn)
if len(history) < 2:
continue
# Record opponents play and context
context = history[-2]
# Context is reversed for opponent
context = (context[1], context[0])
# Count the opponent's Cs and Ds
if turn[1] == C:
data_dict[context][0] += 1
else:
data_dict[context][1] += 1
# Perform the Fisher tests
test_results = fet_tests(data_dict)
# Estimate the four conditional probabilities
estimate = memory_one_estimate(data_dict)
return data_dict, test_results, estimate
```

Here we use an auxillary function to perform the Fisher tests:

```
def fet_tests(data_dict):
"""Performs four Fisher exact tests on each possible context."""
# What has the opponent been doing?
C_count = sum(v[0] for (k, v) in data_dict.items())
D_count = sum(v[1] for (k, v) in data_dict.items())
test_results = dict()
# Pull out the C and D counts for each context
for context in data_dict.keys():
C_count_context = data_dict[context][0]
D_count_context = data_dict[context][1]
# Build the conditional table
table = numpy.array([[C_count, D_count], [C_count_context, D_count_context]])
# Run the Fisher test
test_stat, pvalue = stats.fisher_exact(table)
test_results[context] = (test_stat, pvalue)
return test_results
```

Finally, we estimate what the memory-one probabilities would be:

```
def memory_one_estimate(data_dict):
"""Estimates the memory-one strategy probabilities from the observed
data."""
estimates = dict()
for context in data_dict.keys():
C_count = data_dict[context][0]
D_count = data_dict[context][1]
try:
estimates[context] = float(C_count) / (C_count + D_count)
except ZeroDivisionError:
estimates[context] = None
return estimates
```

For each context the counts of cooperation and defection are
reported `[C_count, D_count]`

, followed by the
Fisher tests `[odds_ratio, p-value]`

.
In this case the test statistic is not particularly useful.
Finally an estimate of the memory-one probabilities are given for each context.

```
Cooperator
----------
Collected Data
('C', 'C') [98, 0]
('C', 'D') [102, 0]
('D', 'C') [0, 0]
('D', 'D') [0, 0]
C count, D count: 200, 0
Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 1.0
('D', 'C') None
('D', 'D') None
```

Similarly for Defector:
```
Defector
--------
Collected Data
('C', 'C') [0, 0]
('C', 'D') [0, 0]
('D', 'C') [0, 110]
('D', 'D') [0, 90]
C count, D count: 0, 200
Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)
Estimated Memory One Probabilities
('C', 'C') None
('C', 'D') None
('D', 'C') 0.0
('D', 'D') 0.0
```

Let's look at a few Random strategies, with \(q=0.4\), \(q=0.5\), \(q=0.9\).
These all have large p-values, indicating that no context dependence is found.
The estimation of the value of \(q\) is not great (with only 50 observations on
average for each), but combined the estimates are good.
```
Random: 0.4
-----------
Collected Data
('C', 'C') [17, 21]
('C', 'D') [13, 26]
('D', 'C') [25, 46]
('D', 'D') [22, 30]
C count, D count: 77, 123
Fisher Exact Tests
('C', 'C') (0.7733142037302726, 0.47511966409073536)
('C', 'D') (1.2520325203252032, 0.59186065362403784)
('D', 'C') (1.151869918699187, 0.67041186227270722)
('D', 'D') (0.85365853658536583, 0.63518025860574567)
Estimated Memory One Probabilities
('C', 'C') 0.447368421053
('C', 'D') 0.333333333333
('D', 'C') 0.352112676056
('D', 'D') 0.423076923077
Random: 0.5
-----------
Collected Data
('C', 'C') [31, 22]
('C', 'D') [24, 20]
('D', 'C') [18, 27]
('D', 'D') [23, 35]
C count, D count: 96, 104
Fisher Exact Tests
('C', 'C') (0.6550868486352357, 0.21649276063705136)
('C', 'D') (0.76923076923076927, 0.5060693638965994)
('D', 'C') (1.3846153846153846, 0.40862375394941558)
('D', 'D') (1.4046822742474916, 0.29649708949521131)
Estimated Memory One Probabilities
('C', 'C') 0.584905660377
('C', 'D') 0.545454545455
('D', 'C') 0.4
('D', 'D') 0.396551724138
Random: 0.9
-----------
Collected Data
('C', 'C') [75, 7]
('C', 'D') [85, 12]
('D', 'C') [8, 0]
('D', 'D') [11, 2]
C count, D count: 179, 21
Fisher Exact Tests
('C', 'C') (0.79555555555555557, 0.82663698528113017)
('C', 'D') (1.2033613445378151, 0.69456859061690301)
('D', 'C') (0.0, 1.0)
('D', 'D') (1.5497835497835497, 0.63690251649498575)
Estimated Memory One Probabilities
('C', 'C') 0.914634146341
('C', 'D') 0.876288659794
('D', 'C') 1.0
('D', 'D') 0.846153846154
```

```
Alternator
----------
Collected Data
('C', 'C') [0, 57]
('C', 'D') [0, 43]
('D', 'C') [47, 0]
('D', 'D') [53, 0]
C count, D count: 100, 100
Fisher Exact Tests
('C', 'C') (inf, 6.3205476665659948e-15)
('C', 'D') (inf, 7.9555951932726203e-12)
('D', 'C') (0.0, 9.6006561703569167e-13)
('D', 'D') (0.0, 5.9247070841622583e-14)
Estimated Memory One Probabilities
('C', 'C') 0.0
('C', 'D') 0.0
('D', 'C') 1.0
('D', 'D') 1.0
```

Fisher's test also identifies context dependence for Tit-For-Tat, Generous-TFT,
WSLS, ZD-GTFT-2, and ZD-Extort-2. In each case at least one of the p-values
is small. Note that in some cases the tests do not detect strong context dependence
since the marginal distributions happen to match closely with the global
distributions -- for example, the \((D, C)\) context of ZD-Extort-2.
```
Tit For Tat
-----------
Collected Data
('C', 'C') [40, 0]
('C', 'D') [0, 54]
('D', 'C') [54, 0]
('D', 'D') [0, 52]
C count, D count: 94, 106
Fisher Exact Tests
('C', 'C') (0.0, 6.6997320433673142e-12)
('C', 'D') (inf, 2.9255237915116562e-13)
('D', 'C') (0.0, 2.1649314730202577e-15)
('D', 'D') (inf, 1.2718588152635841e-12)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 1.0
('D', 'D') 0.0
```

```
GTFT: 0.33
----------
Collected Data
('C', 'C') [74, 0]
('C', 'D') [23, 40]
('D', 'C') [34, 0]
('D', 'D') [6, 23]
C count, D count: 137, 63
Fisher Exact Tests
('C', 'C') (0.0, 2.0203977666423354e-10)
('C', 'D') (3.7819185645272602, 1.2780853216031501e-05)
('D', 'C') (0.0, 1.2210728979401064e-05)
('D', 'D') (8.3359788359788354, 1.5281995925971629e-06)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.365079365079
('D', 'C') 1.0
('D', 'D') 0.206896551724
```

```
Win-Stay Lose-Shift
-------------------
Collected Data
('C', 'C') [56, 0]
('C', 'D') [0, 47]
('D', 'C') [0, 50]
('D', 'D') [47, 0]
C count, D count: 103, 97
Fisher Exact Tests
('C', 'C') (0.0, 4.4273359133597802e-14)
('C', 'D') (inf, 2.8578761339220433e-13)
('D', 'C') (inf, 7.045200839141348e-14)
('D', 'D') (0.0, 3.0657322928216384e-12)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.0
('D', 'D') 1.0
```

```
ZD-GTFT-2
---------
Collected Data
('C', 'C') [70, 0]
('C', 'D') [8, 51]
('D', 'C') [37, 0]
('D', 'D') [14, 20]
C count, D count: 129, 71
Fisher Exact Tests
('C', 'C') (0.0, 1.8870930103954297e-11)
('C', 'D') (11.58274647887324, 2.418451195241036e-12)
('D', 'C') (0.0, 7.3406047957813036e-07)
('D', 'D') (2.5955734406438631, 0.013052957986131809)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.135593220339
('D', 'C') 1.0
('D', 'D') 0.411764705882
```

```
ZD-Extort-2
-----------
Collected Data
('C', 'C') [39, 2]
('C', 'D') [32, 21]
('D', 'C') [22, 33]
('D', 'D') [0, 51]
C count, D count: 93, 107
Fisher Exact Tests
('C', 'C') (0.044572250179726818, 1.5655757864449233e-09)
('C', 'D') (0.57038551401869164, 0.089145542918579304)
**('D', 'C') (1.3037383177570094, 0.44549467346489691)**
('D', 'D') (inf, 2.8752088848893428e-12)
Estimated Memory One Probabilities
('C', 'C') 0.951219512195
('C', 'D') 0.603773584906
('D', 'C') 0.4
('D', 'D') 0.0
```

```
Tit For 2 Tats
--------------
Collected Data
('C', 'C') [74, 0]
('C', 'D') [51, 26]
('D', 'C') [26, 0]
('D', 'D') [0, 23]
C count, D count: 151, 49
Fisher Exact Tests
('C', 'C') (0.0, 5.1754688548530664e-08)
('C', 'D') (1.5710284113645459, 0.13249047346488516)
('D', 'C') (0.0, 0.0017488691940591009)
('D', 'D') (inf, 3.1773129980859225e-13)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.662337662338
('D', 'C') 1.0
('D', 'D') 0.0
```

```
Two Tits For Tat
----------------
Collected Data
('C', 'C') [21, 0]
('C', 'D') [0, 26]
('D', 'C') [25, 55]
('D', 'D') [0, 73]
C count, D count: 46, 154
Fisher Exact Tests
('C', 'C') (0.0, 1.0356335796058765e-12)
('C', 'D') (inf, 0.0032107778150677457)
('D', 'C') (0.65714285714285714, 0.17186276199192116)
('D', 'D') (inf, 1.941575560727638e-07)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.3125
('D', 'D') 0.0
```

```
Cycler CCD # Repeats C C D over and over
----------
Collected Data
('C', 'C') [28, 36]
('C', 'D') [39, 31]
('D', 'C') [35, 0]
('D', 'D') [31, 0]
C count, D count: 133, 67
Fisher Exact Tests
('C', 'C') (2.5522388059701493, 0.0018458827759712023)
('C', 'D') (1.5778798316111748, 0.11406264461353598)
('D', 'C') (0.0, 3.1783477142132759e-06)
('D', 'D') (0.0, 1.9919662014206444e-05)
Estimated Memory One Probabilities
('C', 'C') 0.4375
('C', 'D') 0.557142857143
('D', 'C') 1.0
('D', 'D') 1.0
```

```
Cycler CCCD
-----------
Collected Data
('C', 'C') [53, 28]
('C', 'D') [47, 22]
('D', 'C') [26, 0]
('D', 'D') [24, 0]
C count, D count: 150, 50
Fisher Exact Tests
('C', 'C') (1.5849056603773586, 0.10850431867457283)
('C', 'D') (1.4042553191489362, 0.27318173231339726)
('D', 'C') (0.0, 0.0017470835251844653)
('D', 'D') (0.0, 0.003042948000774)
Estimated Memory One Probabilities
('C', 'C') 0.654320987654
('C', 'D') 0.68115942029
('D', 'C') 1.0
('D', 'D') 1.0
Cycler CCCCCD
-------------
Collected Data
('C', 'C') [73, 16]
('C', 'D') [61, 17]
('D', 'C') [18, 0]
('D', 'D') [15, 0]
C count, D count: 167, 33
Fisher Exact Tests
('C', 'C') (1.1091739310917392, 0.73751701170009998)
('C', 'D') (1.4103328365623446, 0.30214645168183302)
('D', 'C') (0.0, 0.082187256489540589)
('D', 'D') (0.0, 0.13489215698596341)
Estimated Memory One Probabilities
('C', 'C') 0.820224719101
('C', 'D') 0.782051282051
('D', 'C') 1.0
('D', 'D') 1.0
```

Hard-Tit-For-Tat uses the last three rounds of the history of play, and this is
detected in the \((C, C)\) context.
```
Hard Tit For Tat
----------------
Collected Data
('C', 'C') [8, 0]
('C', 'D') [0, 12]
('D', 'C') [11, 79]
('D', 'D') [0, 90]
C count, D count: 19, 181
Fisher Exact Tests
('C', 'C') (0.0, 2.9279238390821181e-08)
('C', 'D') (inf, 0.60673716369645825)
('D', 'C') (0.75389251632345555, 0.53305225846415238)
('D', 'D') (inf, 0.0012376693753429098)
Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.122222222222
('D', 'D') 0.0
```

```
AntiCycler
----------
Collected Data
('C', 'C') [91, 8]
('C', 'D') [73, 10]
('D', 'C') [9, 0]
('D', 'D') [9, 0]
C count, D count: 182, 18
Fisher Exact Tests
('C', 'C') (0.88888888888888884, 1.0)
('C', 'D') (1.3850837138508372, 0.51196828171543551)
('D', 'C') (0.0, 1.0)
('D', 'D') (0.0, 1.0)
Estimated Memory One Probabilities
('C', 'C') 0.919191919192
('C', 'D') 0.879518072289
('D', 'C') 1.0
('D', 'D') 1.0
```

Finally, we consider Grudger, which is also of infinite memory depth, and goes
undetected. In this case, Grudger becomes effectively memory-zero once the first
defection is played by the opponent (which happens quickly for our probing
player). So the output still tells us something -- the strategy has ```
Grudger
-------
Collected Data
('C', 'C') [0, 0]
('C', 'D') [0, 1]
('D', 'C') [0, 117]
('D', 'D') [0, 82]
C count, D count: 0, 200
Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)
Estimated Memory One Probabilities
('C', 'C') None
('C', 'D') 0.0
('D', 'C') 0.0
('D', 'D') 0.0
```

- We could extend our tests to detect Memory-two strategies by looking at more conditional probabilities, and so on for Memory-N strategies
- It may be possible to find a better probing strategy than \(Random(0.5)\)
- We could consider the effect of noise on the ability to detect memory depth
- For practical usage, it would be nice to know the average minimum number of rounds it takes to determine if a strategy is memory-one or not
- Are there ways for a strategy to prevent having its memory depth inferred?
- Can we reliably calculate the eventual memory depth, say by using a sliding window of history rather than the entire history? Is the eventual memory depth useful for designing counter strategies?