Last time I talked about a series of metrics for measuring how cooperative a prisoner's dilemma strategy is based on PageRank-like eigenvector methods. In this post we will attempt to infer the memory depth of various strategies.

Many strategies for the prisoner's dilemma use just the previous round of plays to determine their next move. These strategies are Markovian and are often called memory-one strategies. Let's define the memory depth of a strategy as how many rounds of the history of play the strategy uses to determine the next action. Some simple strategies use no memory at all, such as Cooperator (also known as ALLC), Defector (also known as ALLD), and various $$Random(q)$$ strategies that cooperate with probability $$q$$. These strategies have a memory depth of zero. In fact, Cooperator and Defector are special cases of Random, $$Random(1)$$ and $$Random(0)$$ respectively.

Many well-known strategies are memory-one: Tit-For-Tat (TFT), Win-Stay-Lose-Shift (also know as Pavlov), Generous TFT, Zero-Determinant strategies like ZD-GTFT-2 and ZD-Extort-2. These strategies are all defined by a set of four conditional probabilities, conditioned on two actions from the last round of play: $Pr(C \,| \, CC), Pr(C \,| \, CD), Pr(C\,| \, DC), Pr(C\,| \, DD).$ All the memory-zero strategies can be written as memory-one strategies. For example, $$Random(q)$$ has the four conditional probabilities $$(q, q, q, q)$$. When discussing memory depth we typically mean the least memory depth.

There are some well-known strategies of memory depth two, such as Tit-For-Two-Tats and Two-Tits-For-Tat. So it goes until we get to strategies that have unbounded (or infinite) memory depth, such as Grudger (also known as Grim), which cooperates until the opponent defects, and then defects permanently thereafter. The definitions of all of these strategies are in the documentation for the Axelrod library.

We can visualize the way a given strategy plays easily from the images available in the AxelrodExamples repository. For example, for Grudger we can see that it tends to either cooperate or defect indefinitely after a small number of rounds depending on whether the opponent tries to defect or not: Figure: The heatmap colors indicate the probability of coooperation for each round against each opponent.

Now that we know about memory depth, there are two natural questions:

1. Can we use knowledge of the opponent's memory depth to formulate effective counter-strategies?
2. Can we infer the memory depth of a player from its history of play?
The answer to both is a qualified yes. This post will focus on question #2 and we'll tackle #1 in a future post. (If you can't wait on #2 see this paper for something very similar.)

To infer whether or not a strategy uses the history of play (and how much) is very closely related to testing if a process is Markovian. Essentially we want to see if the strategy's choices of cooperate $$C$$ and defect $$D$$ depend on the history or not. Let's focus on memory-one players for now, since that is where the most common and historically successful strategies reside, and since it captures some of the strategies that use more than one round of history as well.

Fisher's Exact test

For each possible last round of play $$(C, C)$$, $$(C, D)$$, $$(D, C)$$, and $$(D, D)$$, we want to determine if the overall pattern of cooperate and defect differs from the pattern conditioned on the last round. A good statistical tool here is Fisher's exact test, particularly useful since it can tolerate small counts.

Fisher's exact test is one of several tests for contingency tables to tell if the distribution of observations depends on a particular characteristic. It's called an exact test because the p-value can be computed exactly. For the following contingency table:

 Overall Context Cooperate a b Defect c d
For Fisher's exact test we compute the p-value: $p = \frac{\binom{a+b}{a} \binom{c+d}{c}}{\binom{a+b+c+d}{a+c}}$

For the prisoner's dilemma strategies we'll use the Axelrod library and for the statistical calculations SciPy. For a given opponent, we will use a $$Random(0.5)$$ player to probe the opponent's behavior in the four possible contexts, and then look at the output of the four Fisher tests to determine if the strategy uses context or not. For now this will only tell us if the memory depth is zero or one-or-greater, and we can easily adapt the method to test if the memory depth is greater than one or not.

Let's look at the code -- all the code for these calculations is available here in the AxelrodExamples repository. First let's write a generator to probe the opponent and yield a new round of play.

from itertools import islice

import numpy
from scipy import stats

import axelrod

C, D = axelrod.Actions.C, axelrod.Actions.D # The plays

def collect_data(opponent):
"""Generator to collect data from opponent."""
player = axelrod.Random(0.5) # The probe strategy
while True:
player.play(opponent)
yield (player.history[-1], opponent.history[-1])


Since "standard" Axelrod tournaments last 200 rounds, we'll generate that many rounds of plays for our statistical tests. This is an arbitrary choice but good enough for our purposes at the moment.

def infer_depth(opponent, test_rounds=200):
"""Collect data and report statistical tests."""
data_dict = {(C, C): [0, 0],
(C, D): [0, 0],
(D, C): [0, 0],
(D, D): [0, 0]}
# Save the history locally as we go
history = []
for turn in islice(collect_data(opponent), test_rounds + 1):
history.append(turn)
if len(history) < 2:
continue
# Record opponents play and context
context = history[-2]
# Context is reversed for opponent
context = (context, context)
# Count the opponent's Cs and Ds
if turn == C:
data_dict[context] += 1
else:
data_dict[context] += 1
# Perform the Fisher tests
test_results = fet_tests(data_dict)
# Estimate the four conditional probabilities
estimate = memory_one_estimate(data_dict)

return data_dict, test_results, estimate


Here we use an auxillary function to perform the Fisher tests:
def fet_tests(data_dict):
"""Performs four Fisher exact tests on each possible context."""
# What has the opponent been doing?
C_count = sum(v for (k, v) in data_dict.items())
D_count = sum(v for (k, v) in data_dict.items())

test_results = dict()
# Pull out the C and D counts for each context
for context in data_dict.keys():
C_count_context = data_dict[context]
D_count_context = data_dict[context]

# Build the conditional table
table = numpy.array([[C_count, D_count], [C_count_context, D_count_context]])

# Run the Fisher test
test_stat, pvalue = stats.fisher_exact(table)
test_results[context] = (test_stat, pvalue)

return test_results


Finally, we estimate what the memory-one probabilities would be:
def memory_one_estimate(data_dict):
"""Estimates the memory-one strategy probabilities from the observed
data."""
estimates = dict()
for context in data_dict.keys():
C_count = data_dict[context]
D_count = data_dict[context]
try:
estimates[context] = float(C_count) / (C_count + D_count)
except ZeroDivisionError:
estimates[context] = None
return estimates


Results

First let's see how the test performs on the memory-zero strategies. In this case we are looking for large p-values to indicate that the context does not matter to the strategy.

Memory-Zero strategies

For Cooperator, since there are no defect $$D$$ plays, the p-value returned is equal to one. This tells us that the test does not detect any dependence on the context. The test statistic (the odds ratio) is reported as "nan" (not a number), since there are zeros in the numerator and denominator.

For each context the counts of cooperation and defection are reported [C_count, D_count], followed by the Fisher tests [odds_ratio, p-value]. In this case the test statistic is not particularly useful. Finally an estimate of the memory-one probabilities are given for each context.

Cooperator
----------
Collected Data
('C', 'C') [98, 0]
('C', 'D') [102, 0]
('D', 'C') [0, 0]
('D', 'D') [0, 0]
C count, D count: 200, 0

Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 1.0
('D', 'C') None
('D', 'D') None

Similarly for Defector:
Defector
--------
Collected Data
('C', 'C') [0, 0]
('C', 'D') [0, 0]
('D', 'C') [0, 110]
('D', 'D') [0, 90]
C count, D count: 0, 200

Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)

Estimated Memory One Probabilities
('C', 'C') None
('C', 'D') None
('D', 'C') 0.0
('D', 'D') 0.0

Let's look at a few Random strategies, with $$q=0.4$$, $$q=0.5$$, $$q=0.9$$. These all have large p-values, indicating that no context dependence is found. The estimation of the value of $$q$$ is not great (with only 50 observations on average for each), but combined the estimates are good.
Random: 0.4
-----------
Collected Data
('C', 'C') [17, 21]
('C', 'D') [13, 26]
('D', 'C') [25, 46]
('D', 'D') [22, 30]
C count, D count: 77, 123

Fisher Exact Tests
('C', 'C') (0.7733142037302726, 0.47511966409073536)
('C', 'D') (1.2520325203252032, 0.59186065362403784)
('D', 'C') (1.151869918699187, 0.67041186227270722)
('D', 'D') (0.85365853658536583, 0.63518025860574567)

Estimated Memory One Probabilities
('C', 'C') 0.447368421053
('C', 'D') 0.333333333333
('D', 'C') 0.352112676056
('D', 'D') 0.423076923077

Random: 0.5
-----------
Collected Data
('C', 'C') [31, 22]
('C', 'D') [24, 20]
('D', 'C') [18, 27]
('D', 'D') [23, 35]
C count, D count: 96, 104

Fisher Exact Tests
('C', 'C') (0.6550868486352357, 0.21649276063705136)
('C', 'D') (0.76923076923076927, 0.5060693638965994)
('D', 'C') (1.3846153846153846, 0.40862375394941558)
('D', 'D') (1.4046822742474916, 0.29649708949521131)

Estimated Memory One Probabilities
('C', 'C') 0.584905660377
('C', 'D') 0.545454545455
('D', 'C') 0.4
('D', 'D') 0.396551724138

Random: 0.9
-----------
Collected Data
('C', 'C') [75, 7]
('C', 'D') [85, 12]
('D', 'C') [8, 0]
('D', 'D') [11, 2]
C count, D count: 179, 21

Fisher Exact Tests
('C', 'C') (0.79555555555555557, 0.82663698528113017)
('C', 'D') (1.2033613445378151, 0.69456859061690301)
('D', 'C') (0.0, 1.0)
('D', 'D') (1.5497835497835497, 0.63690251649498575)

Estimated Memory One Probabilities
('C', 'C') 0.914634146341
('C', 'D') 0.876288659794
('D', 'C') 1.0
('D', 'D') 0.846153846154


Memory-One Strategies

Now let's look at some memory-one strategies. In this case, we want one or more of the p-values to be very small, indicating dependence on context. First up is Alternator, a strategy that simply alternates between $$C$$ and $$D$$. In this case the conditional distributions (marginals) are very different from the overall distribution of $$C$$ and $$D$$, and the p-values reflect that. This means our test is sufficiently sensitive to context to distinguish Alternator from Random(0.5).
Alternator
----------
Collected Data
('C', 'C') [0, 57]
('C', 'D') [0, 43]
('D', 'C') [47, 0]
('D', 'D') [53, 0]
C count, D count: 100, 100

Fisher Exact Tests
('C', 'C') (inf, 6.3205476665659948e-15)
('C', 'D') (inf, 7.9555951932726203e-12)
('D', 'C') (0.0, 9.6006561703569167e-13)
('D', 'D') (0.0, 5.9247070841622583e-14)

Estimated Memory One Probabilities
('C', 'C') 0.0
('C', 'D') 0.0
('D', 'C') 1.0
('D', 'D') 1.0

Fisher's test also identifies context dependence for Tit-For-Tat, Generous-TFT, WSLS, ZD-GTFT-2, and ZD-Extort-2. In each case at least one of the p-values is small. Note that in some cases the tests do not detect strong context dependence since the marginal distributions happen to match closely with the global distributions -- for example, the $$(D, C)$$ context of ZD-Extort-2.
Tit For Tat
-----------
Collected Data
('C', 'C') [40, 0]
('C', 'D') [0, 54]
('D', 'C') [54, 0]
('D', 'D') [0, 52]
C count, D count: 94, 106

Fisher Exact Tests
('C', 'C') (0.0, 6.6997320433673142e-12)
('C', 'D') (inf, 2.9255237915116562e-13)
('D', 'C') (0.0, 2.1649314730202577e-15)
('D', 'D') (inf, 1.2718588152635841e-12)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 1.0
('D', 'D') 0.0

GTFT: 0.33
----------
Collected Data
('C', 'C') [74, 0]
('C', 'D') [23, 40]
('D', 'C') [34, 0]
('D', 'D') [6, 23]
C count, D count: 137, 63

Fisher Exact Tests
('C', 'C') (0.0, 2.0203977666423354e-10)
('C', 'D') (3.7819185645272602, 1.2780853216031501e-05)
('D', 'C') (0.0, 1.2210728979401064e-05)
('D', 'D') (8.3359788359788354, 1.5281995925971629e-06)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.365079365079
('D', 'C') 1.0
('D', 'D') 0.206896551724

Win-Stay Lose-Shift
-------------------
Collected Data
('C', 'C') [56, 0]
('C', 'D') [0, 47]
('D', 'C') [0, 50]
('D', 'D') [47, 0]
C count, D count: 103, 97

Fisher Exact Tests
('C', 'C') (0.0, 4.4273359133597802e-14)
('C', 'D') (inf, 2.8578761339220433e-13)
('D', 'C') (inf, 7.045200839141348e-14)
('D', 'D') (0.0, 3.0657322928216384e-12)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.0
('D', 'D') 1.0

ZD-GTFT-2
---------
Collected Data
('C', 'C') [70, 0]
('C', 'D') [8, 51]
('D', 'C') [37, 0]
('D', 'D') [14, 20]
C count, D count: 129, 71

Fisher Exact Tests
('C', 'C') (0.0, 1.8870930103954297e-11)
('C', 'D') (11.58274647887324, 2.418451195241036e-12)
('D', 'C') (0.0, 7.3406047957813036e-07)
('D', 'D') (2.5955734406438631, 0.013052957986131809)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.135593220339
('D', 'C') 1.0
('D', 'D') 0.411764705882

ZD-Extort-2
-----------
Collected Data
('C', 'C') [39, 2]
('C', 'D') [32, 21]
('D', 'C') [22, 33]
('D', 'D') [0, 51]
C count, D count: 93, 107

Fisher Exact Tests
('C', 'C') (0.044572250179726818, 1.5655757864449233e-09)
('C', 'D') (0.57038551401869164, 0.089145542918579304)
**('D', 'C') (1.3037383177570094, 0.44549467346489691)**
('D', 'D') (inf, 2.8752088848893428e-12)

Estimated Memory One Probabilities
('C', 'C') 0.951219512195
('C', 'D') 0.603773584906
('D', 'C') 0.4
('D', 'D') 0.0


Memory-Two Strategies

Next let's examine some strategies that are known to use the last two rounds of play. Our tests will only tell us if there is some context dependence, not how deep the context goes, though we could go further and test the sixteen two-round conditionals. Again in each case there is at least one small p-value.

Tit For 2 Tats
--------------
Collected Data
('C', 'C') [74, 0]
('C', 'D') [51, 26]
('D', 'C') [26, 0]
('D', 'D') [0, 23]
C count, D count: 151, 49

Fisher Exact Tests
('C', 'C') (0.0, 5.1754688548530664e-08)
('C', 'D') (1.5710284113645459, 0.13249047346488516)
('D', 'C') (0.0, 0.0017488691940591009)
('D', 'D') (inf, 3.1773129980859225e-13)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.662337662338
('D', 'C') 1.0
('D', 'D') 0.0

Two Tits For Tat
----------------
Collected Data
('C', 'C') [21, 0]
('C', 'D') [0, 26]
('D', 'C') [25, 55]
('D', 'D') [0, 73]
C count, D count: 46, 154

Fisher Exact Tests
('C', 'C') (0.0, 1.0356335796058765e-12)
('C', 'D') (inf, 0.0032107778150677457)
('D', 'C') (0.65714285714285714, 0.17186276199192116)
('D', 'D') (inf, 1.941575560727638e-07)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.3125
('D', 'D') 0.0

Cycler CCD # Repeats C C D over and over
----------
Collected Data
('C', 'C') [28, 36]
('C', 'D') [39, 31]
('D', 'C') [35, 0]
('D', 'D') [31, 0]
C count, D count: 133, 67

Fisher Exact Tests
('C', 'C') (2.5522388059701493, 0.0018458827759712023)
('C', 'D') (1.5778798316111748, 0.11406264461353598)
('D', 'C') (0.0, 3.1783477142132759e-06)
('D', 'D') (0.0, 1.9919662014206444e-05)

Estimated Memory One Probabilities
('C', 'C') 0.4375
('C', 'D') 0.557142857143
('D', 'C') 1.0
('D', 'D') 1.0


Memory-N Strategies

Finally let's look at a few strategies with memory depth greater than 2. The cycler strategies just repeat the given cycle and have memory depth equal to the cycle length minus one. Our tests have trouble definitely detecting context dependence for these two strategies, though there is certainly some indication that it is present for CyclerCCCD. For CyclerCCCCCD, the dependence is simply too deep for a one-level depth test to capture in 200 rounds of play. However it may be possible to combine the four tests into a single p-value that boosts the strength.

Cycler CCCD
-----------
Collected Data
('C', 'C') [53, 28]
('C', 'D') [47, 22]
('D', 'C') [26, 0]
('D', 'D') [24, 0]
C count, D count: 150, 50

Fisher Exact Tests
('C', 'C') (1.5849056603773586, 0.10850431867457283)
('C', 'D') (1.4042553191489362, 0.27318173231339726)
('D', 'C') (0.0, 0.0017470835251844653)
('D', 'D') (0.0, 0.003042948000774)

Estimated Memory One Probabilities
('C', 'C') 0.654320987654
('C', 'D') 0.68115942029
('D', 'C') 1.0
('D', 'D') 1.0

Cycler CCCCCD
-------------
Collected Data
('C', 'C') [73, 16]
('C', 'D') [61, 17]
('D', 'C') [18, 0]
('D', 'D') [15, 0]
C count, D count: 167, 33

Fisher Exact Tests
('C', 'C') (1.1091739310917392, 0.73751701170009998)
('C', 'D') (1.4103328365623446, 0.30214645168183302)
('D', 'C') (0.0, 0.082187256489540589)
('D', 'D') (0.0, 0.13489215698596341)

Estimated Memory One Probabilities
('C', 'C') 0.820224719101
('C', 'D') 0.782051282051
('D', 'C') 1.0
('D', 'D') 1.0

Hard-Tit-For-Tat uses the last three rounds of the history of play, and this is detected in the $$(C, C)$$ context.
Hard Tit For Tat
----------------
Collected Data
('C', 'C') [8, 0]
('C', 'D') [0, 12]
('D', 'C') [11, 79]
('D', 'D') [0, 90]
C count, D count: 19, 181

Fisher Exact Tests
('C', 'C') (0.0, 2.9279238390821181e-08)
('C', 'D') (inf, 0.60673716369645825)
('D', 'C') (0.75389251632345555, 0.53305225846415238)
('D', 'D') (inf, 0.0012376693753429098)

Estimated Memory One Probabilities
('C', 'C') 1.0
('C', 'D') 0.0
('D', 'C') 0.122222222222
('D', 'D') 0.0

Anticycler uses the entire history of play, attempting to never have a cycle of any length. Explicitly, the output is $C D C C D C C C D C C C C D \ldots$ and so on, increasing the number of coooperations between defections by one after each defection. Here the context-dependence goes undetected.
AntiCycler
----------
Collected Data
('C', 'C') [91, 8]
('C', 'D') [73, 10]
('D', 'C') [9, 0]
('D', 'D') [9, 0]
C count, D count: 182, 18

Fisher Exact Tests
('C', 'C') (0.88888888888888884, 1.0)
('C', 'D') (1.3850837138508372, 0.51196828171543551)
('D', 'C') (0.0, 1.0)
('D', 'D') (0.0, 1.0)

Estimated Memory One Probabilities
('C', 'C') 0.919191919192
('C', 'D') 0.879518072289
('D', 'C') 1.0
('D', 'D') 1.0

Finally, we consider Grudger, which is also of infinite memory depth, and goes undetected. In this case, Grudger becomes effectively memory-zero once the first defection is played by the opponent (which happens quickly for our probing player). So the output still tells us something -- the strategy has eventual memory depth zero. This is also essentially the case for AntiCycler -- after enough rounds the strategy becomes effectively equal to Cooperator. Had we used fewer rounds, Grudger's behavior would not be washed out, and the context dependence would be detected.
Grudger
-------
Collected Data
('C', 'C') [0, 0]
('C', 'D') [0, 1]
('D', 'C') [0, 117]
('D', 'D') [0, 82]
C count, D count: 0, 200

Fisher Exact Tests
('C', 'C') (nan, 1.0)
('C', 'D') (nan, 1.0)
('D', 'C') (nan, 1.0)
('D', 'D') (nan, 1.0)

Estimated Memory One Probabilities
('C', 'C') None
('C', 'D') 0.0
('D', 'C') 0.0
('D', 'D') 0.0


Going Further

If you've made it this far, you've probably come up with a few ideas of your own. We could continue the analysis in at least the following ways:
• We could extend our tests to detect Memory-two strategies by looking at more conditional probabilities, and so on for Memory-N strategies
• It may be possible to find a better probing strategy than $$Random(0.5)$$
• We could consider the effect of noise on the ability to detect memory depth
• For practical usage, it would be nice to know the average minimum number of rounds it takes to determine if a strategy is memory-one or not
• Are there ways for a strategy to prevent having its memory depth inferred?
• Can we reliably calculate the eventual memory depth, say by using a sliding window of history rather than the entire history? Is the eventual memory depth useful for designing counter strategies?