The prisoner's dilemma is a common model in game theory for the interaction of individuals and has been used extensively to study the evolution of cooperation. There are hundreds of strategies for the prisoner's dilemma, many dating back to Axelrod's original tournaments, and new strategies are still being discovered.

There are several software packages that replicate some aspects of iterated prisoner's dilemma tournaments. Recently a new library (the Axelrod library) was created both to reproduce the results of earlier researchers and to let anyone easily create and contribute new strategies.

In 2014, Tyler Singer-Clark defined several morality metrics that evaluate strategies for iterated prisoner's dilemma tournaments. There is a good summary on Scott Aaronson's blog, and Singer-Clark's manuscript [pdf] is quite readable. The Axelrod library is now capable of reproducing Singer-Clark's results as well as extending them to all the strategies in the library (currently about 80 ordinary strategies have been implemented).

The basic idea behind these morality metrics is to measure not just how often a player cooperates, but also how cooperative it is toward other cooperative strategies. Some strategies, such as Tit-For-Tat, are known for enforcing cooperation and punishing defection. As Aaronson says: "A moral person is someone who cooperates with other moral people, and who refuses to cooperate with immoral people." Other strategies try to maximize their personal payoffs and are described as exploitative, while others are forgiving of defection or generous.

In an attempt to capture these behaviors, Singer-Clark computes principal eigenvectors of the cooperation matrix and variants. The idea is analogous to the PageRank algorithm, in which the morality of a strategy is dependent on how it behaves against other strategies and how those strategies behave with others (and so on). Let's discuss a few of these metrics:

• The Good-Partner Rating: how often a strategy cooperates at least as much as its opponent
• EigenJesus: the principal eigenvector of the cooperation matrix
• EigenMoses: the principal eigenvector of the modified cooperation matrix $$D = 2C - 1$$, where $$C$$ is the cooperation matrix and $$1$$ is the matrix of all ones

The EigenJesus rating is higher for cooperative strategies, but it does not penalize defection against uncooperative strategies as much as, say, the raw cooperation rate does. The EigenMoses rating goes a bit further and actually punishes strategies that cooperate with uncooperative opponents.

Typically the largest and smallest EigenJesus scores are for the strategies ALLC and ALLD (also called Cooperator and Defector). Variants of Tit-For-Tat typically score highly on the EigenJesus measure, and even higher for EigenMoses, since these strategies both cooperate with cooperative strategies and defect against uncooperative strategies.
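
As a rough illustration of the definitions above, the following sketch computes the three metrics from a small hand-made cooperation matrix. It assumes that entry `C[i][j]` is the rate at which player `i` cooperates with player `j`, that the Good-Partner Rating can be approximated from those average rates (ignoring self-play), and that eigenvectors are scaled to unit length and oriented to have a non-negative sum. The function names and the three-player matrix are illustrative only; they are not taken from the Axelrod library or from Singer-Clark's code.

```python
import numpy as np


def principal_eigenvector(matrix):
    """Unit-length eigenvector for the eigenvalue with the largest real part."""
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    index = np.argmax(eigenvalues.real)
    vector = np.real(eigenvectors[:, index])
    vector = vector / np.linalg.norm(vector)
    # Eigenvectors are only defined up to sign; this orientation is an assumption.
    return -vector if vector.sum() < 0 else vector


def eigenjesus(C):
    return principal_eigenvector(C)


def eigenmoses(C):
    # D = 2C - 1 maps cooperation rates from [0, 1] to [-1, 1].
    return principal_eigenvector(2 * C - np.ones_like(C))


def good_partner_rating(C):
    """Fraction of opponents against whom a player cooperates at least as often."""
    n = C.shape[0]
    at_least_as_cooperative = C >= C.T
    np.fill_diagonal(at_least_as_cooperative, False)  # ignore self-play
    return at_least_as_cooperative.sum(axis=1) / (n - 1)


# Hypothetical cooperation rates: rows are players, columns are opponents.
names = ["Cooperator", "Defector", "Tit For Tat"]
C = np.array([
    [1.0, 1.0, 1.0],  # Cooperator always cooperates
    [0.0, 0.0, 0.0],  # Defector never cooperates
    [1.0, 0.0, 1.0],  # idealized Tit For Tat mirrors its opponent
])

for name, gpr, ej, em in zip(names, good_partner_rating(C), eigenjesus(C), eigenmoses(C)):
    print(f"{name:12s} GPR={gpr:.3f} EigenJesus={ej:.3f} EigenMoses={em:.3f}")
```

In this toy example Cooperator and Tit For Tat tie on EigenJesus, while Tit For Tat gets the highest EigenMoses value because it declines to cooperate with Defector.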

## Metrics for Various Tournaments

First we show that the results of the Axelrod library match those from Singer-Clark's manuscript (see Table 1 on page 7). The order is essentially the same even though the values differ, because eigenvectors are only unique up to normalization. Note that PAVLOV is the same as Win-Stay-Lose-Shift, FRIEDMAN is the same as Grudger, and there is a $$p \leftrightarrow 1-p$$ symmetry in the definition of Joss between Singer-Clark's implementation and Axelrod's. The order of strategies under the EigenRanks is not identical, but it is very close; since some of the strategies are stochastic, this is not unexpected. For the definitions of any of these strategies, see the documentation for the Axelrod library.

Morality Metrics for Tyler Singer-Clark's tournament. Note that eigenvectors are not unique and that the order is the same as the tables in Singer-Clark's manuscript.

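| Player Name | Mean Score | Cooperation Rate | Good Partner Rating | EigenJesus | EigenMoses |
|---|---|---|---|---|---|
| Cooperator | 2.463 | 1.0 | 1.0 | 0.288 | 0.234 |
| Defector | 1.741 | 0.0 | 0.053 | 0.0 | -0.234 |
| Eatherley | 2.65 | 0.891 | 1.0 | 0.28 | 0.282 |
| Champion | 2.636 | 0.893 | 1.0 | 0.28 | 0.282 |
| GTFT: 0.1 | 2.619 | 0.825 | 1.0 | 0.268 | 0.28 |
| GTFT: 0.3 | 2.628 | 0.881 | 1.0 | 0.275 | 0.27 |
| Soft Go By Majority | 2.66 | 0.84 | 0.921 | 0.271 | 0.283 |
| Hard Go By Majority | 2.539 | 0.746 | 0.707 | 0.256 | 0.286 |
| Tit For 2 Tats | 2.532 | 0.864 | 1.0 | 0.277 | 0.289 |
| Random: 0.8 | 2.407 | 0.801 | 0.567 | 0.23 | 0.14 |
| Random: 0.5 | 2.241 | 0.5 | 0.507 | 0.144 | -0.0 |
| Random: 0.2 | 1.926 | 0.201 | 0.439 | 0.058 | -0.14 |
| Win-Stay Lose-Shift | 2.521 | 0.771 | 0.926 | 0.246 | 0.232 |
| Tit For Tat | 2.544 | 0.757 | 1.0 | 0.256 | 0.281 |
| Two Tits For Tat | 2.401 | 0.652 | 0.789 | 0.231 | 0.253 |
| Grudger | 2.481 | 0.607 | 0.789 | 0.219 | 0.241 |
| Tester | 2.389 | 0.572 | 0.793 | 0.194 | 0.146 |
| Suspicious Tit For Tat | 2.2 | 0.49 | 0.554 | 0.172 | 0.102 |
| Joss: 0.9 | 2.06 | 0.39 | 0.3 | 0.137 | 0.029 |
| Joss: 0.7 | 1.956 | 0.234 | 0.253 | 0.081 | -0.086 |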

For another perspective on these metrics, consider the following results from a tournament run with various random strategies, each of which cooperates with a given probability and otherwise defects. In this case each strategy acts without regard to how opponents behaved in previous rounds. For all the metrics in the table, the values are monotonic (or very nearly so, in the case of EigenMoses) in the cooperation probabilities. Note carefully, however, that the EigenJesus and EigenMoses values are inverse in rank, with the more cooperative strategies having a higher EigenJesus rating and a lower EigenMoses rating. In this case the cooperation rate, the good partner rating, and the EigenJesus rating all agree in the ranking of strategies.

Morality Metrics for 21 Random strategies from the Axelrod Library.

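| Player Name | Mean Score | Cooperation Rate | Good Partner Rating | EigenJesus | EigenMoses |
|---|---|---|---|---|---|
| Random: 0.0 | 3.101 | 0.0 | 0.0 | 0.0 | 0.355 |
| Random: 0.05 | 3.012 | 0.05 | 0.053 | 0.019 | 0.328 |
| Random: 0.1 | 2.928 | 0.099 | 0.102 | 0.037 | 0.273 |
| Random: 0.15 | 2.842 | 0.149 | 0.152 | 0.056 | 0.242 |
| Random: 0.2 | 2.756 | 0.2 | 0.203 | 0.075 | 0.24 |
| Random: 0.25 | 2.673 | 0.25 | 0.254 | 0.093 | 0.172 |
| Random: 0.3 | 2.583 | 0.3 | 0.301 | 0.112 | 0.165 |
| Random: 0.35 | 2.502 | 0.35 | 0.355 | 0.131 | 0.167 |
| Random: 0.4 | 2.418 | 0.399 | 0.401 | 0.149 | 0.082 |
| Random: 0.45 | 2.331 | 0.45 | 0.448 | 0.168 | 0.037 |
| Random: 0.5 | 2.252 | 0.5 | 0.504 | 0.187 | -0.001 |
| Random: 0.55 | 2.166 | 0.549 | 0.551 | 0.205 | -0.017 |
| Random: 0.6 | 2.087 | 0.6 | 0.6 | 0.224 | -0.079 |
| Random: 0.65 | 1.998 | 0.65 | 0.65 | 0.242 | -0.145 |
| Random: 0.7 | 1.916 | 0.701 | 0.696 | 0.262 | -0.145 |
| Random: 0.75 | 1.832 | 0.75 | 0.756 | 0.28 | -0.154 |
| Random: 0.8 | 1.754 | 0.799 | 0.798 | 0.298 | -0.243 |
| Random: 0.85 | 1.67 | 0.85 | 0.849 | 0.317 | -0.241 |
| Random: 0.9 | 1.586 | 0.9 | 0.899 | 0.336 | -0.278 |
| Random: 0.95 | 1.508 | 0.95 | 0.949 | 0.355 | -0.299 |
| Random: 1.0 | 1.428 | 1.0 | 1.0 | 0.373 | -0.355 |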
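
The EigenJesus column for the Random players can be sanity-checked by hand. The sketch below assumes an idealized cooperation matrix in which player `i` cooperates with probability `p[i]` against every opponent, so sampling noise and any special treatment of self-play are ignored. The variable names and the `power_iteration` helper are illustrative.

```python
import numpy as np

# Cooperation probabilities of the Random players: 0.0, 0.05, ..., 1.0.
p = np.linspace(0.0, 1.0, 21)

# Idealized cooperation matrix (an assumption): row i is constant, because a
# Random player cooperates at rate p[i] no matter who the opponent is.
C = np.repeat(p[:, None], len(p), axis=1)


def power_iteration(matrix, steps=20):
    """Approximate the principal eigenvector by repeated multiplication."""
    v = np.ones(matrix.shape[0])
    for _ in range(steps):
        v = matrix @ v
        v = v / np.linalg.norm(v)
    return v


eigenjesus = power_iteration(C)

# C has rank one, so its principal eigenvector is p rescaled to unit length.
assert np.allclose(eigenjesus, p / np.linalg.norm(p))
print(np.round(eigenjesus[[0, 10, 20]], 3))

# For EigenMoses, D = 2C - 1 also has constant rows, and the row values sum to
# zero over this symmetric set of probabilities, so D @ D vanishes.
D = 2 * C - 1
print(np.allclose(D @ D, 0.0))
```

Under these assumptions the values for Random: 0.5 and Random: 1.0 come out at about 0.187 and 0.373, in line with the table. The idealized `D` is nilpotent and so has no dominant eigenvalue, which suggests that the direction of the measured EigenMoses vector (including its overall sign) may be set by sampling noise and by how self-interactions are handled. That is an inference from this sketch, not something checked against the library.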

We can also compute the morality metrics for all the strategies in the Axelrod library. This produces several insights:

• The strategies that produce the highest mean scores in the tournament -- DoubleCrosser, BackStabber, and Fool Me Once -- do not have particularly extreme EigenJesus or EigenMoses rankings. Nevertheless these strategies take a strong "moral" stance, defecting forever if opponents defect too many times.
• The two Zero Determinant strategies behave as expected: the generous strategy ZD-GTFT-2 scores well on both Eigen rankings, whereas the extortionate ZD-Extort-2 has a negative EigenMoses value and a low EigenJesus value.
• Two strategies, Alternator (which plays C then D in an endless cycle) and Random: 0.5 (which plays C or D at random with equal probability), have the same EigenJesus and EigenMoses values and the same cooperation rate. This makes sense because the cooperation matrix cannot discriminate between the two strategies.
• The strategies with negative EigenMoses ratings tend to be those that probe naively for weaknesses and have low mean cooperation rates.
• The strategies with the largest EigenMoses rating tend to be Tit-For-Tat variants or strategies that otherwise follow their opponents (e.g. the various Soft-Majority strategies).
• There does not appear to be a particularly good correlation between the mean score and either Eigen ranking.

Morality Metrics for all ordinary strategies in the Axelrod Library.

| Player Name | Mean Score | Cooperation Rate | Good Partner Rating | EigenJesus | EigenMoses |
|---|---|---|---|---|---|
| $$\phi$$ | 2.109 | 0.661 | 0.624 | 0.09 | 0.026 |
| $$\pi$$ | 2.173 | 0.769 | 0.625 | 0.106 | 0.05 |
| $$e$$ | 2.177 | 0.748 | 0.6 | 0.103 | 0.048 |
| Aggravater | 2.04 | 0.119 | 0.056 | 0.022 | -0.089 |
| Alternator | 2.262 | 0.5 | 0.518 | 0.073 | 0.0 |
| Alternator Hunter | 2.295 | 0.988 | 0.988 | 0.145 | 0.139 |
| Anti Tit For Tat | 2.147 | 0.547 | 0.567 | 0.073 | -0.003 |
| AntiCycler | 2.049 | 0.91 | 0.666 | 0.133 | 0.114 |
| Appeaser | 2.826 | 0.77 | 0.843 | 0.119 | 0.118 |
| Arrogant QLearner | 2.185 | 0.842 | 0.715 | 0.129 | 0.118 |
| Average Copier | 2.539 | 0.383 | 0.456 | 0.066 | 0.024 |
| BackStabber | 2.993 | 0.642 | 0.175 | 0.105 | 0.111 |
| Bully | 2.3 | 0.514 | 0.474 | 0.069 | -0.008 |
| Calculator | 2.11 | 0.282 | 0.343 | 0.045 | -0.054 |
| Cautious QLearner | 2.191 | 0.842 | 0.71 | 0.129 | 0.117 |
| Champion | 2.651 | 0.909 | 1.0 | 0.14 | 0.151 |
| Cooperator | 2.279 | 1.0 | 1.0 | 0.146 | 0.139 |
| Cooperator Hunter | 2.124 | 0.949 | 0.812 | 0.137 | 0.116 |
| Cycler CCCCCD | 1.956 | 0.835 | 0.656 | 0.122 | 0.093 |
| Cycler CCCD | 2.027 | 0.75 | 0.605 | 0.11 | 0.069 |
| Cycler CCD | 2.109 | 0.67 | 0.619 | 0.098 | 0.047 |
| Davis | 2.885 | 0.578 | 0.698 | 0.097 | 0.101 |
| Defector | 2.032 | 0.0 | 0.056 | 0.0 | -0.139 |
| Defector Hunter | 2.328 | 0.987 | 1.0 | 0.146 | 0.144 |
| DoubleCrosser | 3.032 | 0.679 | 0.19 | 0.11 | 0.118 |
| Eatherley | 2.703 | 0.918 | 1.0 | 0.141 | 0.149 |
| Feld | 2.281 | 0.396 | 0.245 | 0.065 | -0.005 |
| Fool Me Forever | 2.524 | 0.86 | 0.754 | 0.124 | 0.101 |
| Fool Me Once | 2.953 | 0.618 | 0.711 | 0.103 | 0.112 |
| Forgetful Fool Me Once | 2.897 | 0.776 | 0.738 | 0.124 | 0.139 |
| Forgetful Grudger | 2.803 | 0.661 | 0.648 | 0.111 | 0.129 |
| Forgiver | 2.82 | 0.677 | 0.695 | 0.114 | 0.133 |
| Forgiving Tit For Tat | 2.737 | 0.846 | 1.0 | 0.133 | 0.148 |
| GTFT: 0.33 | 2.707 | 0.908 | 1.0 | 0.139 | 0.144 |
| Grofman | 2.186 | 0.289 | 0.5 | 0.042 | -0.058 |
| Grudger | 2.906 | 0.562 | 0.647 | 0.095 | 0.1 |
| Grumpy | 2.665 | 0.882 | 0.928 | 0.137 | 0.149 |
| Hard Prober | 2.512 | 0.347 | 0.534 | 0.054 | -0.029 |
| Hard Tit For 2 Tats | 2.637 | 0.869 | 1.0 | 0.138 | 0.157 |
| Hard Tit For Tat | 2.703 | 0.697 | 0.647 | 0.117 | 0.138 |
| Hesitant QLearner | 2.189 | 0.842 | 0.712 | 0.129 | 0.117 |
| Inverse | 2.8 | 0.762 | 0.754 | 0.122 | 0.131 |
| Inverse Punisher | 2.816 | 0.667 | 0.673 | 0.111 | 0.124 |
| Joss: 0.9 | 2.142 | 0.44 | 0.237 | 0.074 | 0.013 |
| Limited Retaliate (0.05/20) | 2.86 | 0.66 | 0.661 | 0.11 | 0.125 |
| Limited Retaliate (0.08/15) | 2.838 | 0.67 | 0.661 | 0.112 | 0.128 |
| Limited Retaliate (0.1/20) | 2.82 | 0.673 | 0.661 | 0.113 | 0.13 |
| Math Constant Hunter | 2.807 | 0.724 | 0.759 | 0.111 | 0.103 |
| Nice Average Copier | 2.91 | 0.564 | 0.648 | 0.095 | 0.101 |
| Once Bitten | 2.718 | 0.82 | 0.836 | 0.132 | 0.155 |
| Opposite Grudger | 2.268 | 0.927 | 0.782 | 0.14 | 0.14 |
| Prober | 2.161 | 0.262 | 0.61 | 0.043 | -0.051 |
| Prober 2 | 2.235 | 0.639 | 0.734 | 0.102 | 0.062 |
| Prober 3 | 2.263 | 0.183 | 0.436 | 0.03 | -0.076 |
| Punisher | 2.725 | 0.697 | 0.654 | 0.117 | 0.138 |
| Random Hunter | 2.399 | 0.953 | 0.953 | 0.141 | 0.136 |
| Random: 0.5 | 2.144 | 0.5 | 0.538 | 0.073 | 0.0 |
| Retaliate (0.05) | 2.809 | 0.634 | 0.649 | 0.107 | 0.121 |
| Retaliate (0.08) | 2.818 | 0.667 | 0.649 | 0.112 | 0.13 |
| Retaliate (0.1) | 2.806 | 0.666 | 0.648 | 0.112 | 0.13 |
| Risky QLearner | 2.188 | 0.842 | 0.719 | 0.129 | 0.118 |
| Shubik | 2.867 | 0.745 | 0.753 | 0.121 | 0.137 |
| Sneaky Tit For Tat | 2.335 | 0.712 | 0.721 | 0.106 | 0.066 |
| Soft Go By Majority | 2.693 | 0.886 | 0.935 | 0.138 | 0.153 |
| Soft Go By Majority:10 | 2.7 | 0.895 | 0.963 | 0.139 | 0.151 |
| Soft Go By Majority:20 | 2.709 | 0.895 | 0.948 | 0.139 | 0.151 |
| Soft Go By Majority:40 | 2.714 | 0.9 | 0.948 | 0.139 | 0.151 |
| Soft Go By Majority:5 | 2.716 | 0.859 | 0.946 | 0.136 | 0.152 |
| Soft Joss: 0.9 | 2.743 | 0.872 | 1.0 | 0.135 | 0.145 |
| Stochastic WSLS | 2.232 | 0.55 | 0.551 | 0.082 | 0.022 |
| Suspicious Tit For Tat | 2.189 | 0.533 | 0.548 | 0.086 | 0.034 |
| Tester | 2.268 | 0.489 | 0.696 | 0.08 | 0.024 |
| Tit For 2 Tats | 2.608 | 0.887 | 1.0 | 0.139 | 0.157 |
| Tit For Tat | 2.731 | 0.836 | 1.0 | 0.132 | 0.146 |
| Tricky Cooperator | 2.004 | 0.803 | 0.662 | 0.111 | 0.057 |
| Tricky Defector | 2.306 | 0.341 | 0.485 | 0.05 | -0.026 |
| Tullock | 2.199 | 0.48 | 0.346 | 0.08 | 0.028 |
| Two Tits For Tat | 2.662 | 0.723 | 0.649 | 0.12 | 0.143 |
| Win-Stay Lose-Shift | 2.83 | 0.77 | 0.843 | 0.119 | 0.118 |
| ZD-Extort-2 | 2.073 | 0.307 | 0.163 | 0.051 | -0.038 |
| ZD-GTFT-2 | 2.744 | 0.89 | 1.0 | 0.137 | 0.144 |

## Metrics for Noisy Tournaments

Adding background noise is a common variant for IPD tournaments. When a player submits its next move there is a chance that it is flipped C to D or D to C. This is perhaps a more realistic model, and it causes some strategies such as Tit-For-Tat to fall into undesirable cycles of cooperation and defection.
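
To make the effect of noise concrete, here is a minimal sketch of a noisy move and of two Tit-For-Tat players after a single flipped move. The helper names (`flip`, `noisy`, `tit_for_tat_match`) are hypothetical and are not the Axelrod library's API. The match forces one flip at a chosen round instead of sampling it, so that the outcome is deterministic.

```python
import random


def flip(move):
    """Swap C for D and D for C."""
    return "D" if move == "C" else "C"


def noisy(move, noise, rng):
    """Return the move, flipped with probability `noise`."""
    return flip(move) if rng.random() < noise else move


def tit_for_tat_match(rounds, flip_round):
    """Play Tit For Tat against itself, flipping player A's move once."""
    a_history, b_history = [], []
    for turn in range(rounds):
        # Tit For Tat: cooperate first, then copy the opponent's last move.
        a_move = b_history[-1] if b_history else "C"
        b_move = a_history[-1] if a_history else "C"
        if turn == flip_round:
            a_move = flip(a_move)  # the single noisy move
        a_history.append(a_move)
        b_history.append(b_move)
    return "".join(a_history), "".join(b_history)


a, b = tit_for_tat_match(rounds=8, flip_round=2)
print(a)
print(b)

# Sampling noise at a 5% level: count how many of 1000 intended C moves flip.
rng = random.Random(0)
print(sum(noisy("C", 0.05, rng) == "D" for _ in range(1000)))
```

After the single flip, the two players alternate C and D out of phase for the rest of the match, which is the kind of cycle described above.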

Recomputing the data for a tournament with 5% noise reveals some interesting behaviors.

• Although DoubleCrosser and BackStabber still have the highest mean scores, they both fall from about 3 points per round to about 2.5. They also cooperate far less often, from 60-70% down to roughly 35% and 20% respectively, because random defections by opponents drive these strategies to retaliate permanently. The EigenRanks also drop substantially: every one of the second through fifth top mean scorers, BackStabber included, now has a negative EigenMoses value.
• Many more strategies have a negative EigenMoses value now, including Fool Me Once (which forgives one and only one defection), whose EigenMoses rank drops dramatically. Fool Me Once cooperates only 13% of the time but nets the fourth highest mean score. Most of the lowest scorers have a positive EigenMoses value (which suggests that they are exploited frequently).
• The Tit-For-Tat variants drop out of the top 10 EigenJesus rankings, which are now dominated by the serial cooperators and, interestingly, the QLearner strategies.
• The Go By Majority strategies top the EigenMoses rankings while the TFT variants slide a bit.
• ZD-Extort-2 actually has a positive EigenMoses ranking and cooperates 36% of the time, versus 30% in the noise-free tournament.

Morality Metrics for all ordinary strategies in the Axelrod Library, with 5% noise.

| Player Name | Mean Score | Cooperation Rate | Good Partner Rating | EigenJesus | EigenMoses |
|---|---|---|---|---|---|
| $$\phi$$ | 2.153 | 0.664 | 0.592 | 0.102 | -0.043 |
| $$\pi$$ | 2.112 | 0.803 | 0.671 | 0.134 | 0.023 |
| $$e$$ | 2.152 | 0.774 | 0.65 | 0.127 | 0.009 |
| Aggravater | 2.305 | 0.062 | 0.121 | 0.012 | -0.126 |
| Alternator | 2.258 | 0.501 | 0.502 | 0.09 | 0.0 |
| Alternator Hunter | 2.05 | 0.949 | 0.906 | 0.17 | 0.134 |
| Anti Tit For Tat | 2.201 | 0.528 | 0.519 | 0.075 | -0.083 |
| AntiCycler | 2.024 | 0.869 | 0.715 | 0.156 | 0.11 |
| Appeaser | 2.207 | 0.542 | 0.564 | 0.098 | 0.024 |
| Arrogant QLearner | 2.046 | 0.875 | 0.769 | 0.16 | 0.132 |
| Average Copier | 2.358 | 0.085 | 0.128 | 0.016 | -0.117 |
| BackStabber | 2.511 | 0.209 | 0.205 | 0.042 | -0.043 |
| Bully | 2.18 | 0.529 | 0.526 | 0.075 | -0.082 |
| Calculator | 2.212 | 0.449 | 0.443 | 0.098 | 0.079 |
| Cautious QLearner | 2.042 | 0.876 | 0.775 | 0.16 | 0.132 |
| Champion | 2.166 | 0.748 | 0.896 | 0.152 | 0.191 |
| Cooperator | 2.046 | 0.95 | 0.918 | 0.17 | 0.134 |
| Cooperator Hunter | 2.008 | 0.938 | 0.872 | 0.168 | 0.127 |
| Cycler CCCCCD | 1.99 | 0.802 | 0.682 | 0.144 | 0.089 |
| Cycler CCCD | 2.043 | 0.725 | 0.632 | 0.13 | 0.067 |
| Cycler CCD | 2.12 | 0.653 | 0.604 | 0.117 | 0.045 |
| Davis | 2.387 | 0.115 | 0.176 | 0.021 | -0.109 |
| Defector | 2.293 | 0.05 | 0.118 | 0.009 | -0.134 |
| Defector Hunter | 2.055 | 0.949 | 0.903 | 0.17 | 0.135 |
| DoubleCrosser | 2.513 | 0.349 | 0.345 | 0.073 | 0.044 |
| Eatherley | 2.25 | 0.765 | 0.883 | 0.153 | 0.189 |
| Feld | 2.276 | 0.387 | 0.295 | 0.085 | 0.045 |
| Fool Me Forever | 1.999 | 0.928 | 0.859 | 0.165 | 0.125 |
| Fool Me Once | 2.446 | 0.133 | 0.154 | 0.026 | -0.091 |
| Forgetful Fool Me Once | 2.361 | 0.42 | 0.423 | 0.089 | 0.065 |
| Forgetful Grudger | 2.26 | 0.218 | 0.139 | 0.049 | -0.044 |
| Forgiver | 2.346 | 0.3 | 0.23 | 0.069 | 0.036 |
| Forgiving Tit For Tat | 2.244 | 0.578 | 0.642 | 0.126 | 0.161 |
| GTFT: 0.33 | 2.197 | 0.716 | 0.728 | 0.143 | 0.158 |
| Grofman | 2.204 | 0.31 | 0.454 | 0.056 | -0.056 |
| Grudger | 2.406 | 0.095 | 0.113 | 0.018 | -0.113 |
| Grumpy | 2.301 | 0.765 | 0.833 | 0.153 | 0.19 |
| Hard Prober | 2.394 | 0.297 | 0.431 | 0.061 | 0.004 |
| Hard Tit For 2 Tats | 2.219 | 0.647 | 0.871 | 0.139 | 0.187 |
| Hard Tit For Tat | 2.179 | 0.295 | 0.123 | 0.069 | 0.01 |
| Hesitant QLearner | 2.049 | 0.875 | 0.776 | 0.16 | 0.132 |
| Inverse | 2.293 | 0.632 | 0.505 | 0.117 | 0.077 |
| Inverse Punisher | 2.293 | 0.203 | 0.197 | 0.044 | -0.06 |
| Joss: 0.9 | 2.211 | 0.455 | 0.344 | 0.101 | 0.091 |
| Limited Retaliate (0.05/20) | 2.308 | 0.193 | 0.285 | 0.04 | -0.069 |
| Limited Retaliate (0.08/15) | 2.274 | 0.232 | 0.312 | 0.05 | -0.046 |
| Limited Retaliate (0.1/20) | 2.257 | 0.237 | 0.282 | 0.052 | -0.039 |
| Math Constant Hunter | 2.47 | 0.409 | 0.447 | 0.069 | -0.022 |
| Nice Average Copier | 2.405 | 0.095 | 0.121 | 0.018 | -0.113 |
| Once Bitten | 2.286 | 0.528 | 0.467 | 0.118 | 0.151 |
| Opposite Grudger | 2.073 | 0.94 | 0.885 | 0.169 | 0.133 |
| Prober | 2.252 | 0.283 | 0.419 | 0.059 | -0.013 |
| Prober 2 | 2.151 | 0.589 | 0.626 | 0.128 | 0.163 |
| Prober 3 | 2.323 | 0.213 | 0.371 | 0.043 | -0.044 |
| Punisher | 2.208 | 0.313 | 0.135 | 0.073 | 0.027 |
| Random Hunter | 2.169 | 0.86 | 0.802 | 0.155 | 0.122 |
| Random: 0.5 | 2.139 | 0.5 | 0.53 | 0.09 | -0.0 |
| Retaliate (0.05) | 2.298 | 0.164 | 0.124 | 0.036 | -0.076 |
| Retaliate (0.08) | 2.27 | 0.195 | 0.124 | 0.044 | -0.056 |
| Retaliate (0.1) | 2.254 | 0.214 | 0.122 | 0.049 | -0.043 |
| Risky QLearner | 2.047 | 0.876 | 0.773 | 0.16 | 0.132 |
| Shubik | 2.307 | 0.328 | 0.252 | 0.074 | 0.032 |
| Sneaky Tit For Tat | 2.095 | 0.719 | 0.686 | 0.139 | 0.14 |
| Soft Go By Majority | 2.302 | 0.721 | 0.761 | 0.148 | 0.193 |
| Soft Go By Majority:10 | 2.281 | 0.668 | 0.749 | 0.143 | 0.199 |
| Soft Go By Majority:20 | 2.29 | 0.676 | 0.764 | 0.144 | 0.2 |
| Soft Go By Majority:40 | 2.308 | 0.696 | 0.767 | 0.146 | 0.2 |
| Soft Go By Majority:5 | 2.256 | 0.606 | 0.657 | 0.133 | 0.183 |
| Soft Joss: 0.9 | 2.24 | 0.608 | 0.621 | 0.129 | 0.153 |
| Stochastic WSLS | 2.16 | 0.524 | 0.545 | 0.094 | 0.012 |
| Suspicious Tit For Tat | 2.135 | 0.495 | 0.502 | 0.11 | 0.113 |
| Tester | 2.147 | 0.496 | 0.514 | 0.112 | 0.13 |
| Tit For 2 Tats | 2.196 | 0.696 | 0.893 | 0.147 | 0.196 |
| Tit For Tat | 2.237 | 0.545 | 0.483 | 0.12 | 0.142 |
| Tricky Cooperator | 2.067 | 0.816 | 0.713 | 0.136 | 0.038 |
| Tricky Defector | 2.299 | 0.359 | 0.454 | 0.048 | -0.11 |
| Tullock | 2.19 | 0.443 | 0.316 | 0.099 | 0.085 |
| Two Tits For Tat | 2.17 | 0.358 | 0.133 | 0.084 | 0.05 |
| Win-Stay Lose-Shift | 2.166 | 0.532 | 0.559 | 0.096 | 0.015 |
| ZD-Extort-2 | 2.174 | 0.364 | 0.348 | 0.078 | 0.016 |
| ZD-GTFT-2 | 2.22 | 0.651 | 0.654 | 0.134 | 0.152 |

## Final Remarks

Of the two EigenRanks, the EigenMoses value is (in my opinion) the more interesting, simply because it is easy to see how to maximize or minimize EigenJesus (always cooperate or always defect). EigenMoses is more discriminating and picks up on TFT-like behavior well, but always cooperating still ranks fairly high.

Of course, it is important to keep in mind that the rankings depend on the mix of strategies in the tournament. In the following tournament I have selected mainly players that either use particular cycles of C and D or avoid cycles completely. The strategy with the highest EigenJesus and EigenMoses rankings (and the lowest mean score!) is not Tit For Tat or any particularly cooperative strategy, but rather AntiCycler, which plays the sequence $C \, CD \, CCD \, CCCD \, CCCCD \ldots$ Rather than acting cooperatively, AntiCycler is merely trying to avoid the systematic exploitation that playing an easily detectable sequence would invite. Nevertheless, TFT fares well in EigenRank but loses out to Win-Stay-Lose-Shift and Fool Me Once for the top mean score; Fool Me Once, the top scorer, has the lowest EigenRanks of the group.

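| Player Name | Mean Score | Cooperation Rate | Good Partner Rating | EigenJesus | EigenMoses |
|---|---|---|---|---|---|
| Tit For Tat | 2.806 | 0.834 | 1.0 | 0.417 | 0.478 |
| Alternator | 2.601 | 0.5 | 0.286 | 0.248 | 0.0 |
| Cycler CCD | 2.328 | 0.67 | 0.571 | 0.332 | 0.224 |
| Cycler CCCD | 2.191 | 0.75 | 0.571 | 0.371 | 0.33 |
| Cycler CCCCCD | 2.051 | 0.835 | 0.857 | 0.413 | 0.442 |
| AntiCycler | 1.945 | 0.91 | 1.0 | 0.45 | 0.541 |
| Win-Stay Lose-Shift | 3.041 | 0.69 | 0.429 | 0.333 | 0.159 |
| Fool Me Once | 3.623 | 0.397 | 0.286 | 0.18 | -0.314 |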
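
One rough way to quantify how mean score and EigenMoses relate in this small tournament is a rank correlation. The sketch below copies the Mean Score and EigenMoses columns from the table above and computes Spearman's coefficient as the Pearson correlation of the ranks. The choice of Spearman's coefficient and the helper names are assumptions made for illustration and are not part of the original analysis.

```python
import numpy as np

# Values copied from the table above (same row order).
mean_score = np.array([2.806, 2.601, 2.328, 2.191, 2.051, 1.945, 3.041, 3.623])
eigenmoses = np.array([0.478, 0.0, 0.224, 0.33, 0.442, 0.541, 0.159, -0.314])


def ranks(values):
    """Rank from 0 (smallest) upward; assumes there are no ties."""
    return np.argsort(np.argsort(values))


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


rho = spearman(mean_score, eigenmoses)
print(f"Spearman correlation: {rho:.3f}")
```

For these eight players the coefficient is negative (about -0.71), consistent with the top mean scorers sitting near the bottom of the EigenMoses ranking. With so few players it should not be over-read.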