Introduction

Following the exploratory analysis in the last article, let's take what we learned and build predictive models for both tournament scores and wins. As it turns out, we can find two very good models using multiple linear regression. Let’s recap some of the insights of the exploratory analysis by looking at various correlations.

Data Analysis

Let’s load the data again.

library(ggplot2)
library(gridExtra)

# Load Data
table1 <- read.csv("table_1.csv")
table2 <- read.csv("table_2.csv")

Correlation Matrices

To analyze the data more closely, we start with the correlations between the variables, confirming the previous observations that:

  • Mean score and mean wins are anticorrelated overall
  • Tournament mean score correlates well with cooperation percentage
  • Mean tournament wins is correlated with initial defection percentage and mean score difference
correlations <- cor(table2[, c(3:63)])
correlations[["tournament_score_mean", "tournament_win_mean"]]
## [1] -0.558697
correlations[["tournament_score_mean", "C_prob"]]
## [1] 0.5005572
correlations[["tournament_win_mean", "C_prob"]]
## [1] -0.828259
correlations[["tournament_win_mean", "mean_score_diff"]]
## [1] 0.4704586

We can see how closely the cooperation rate in the initial rounds of play is related to mean score:

  • Rounds beyond the first two correlate well with mean score
  • Round 7 and mean score correlate well with round 1. This indicates that the initial play is crucial, as is consistency in play
  • The percentage of CC and CC-CC contexts achieved correlates well with tournament score mean
correlations[["round_1", "tournament_score_mean"]]
## [1] 0.4899457
correlations[["round_2", "tournament_score_mean"]]
## [1] 0.3426091
correlations[["round_3", "tournament_score_mean"]]
## [1] 0.4356728
correlations[["round_4", "tournament_score_mean"]]
## [1] 0.4652835
correlations[["round_1", "round_2"]]
## [1] 0.31986
correlations[["round_7", "round_1"]]
## [1] 0.6502112

We can also look at the various context incidences. As we noted last time, long runs of sustained cooperation was positively related with mean tournament score.

correlations[["CC_pct", "tournament_score_mean"]]
## [1] 0.7317138
correlations[["CCCC_pct", "tournament_win_mean"]]
## [1] -0.3768352
correlations[["CCCC_pct", "tournament_score_mean"]]
## [1] 0.5241627

For a few strategies, some of the memory-one conditional probabilities of cooperation in various contexts are non-existent since the context was never reached. Let’s drop those strategies and look at the remaining correlations.

correlations <- cor(na.omit(table2[, c(3:63)]))
correlations[["CC", "tournament_score_mean"]]
## [1] 0.4148202
correlations[["CCCC", "tournament_score_mean"]]
## [1] 0.4712643

Since each tournament was 200 rounds long, the infinite memory depth strategies were effectively depth 200. So let’s replace those values and run the correlations again.

table2$memory_depth[table2$memory_depth == "Inf"] <- 200
correlations <- cor(na.omit(table2[, c(3:63)]))
correlations[["memory_depth", "tournament_score_mean"]]
## [1] 0.4514673
correlations[["memory_depth", "tournament_win_mean"]]
## [1] -0.4862037

This reveals that memory_depth is correlated with mean tournament score and anti-correlated with mean tournament wins. Finally, let’s consider the larger table of all parwise interactions (1000 matches between each pair of players).

table1$memory_depth[table1$memory_depth == "Inf"] <- 200
correlations <- cor(na.omit(table1[, c(4:58)]))
correlations[["mean_score", "mean_score_diff"]]
## [1] 0.8345122

In this case the mean score and the mean score difference are well correlated (0.83). The first ten rounds of play are all correlated (\(\approx 0.6\)) with the first round except for rounds three (0.1) and four (0.37) – it appears that many strategies attempt to vary their play after the first two rounds.

Predictions and Regressions

Following the exploratory analysis above, let’s try some regressions to identify crucial properties for tournament performance. Other than the non-linear relationship between the cooperation probability and score noted earlier, there were a number of potentially exploitable correlations that suggest that linear models should perform well.

Based on the exploratory analysis and my intutions about and experience with the prisoner’s dilemma, it was easy to find several effective models. Going forward I have removed the two outlier strategies that use knowledge of the length of the tournament (BackStabber and DoubleCrosser).

Tournament Wins

For tournament wins, a linear model works quite well with just a few variables: the overall cooperation proportion, the mean round in which the first defection occurs, and the proportions of all rounds by the player that had a DD context.

table2mod <- table2[-c(70, 84),] # toss the two outliers
fit <- lm(tournament_win_mean ~ C_prob + mean_first_defection + DD_pct, data=table2mod)
summary(fit)$r.squared
## [1] 0.8562064

This yields and \(R^2\) fit of 0.86, not bad at all for such a simple model. A model using just the four memory one probabilities also fares well with \(R^2=0.83\), but we lose two strategies (ALLC and ALLD).

fit <- lm(tournament_win_mean ~ CC + CD + DC + DD + C_prob, data=table2mod)
summary(fit)$r.squared
## [1] 0.8306312

Combining the models yields a slight improvement in the fit:

fit <- lm(tournament_win_mean ~ C_prob + mean_first_defection + DD_pct+
    CC + CD + DC + DD, data=table2mod)
summary(fit)$r.squared
## [1] 0.8925036

Adding in some of the two-previous-rounds contexts gets us up to \(R^2 = 0.92\).

fit <- lm(tournament_win_mean ~ C_prob + mean_first_defection +
    CC + CD + DD +
    CCCC_pct + CCCD_pct + CDCD_pct + CDDC_pct + 
    DCDC_pct + DDCC_pct + DDDD_pct, data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_win_mean ~ C_prob + mean_first_defection + 
##     CC + CD + DD + CCCC_pct + CCCD_pct + CDCD_pct + CDDC_pct + 
##     DCDC_pct + DDCC_pct + DDDD_pct, data = table2mod)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.974  -6.771   0.537   8.093  24.906 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           168.63421    9.92160  16.997  < 2e-16 ***
## C_prob               -215.85492   10.74025 -20.098  < 2e-16 ***
## mean_first_defection   -0.16455    0.02432  -6.765 1.91e-09 ***
## CC                     13.83723    9.95777   1.390 0.168461    
## CD                     19.43727    5.35752   3.628 0.000498 ***
## DD                      8.39618    5.82207   1.442 0.153122    
## CCCC_pct               30.28033    9.42341   3.213 0.001884 ** 
## CCCD_pct              -57.10242   21.01276  -2.718 0.008041 ** 
## CDCD_pct               67.31025   15.04708   4.473 2.49e-05 ***
## CDDC_pct               12.61102    5.48426   2.299 0.024052 *  
## DCDC_pct              -20.59202    6.91479  -2.978 0.003827 ** 
## DDCC_pct               15.02131    7.71741   1.946 0.055072 .  
## DDDD_pct               15.11183    6.86146   2.202 0.030478 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.72 on 81 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.9194, Adjusted R-squared:  0.9074 
## F-statistic: 76.98 on 12 and 81 DF,  p-value: < 2.2e-16

Tournament Score

Let’s try to build a model for the tournament score, which is a more complex beast. The memory-one probabilities yield a model with \(R^2=0.73\).

fit <- lm(tournament_score_mean ~ CC + CD + DC + DD + C_prob, data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_score_mean ~ CC + CD + DC + DD + C_prob, 
##     data = table2mod)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.45939 -0.08370 -0.00756  0.06907  0.55311 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.22156    0.08506  26.117  < 2e-16 ***
## CC          -0.30759    0.09664  -3.183  0.00202 ** 
## CD          -0.37352    0.04833  -7.729 1.65e-11 ***
## DC          -0.34799    0.04217  -8.252 1.42e-12 ***
## DD          -0.29155    0.05266  -5.536 3.15e-07 ***
## C_prob       1.32964    0.10363  12.831  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1488 on 88 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.7282, Adjusted R-squared:  0.7128 
## F-statistic: 47.16 on 5 and 88 DF,  p-value: < 2.2e-16

Since we know that mean tournament wins and scores are anticorrelated, let’s also try the model that worked well for tournament wins, without the two-round context (a slight improvement):

fit <- lm(tournament_score_mean ~ C_prob + mean_first_defection + DD_pct+
    CC + CD + DC + DD, data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_score_mean ~ C_prob + mean_first_defection + 
##     DD_pct + CC + CD + DC + DD, data = table2mod)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41825 -0.07444 -0.01031  0.05858  0.57964 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.091e+00  1.563e-01  13.376  < 2e-16 ***
## C_prob                1.405e+00  1.298e-01  10.821  < 2e-16 ***
## mean_first_defection  4.966e-05  2.724e-04   0.182  0.85576    
## DD_pct                1.591e-01  1.615e-01   0.985  0.32735    
## CC                   -3.333e-01  1.006e-01  -3.312  0.00135 ** 
## CD                   -3.460e-01  5.694e-02  -6.077 3.25e-08 ***
## DC                   -3.495e-01  4.376e-02  -7.986 5.64e-12 ***
## DD                   -2.412e-01  7.405e-02  -3.257  0.00161 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1497 on 86 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.7314, Adjusted R-squared:  0.7095 
## F-statistic: 33.45 on 7 and 86 DF,  p-value: < 2.2e-16

We can plot the predicted mean score and mean wins with this model and the analogous model for tournament wins. This produces a reasonable approximation to the original data. Perhaps more rounds of play or more contexts will yield even better models for the score as in the case of tournament wins.

fit <- lm(tournament_score_mean ~ C_prob + mean_first_defection +
    CC + CD + DC + DD + round_1, data=table2mod)
pr1 <- predict(fit, table2)
fit2 <- lm(tournament_win_mean ~ C_prob + mean_first_defection +
    CC + CD + DC + DD + round_1, data=table2mod)
pr2 <- predict(fit2, table2)

df <- data.frame(cbind(pr1, pr2))

plot1 <- ggplot(df, aes(pr1, pr2)) +
    geom_point(size=4, color="Blue") + ggtitle("Predicted Scores and Wins") +
    labs(y="Mean Wins", x="Mean Scores") +
    xlim(1.8, 3.3) + ylim(-5, 170)

plot2 <- ggplot(table2mod, aes(tournament_score_mean, tournament_win_mean)) +
    geom_point(size=4, color="Blue") +
    ggtitle("Actual Scores and Wins") +
    labs(y="Mean Wins", x="Mean Scores") + 
    xlim(1.8, 3.3) + ylim(-5, 170)

grid.arrange(plot1, plot2, ncol=2)

We’re not quite capturing all the features of the actual data, so let’s look for a better model. Here’s one using the cooperation probabilities of the first 10 rounds:

fit <- lm(tournament_score_mean ~ round_1 + round_2 + round_3 + round_4 + round_5 + round_6 +
              round_7 + round_8 + round_9 + round_10, data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_score_mean ~ round_1 + round_2 + round_3 + 
##     round_4 + round_5 + round_6 + round_7 + round_8 + round_9 + 
##     round_10, data = table2mod)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48127 -0.13627  0.05509  0.14625  0.51329 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.94022    0.08736  22.210  < 2e-16 ***
## round_1      0.25497    0.07746   3.292  0.00145 ** 
## round_2     -0.02090    0.13405  -0.156  0.87646    
## round_3      0.21514    0.20763   1.036  0.30307    
## round_4      0.02644    0.28532   0.093  0.92639    
## round_5      0.53554    0.29948   1.788  0.07731 .  
## round_6      0.44315    0.16480   2.689  0.00863 ** 
## round_7     -0.72110    0.44492  -1.621  0.10877    
## round_8      0.52065    0.35056   1.485  0.14120    
## round_9     -0.05389    0.23305  -0.231  0.81769    
## round_10    -0.50576    0.32966  -1.534  0.12869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2161 on 85 degrees of freedom
## Multiple R-squared:  0.463,  Adjusted R-squared:  0.3998 
## F-statistic: 7.328 on 10 and 85 DF,  p-value: 2.929e-08

Maybe adding round_6 into the previous model will improve the fit. Perhaps by the sixth round many strategies have decided not to put up with the shenanigans of certain opponents.

fit <- lm(tournament_score_mean ~ C_prob + mean_first_defection + CC + CD + DC + DD +
              round_1 + round_6, data=table2mod)
summary(fit)$r.squared
## [1] 0.7724741

\(R^2 = 0.77\), not bad, but not much of an improvement. If we’re willing to include more variables, it is possible to obtain \(R^2 > 0.96\). High scores require sustained cooperation, so the CC context should be important, as is escaping mutual defection. It’s better to use the context count variables rather than the context cooperation percentages (otherwise we lose 30% of the data).

fit <- lm(tournament_score_mean ~ C_prob + memory_depth + 
    CC + CD + DC + DD +
    CC_pct + CD_pct + DC_pct + DD_pct +
    CCCC_pct + CCCD_pct + CCDC_pct + CCDD_pct +
    CDCC_pct + CDCD_pct + CDDC_pct + CDDD_pct +
    DCCC_pct + DCCD_pct + DCDC_pct + DCDD_pct +
    DDCC_pct + DDCD_pct + DDDC_pct + DDDD_pct,
    data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_score_mean ~ C_prob + memory_depth + 
##     CC + CD + DC + DD + CC_pct + CD_pct + DC_pct + DD_pct + CCCC_pct + 
##     CCCD_pct + CCDC_pct + CCDD_pct + CDCC_pct + CDCD_pct + CDDC_pct + 
##     CDDD_pct + DCCC_pct + DCCD_pct + DCDC_pct + DCDD_pct + DDCC_pct + 
##     DDCD_pct + DDDC_pct + DDDD_pct, data = table2mod)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.155896 -0.031062  0.009124  0.034022  0.112492 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.478e+00  1.462e-01  16.943  < 2e-16 ***
## C_prob       -7.796e-01  3.234e-01  -2.411 0.018679 *  
## memory_depth  3.368e-04  9.296e-05   3.623 0.000562 ***
## CC           -4.294e-01  6.939e-02  -6.189 4.16e-08 ***
## CD           -5.980e-02  3.162e-02  -1.892 0.062871 .  
## DC           -2.126e-01  4.931e-02  -4.312 5.43e-05 ***
## DD           -1.697e-01  6.839e-02  -2.481 0.015619 *  
## CC_pct        1.870e+00  3.196e-01   5.851 1.61e-07 ***
## CD_pct        1.061e-01  3.616e-02   2.934 0.004575 ** 
## DC_pct        1.593e-01  4.775e-02   3.336 0.001390 ** 
## DD_pct       -5.576e-01  1.221e-01  -4.566 2.19e-05 ***
## CCCC_pct      1.890e-02  7.181e-02   0.263 0.793196    
## CCCD_pct      1.715e-01  1.279e-01   1.341 0.184545    
## CCDC_pct      6.542e-02  7.743e-02   0.845 0.401237    
## CCDD_pct     -1.375e-01  7.604e-02  -1.808 0.075122 .  
## CDCC_pct     -8.285e-02  5.193e-02  -1.595 0.115317    
## CDCD_pct     -6.592e-01  1.232e-01  -5.351 1.14e-06 ***
## CDDC_pct     -4.944e-02  6.234e-02  -0.793 0.430540    
## CDDD_pct     -3.404e-01  9.829e-02  -3.463 0.000935 ***
## DCCC_pct      4.349e-01  1.766e-01   2.463 0.016336 *  
## DCCD_pct      2.697e-01  1.610e-01   1.675 0.098558 .  
## DCDC_pct      3.647e-01  7.420e-02   4.916 6.01e-06 ***
## DCDD_pct     -1.583e-03  5.066e-02  -0.031 0.975162    
## DDCC_pct     -6.444e-02  8.940e-02  -0.721 0.473587    
## DDCD_pct      8.134e-01  2.961e-01   2.747 0.007713 ** 
## DDDC_pct     -1.538e-01  7.063e-02  -2.178 0.032963 *  
## DDDD_pct     -3.619e-02  5.503e-02  -0.658 0.513016    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06468 on 67 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.9609, Adjusted R-squared:  0.9458 
## F-statistic: 63.36 on 26 and 67 DF,  p-value: < 2.2e-16

We can see the two most important contexts are CDCD and DCDC. This is probably because we want to reduce this pattern of mutual deadlocking (like Omega TFT). Pairing down the contexts simplifies the model and still yields \(R^2 > 0.93\).

fit <- lm(tournament_score_mean ~ C_prob + memory_depth + 
    CC + CD + DD + CC_pct + CD_pct + DC_pct + DD_pct +
    CDCD_pct + CDDD_pct + DCDC_pct + DDCD_pct + DDDC_pct,
    data=table2mod)
summary(fit)
## 
## Call:
## lm(formula = tournament_score_mean ~ C_prob + memory_depth + 
##     CC + CD + DD + CC_pct + CD_pct + DC_pct + DD_pct + CDCD_pct + 
##     CDDD_pct + DCDC_pct + DDCD_pct + DDDC_pct, data = table2mod)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.206773 -0.056817  0.009617  0.043678  0.144250 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.498e+00  1.584e-01  15.769  < 2e-16 ***
## C_prob       -1.219e+00  3.039e-01  -4.011 0.000136 ***
## memory_depth  4.440e-04  9.681e-05   4.586 1.67e-05 ***
## CC           -4.367e-01  6.764e-02  -6.456 8.04e-09 ***
## CD           -5.978e-02  3.492e-02  -1.712 0.090814 .  
## DD           -4.956e-02  6.253e-02  -0.793 0.430370    
## CC_pct        2.272e+00  2.721e-01   8.349 1.82e-12 ***
## CD_pct        1.338e-01  3.917e-02   3.416 0.001005 ** 
## DC_pct        9.211e-02  4.910e-02   1.876 0.064360 .  
## DD_pct       -5.664e-01  1.145e-01  -4.949 4.14e-06 ***
## CDCD_pct     -6.568e-01  8.880e-02  -7.397 1.29e-10 ***
## CDDD_pct     -2.752e-01  9.759e-02  -2.820 0.006070 ** 
## DCDC_pct      3.654e-01  4.871e-02   7.501 8.10e-11 ***
## DDCD_pct      3.750e-01  2.216e-01   1.692 0.094496 .  
## DDDC_pct     -5.707e-02  5.445e-02  -1.048 0.297748    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07507 on 79 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.9379, Adjusted R-squared:  0.9269 
## F-statistic: 85.26 on 14 and 79 DF,  p-value: < 2.2e-16

This is about the best we can do with a linear model that does not have a ton of variables. One of the larger residuals is AntiCycler, an unusual strategy that does not actively seek to win or score highly. The other is EvolvedLookerUp, the current best strategy in the library (the rightmost dot in the following plots).

The best two models

Let’s take a look at the predictions of our best two models for scores and wins versus the original data:

Looks pretty good, especially for a pair of simple linear models!

Summary

Insights for new strategies

Much of the results would not be surprising to a researcher versed in game theory, however there are a few novel insights:

  • Strategies that use a lot of rounds of history attain higher mean scores than more simple strategies. Although multiple recent researchers have claimed that strategies need not consider more than one round of history, that is clearly not enough. We gain substantial modelling improvements by considering the two round contexts.
  • EvolvedLookerUp uses two rounds of history (and the initial two rounds). It may be valuable to use more initial rounds and include a third round of history, since these elements correlate with score. However it may be that little is gained since there are substantial cross-correlations, and with two rounds mutual cooperation and cyclical defection can already be detected. In other words, longer memories give better results in general but the returns appear to be diminishing, and there is not much to be gained in terms of modelling at the aggregate level (heads up play may leave more room for optimization).
  • In general deterministic strategies perform better. Even zero-determinant strategies are not very effective performers in terms of score, and while they generate a lot of wins, they still are not the best performers.
  • The results do suggest, however, that a two-history version of a zero-determinant strategy may be useful, since the two-round contexts do yield a substantial amount of information beyond the one round contexts.
  • Deadlock breaking of CD and DC context alternation is important, and likely explains Omega TFT’s relative performance in the axelrod library tournaments. Could other strategies benefit from similar behavior?
  • EvolvedLookerUp is highest scoring strategy overall in the library. However it fits in the models fairly well (the rightmost point in the previous plot). Can its successful traits be abstracted and applied to other strategies?
  • Although tournament wins and tournament scores are negatively correlated, some strategies (such as EvolvedLookerUp), manage to score highly and also to net a good deal of wins. Perhaps there is more structure to the relationship between the two beyond a linear tradeoff.

Digging Deeper

We have not really exploited the larger table (Table 1) of pairwise interactions, and there are potentially more interesting relationships to be found. For example, we may be able to model outcomes based on the memory_depths and stochasticity of both players.

Reproducing These Calculations

The python script to generate these data sets is in the AxelrodExamples repository as well as the two tables of data (full matchup data available on request or can be regenerated with the script). I used version v0.0.18 of the axelrod library and rstudio for all the computations.