How Models Fail
A Critical Look at the History of Computer Simulations of the Evolution of Cooperation

Eckhart Arnold

1 Introduction
2 The empirical failure of simulations of the evolution of cooperation
    2.1 Axelrod's “The Evolution of Cooperation”
    2.2 The empirical failure of the RPD-model
3 Justificatory narratives
4 Bad excuses for bad methods and why they are wrong
5 History repeats itself: Comparison with similar criticisms of naturalistic or scientistic approaches

2 The empirical failure of simulations of the evolution of cooperation

2.1 Axelrod's “The Evolution of Cooperation”

One of the most important initiators of research on the RPD-model was Robert Axelrod. The publication of his book “The Evolution of Cooperation” popularized the simulation approach to studying the evolution of cooperation. At the core of Axelrod's simulation lies the two-person Prisoner's Dilemma game. The two-person Prisoner's Dilemma is a game in which two players are asked to contribute to the production of a public good. Each player can choose either to contribute, that is, to cooperate, or not to contribute, that is, to defect. If both players cooperate, they both receive a reasonably high payoff. If neither player cooperates, they both receive a low payoff. If one player tries to cooperate while the other player defects, the player who tried to cooperate receives a zero payoff, while the successful cheater receives the highest possible payoff in the game, which is higher than the cooperative payoff. No matter what the other player does, it is therefore always more advantageous for each individual player not to cooperate. Consequently, both players, if they are rational egoists, end up with the low non-cooperation payoff, at least as long as the game is not repeated. The reiterated Prisoner's Dilemma (RPD), in which the same players play through a sequence of Prisoner's Dilemmas, changes the situation, because defecting players can be punished with non-cooperation in the following rounds.
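The payoff structure described above can be made concrete with a minimal sketch. The numerical values (5, 3, 1, 0) and the names `PAYOFF` and `best_reply` are assumptions chosen for illustration; the text only requires that the cheater's payoff exceeds the cooperative payoff, which exceeds the non-cooperation payoff, which exceeds the zero payoff of the exploited cooperator.

```python
# Illustrative payoff values (an assumption, not taken from the text):
# T = payoff for defecting against a cooperator, R = mutual cooperation,
# P = mutual defection, S = payoff for cooperating against a defector.
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S  # the ordering that makes the game a Prisoner's Dilemma

# Maps (move of player 1, move of player 2) to (payoff 1, payoff 2).
# "C" stands for cooperate, "D" for defect.
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def best_reply(opponent_move):
    """Return the move that maximizes player 1's payoff against a fixed opponent move."""
    return max("CD", key=lambda move: PAYOFF[(move, opponent_move)][0])


if __name__ == "__main__":
    for opponent_move in "CD":
        print(f"Best reply to {opponent_move}: {best_reply(opponent_move)}")
```

Under these assumed values, defection is the best reply to either move, yet mutual defection leaves both players worse off than mutual cooperation. This is the dilemma of the one-shot game.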

It can be shown that in the reiterated Prisoner's Dilemma there is no single best strategy. In order to find out if there exist certain strategies that are by and large more successful than other strategies and whether there are certain characteristics that successful strategies share, Robert Axelrod (1984) conducted a computer tournament with different strategies. Axelrod also fed the results of the tournament simulation into a population dynamical simulation, where more successful strategies would gradually out-compete less successful strategies in a quasi-evolutionary race. Famously, TIT FOR TAT emerged as the winner in Axelrod's tournament.[2]
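The combination of a round-robin tournament with a population dynamical simulation can be sketched as follows. This is a simplified illustration under stated assumptions, not a reconstruction of Axelrod's actual setup: the four strategies, the payoff values, the match length of 200 rounds, the discrete replicator update, and all function names are chosen here for illustration only.

```python
# Illustrative payoff values (assumption): T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}


# A strategy maps (own history, opponent's history) to the next move.
def tit_for_tat(own, other):
    return "C" if not other else other[-1]  # start friendly, then mirror


def grim(own, other):
    return "D" if "D" in other else "C"  # never forgive a defection


def always_defect(own, other):
    return "D"


def always_cooperate(own, other):
    return "C"


def play_match(strat_a, strat_b, rounds=200):
    """Play a reiterated Prisoner's Dilemma and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def score_table(strategies, rounds=200):
    """Round-robin tournament: table[a][b] is the score of a when playing b."""
    return {a: {b: play_match(fa, fb, rounds)[0] for b, fb in strategies.items()}
            for a, fa in strategies.items()}


def evolve(table, generations=50):
    """Discrete replicator dynamics: shares grow in proportion to relative fitness."""
    names = list(table)
    shares = {name: 1.0 / len(names) for name in names}
    for _ in range(generations):
        fitness = {a: sum(shares[b] * table[a][b] for b in names) for a in names}
        mean = sum(shares[a] * fitness[a] for a in names)
        shares = {a: shares[a] * fitness[a] / mean for a in names}
    return shares


if __name__ == "__main__":
    strategies = {"TIT FOR TAT": tit_for_tat, "GRIM": grim,
                  "ALWAYS DEFECT": always_defect, "ALWAYS COOPERATE": always_cooperate}
    final_shares = evolve(score_table(strategies))
    for name, share in sorted(final_shares.items(), key=lambda item: -item[1]):
        print(f"{name:17s} {share:.3f}")
```

Changing the entries of `strategies`, the payoff values, or the match length changes the resulting population shares, so the output of such a sketch reflects the chosen setup rather than any general property of the strategies.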

Axelrod employed his model as a research tool by running simulations and then generalizing from the results he obtained. These generalizations included recommendations such as that TIT FOR TAT is usually a good choice of strategy, or that a strategy should not defect without provocation but should punish defections and should also be forgiving. As subsequent research revealed, however, almost none of these conclusions was in fact generalizable (see Arnold (2013b, 106ff., 126f.) with further references). For each of them there exist variations of the RPD-model where it does not hold and where following Axelrod's recommendations could be a bad mistake. The only exception is Axelrod's result about the collective stability of TIT FOR TAT, which he proved mathematically.

The central flaw of Axelrod's research design is that it relies strongly on impressionistic conclusions and inductive generalizations from what are in fact contingent simulation results. This deficiency of Axelrod's model has been convincingly criticized by Binmore (1998, 313ff.). To give just one example: In Axelrod's tournament TIT FOR TAT won in two successive rounds. Axelrod concluded that TIT FOR TAT is a good strategy and that it is advisable to be forgiving. However, if one chooses the set of all 2-state automata as the strategy set, which is a reasonable choice because it contains all strategies up to a certain complexity level, then the unforgiving strategy GRIM emerges as the winner (Binmore 1994, 295ff.).

Axelrod's followers would usually be much more cautious about drawing general conclusions from simulations, but they did not completely refrain from generalizing. In the ensuing research a historical pattern emerged in which researchers would pick up existing models, investigate variants of these models, and eventually demonstrate that the previous results could not be generalized (Schüßler's and Arnold's simulations, which are discussed below, are examples of this pattern). Thus, despite its great deficiencies, Axelrod's research design became a role model for simulation studies and has remained one to this day. As a justification for publishing yet another model, it would usually suffice to refer to the previous research. No reference to empirical research, or even to empirical applicability, would be considered necessary. For example, Rangoni (2013) introduces his study of a variant of the RPD-model by mentioning that “Axelrod’s work on the prisoner’s dilemma is one of the most discussed models of social cooperation” and declaring that “After more than thirty years from the publication of its early results, Axelrod’s prisoner’s dilemma tournament remains a cornerstone of evolutionary explanation of social cooperation”, although, as will be discussed in the following, this “cornerstone of evolutionary explanation” has not been confirmed empirically in a single instance.

[2] More detailed descriptions of the RPD-model and Axelrod's tournament can be found in Axelrod (1984), Binmore (1994, 1998) or Arnold (2008).
