How Models Fail
A Critical Look at the History of Computer Simulations of the Evolution of Cooperation

Eckhart Arnold

1 Introduction
2 The empirical failure of simulations of the evolution of cooperation
3 Justificatory narratives
4 Bad excuses for bad methods and why they are wrong
5 History repeats itself: Comparison with similar criticisms of naturalistic or scientistic approaches

1 Introduction

Simulation models of the Reiterated Prisoner's Dilemma (in the following: RPD-models) have for 30 years been considered one of the standard tools for studying the evolution of cooperation (Rangoni 2013; Hoffmann 2000). Scientists have produced a considerable number of such simulation models. Unfortunately, though, none of these models has been empirically verified, and there exists no example of empirical research where any of the RPD-models has successfully been applied to a particular instance of cooperation. Surprisingly, this has not kept scientists from continuing to produce simulation models in the same tradition and from writing their own history as a history of success. In a recent simulation study, which does not make use of the RPD but otherwise follows the same pattern of research, Robert Axelrod's original role model for this kind of simulation study (Axelrod 1984) is praised as “an extremely effective means for investigating the evolution of cooperation” and described as “widely credited with invigorating that field” (Rendell et al. 2010a, 208-209).

According to a very widespread philosophy of science that is usually associated with the name of Karl Popper (1934), science is distinguished from non-science by its empirical testability, and right theories from wrong theories by their actual empirical success. Most scientists in the field of social simulations would probably even agree with this philosophy of science, at least in its general outlines.[1] However, RPD models of the evolution of cooperation have not been empirically successful. So why are they still considered valuable?

In this paper I examine the question of why the continuing lack of empirical success did not lead the scientists working with these simulation models to reconsider their approach. In the first part I explain what RPD-models of the evolution of cooperation are about. I show that these models failed to produce empirically applicable and tenable results. This will be done by referring to research reports and meta-studies, none of which comes up with an example of successful empirical application.

In the second part of the paper, I highlight a few example cases that show why these models fail. In this context I examine the framing narratives with which scientists justify their method. Such framing narratives form an integral part of any scientific enterprise. My point is not to criticize simulation scientists for employing narratives to justify their method. Rather, I believe that the typical framing narratives that RPD modelers in the tradition of Axelrod employ are poorly founded, and I show that in each case there are good arguments against accepting the narrative.

In the third part of this paper I take this analysis one step further by discussing typical arguments with which scientists justify the production and use of unvalidated “theoretical” simulations. Most of the arguments discussed here usually do not form the central topic of scientific papers. Rather, they appear in the less formal communication of scientists: in oral discussions, in small talk, occasionally in keynote addresses (Epstein 2008). One may object that if these arguments are never explicitly spelled out, they may not be worth discussing. After all, they have never been cast into their strongest imaginable form. Why discuss dinner table talk, anyway? But then, it is often in this kind of communication that the deeper convictions of a community are expressed. And it is by no means true that these convictions are without effect on the scientific judgments of the community members. On the contrary, general agreement with the underlying convictions is silently presupposed by one's scientific peers, and adherence to them is usually taken for granted by supervisors from their PhD students and often expected by referees from the authors of the papers they review. Therefore, the informal side-talk of science should by no means be exempt from rational criticism.

In the last part of the paper, I relate my criticism to similar discussions in a neighboring (if not overlapping) science, namely political science. There seem to be structural similarities in the way scientific schools or research traditions deal with failures of their paradigm. Rather than admitting such a fundamental failure (which, as it touches one's own scientific world view, is obviously much harder than admitting the failure of a particular research enterprise within a paradigm), they retreat by adjusting their goals. In the worst case they become so modest in their achievements (which they, by an equal adjustment of their self-perception, continue to celebrate as successes) that they reach the verge of irrelevance. Green and Shapiro (1994, 44f.) have described this process of clandestine retreat for the case of rational choice theory in political science.

[1] A referee pointed out to me that there is a tension in my paper between the reliance on Popperian falsificationism and the implicit use of Kuhn's paradigm concept. However, the two can be reconciled if the former is understood in a normative and the latter in a descriptive sense. Popper's falsificationism requires, though, that paradigms are not completely incommensurable. But then, there are many good reasons that speak against a strong reading of the incommensurability thesis anyway.

Similarly, the Duhem-Quine thesis does not pose a fatal problem for a Popperian epistemology if one admits that in most concrete contexts there exist further clues which allow one to decide which particular elements of a falsified set of propositions are more likely to be responsible than others. For example, if an experiment falsifies a well-established physical theory, then it is prima facie more likely that a loose wire was the cause than a failure of the theory. Only after this has been checked carefully would one assign different probabilities to the experimental setup or the theory being false, respectively. See the very enlightening remarks about Kuhn and Duhem-Quine in the case study by Zacharias (2013, 11ff., 305ff.).
