What's wrong with social simulations?

Eckhart Arnold

1 Introduction
2 Simulation without validation in agent-based models
3 How a model works that works: Schelling’s neighborhood segregation model
4 How models fail: The Reiterated Prisoner’s Dilemma model
5 An ideology of modeling
6 Conclusions

2 Simulation without validation in agent-based models

In this section I give my interpretation of a survey by Heath et al. (2009) on agent-based simulations. I do so with the intention of substantiating my claim that many social simulations are indeed useless. This is neither the aim nor the precise conclusion of Heath et al. (2009), but their study does reveal that two thirds of the surveyed simulation studies are not completely validated, and the authors of the study consider this state of affairs “not acceptable” (Heath et al. 2009, 4.11). Thus my reading does not run counter to the results of the survey. It follows as a natural conclusion if one accepts that a) an unvalidated simulation is, in most cases, a useless one and b) agent-based simulations make up a substantial part of social simulations.

The survey by Heath et al. (2009) examines agent-based modeling practices between 1998 and 2008. It encompasses “279 articles from 92 unique publication outlets in which the authors had constructed and analyzed an agent-based model” (Heath, Hill and Ciarallo, 2009, abstract). The articles stem from different fields of the social sciences, including business, economics, public policy, social science, traffic, military and also biology. The authors are not only interested in verification and validation practices, but the results concerning these are the ones that I am interested in here. Verification and validation concern two separate aspects of securing the correctness of a simulation model. Verification, as the term is used in the social simulations community, roughly concerns the question whether the simulation software is bug-free and correctly implements the intended simulation model. Validation concerns the question whether the simulation model represents the simulated empirical target system adequately (for the intended purpose).

Regarding verification, Heath, Hill and Ciarallo notice that “Only 44 (15.8%) of the articles surveyed gave a reference for the reader to access or replicate the model. This indicates that the majority of the authors, publication outlets and reviewers did not deem it necessary to allow independent access to the models. This trend appears consistently over the last 10 years” (Heath et al. 2009, 3.6). This astonishingly low figure can in part be explained by the fact that, as long as the model is described in sufficient detail in the paper, it can also be replicated by re-programming it from the model description. It must not be forgotten that the replication of computer simulation results does not have the same epistemological importance as the replication of experimental results. While the replication of experiments adds additional inductive support to the experimental results, the replication of simulation results is merely a means for checking the simulation software for programming errors (“bugs”). Hence the possibility of precise replication is not an advantage that simulations enjoy over material experiments, as for example Reiss (2011, 248) argues. Obviously, if the same simulation software is run in the same system environment, the same results will be produced, no matter whether this is done by a different team of researchers at a different time and place with different computers. Even if the model is re-implemented, the results must necessarily be the same, provided that both the model and the system environment are fully specified and no programming errors have been made in the original implementation or the re-implementation.[1] Replication or re-implementation can, however, help to reveal such errors.[2] It can therefore be considered one of several possible means for the verification (but not validation) of a computer simulation. Error detection becomes much more laborious if no reference to the source code is provided. And it does happen that simulation models are not specified in sufficient detail to replicate them (Will/Hegselmann 2008). Therefore, the rather low proportion of articles that provide a reference to access or replicate the simulation is worrisome.
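The point about deterministic replication can be illustrated with a minimal Python sketch. The toy voter model below, its function names and its parameters are assumptions introduced purely for illustration; they are not taken from Heath et al. (2009) or from any of the surveyed studies. Two independently written implementations of the same fully specified model consume the same stream of pseudo-random numbers and therefore yield identical results, so their agreement checks the code for bugs but adds no empirical support.

```python
import random


def run_voter_model(seed, n_agents=20, steps=200):
    """Hypothetical toy model: at each step one randomly chosen agent
    copies the opinion of a randomly chosen agent (possibly itself)."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    opinions = [rng.randint(0, 1) for _ in range(n_agents)]
    for _ in range(steps):
        imitator = rng.randrange(n_agents)
        source = rng.randrange(n_agents)
        opinions[imitator] = opinions[source]
    return opinions


def rerun_voter_model(seed, n_agents=20, steps=200):
    """Independent re-implementation of the same specification.
    The code is structured differently, but the random draws are
    requested in the same order, as the specification demands."""
    rng = random.Random(seed)
    state = {}
    for agent in range(n_agents):
        state[agent] = rng.randint(0, 1)
    step = 0
    while step < steps:
        imitator = rng.randrange(n_agents)
        source = rng.randrange(n_agents)
        state[imitator] = state[source]
        step += 1
    return [state[agent] for agent in range(n_agents)]


if __name__ == "__main__":
    original = run_voter_model(seed=42)
    replica = rerun_voter_model(seed=42)
    # Agreement shows only that neither implementation deviates from the
    # specification; it says nothing about the empirical adequacy of the model.
    print("replication matches:", original == replica)
```

If the two functions disagreed for the same seed, at least one of them would contain a programming error or the specification would be ambiguous. This is all that such a replication can reveal, which is why it counts as verification rather than validation.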

More important than the results concerning verification is what Heath, Hill and Ciarallo find out about validation or, rather, the lack of validation:

Without validation a model cannot be said to be representative of anything real. However, 65% of the surveyed articles were not completely validated. This is a practice that is not acceptable in other sciences and should no longer be acceptable in ABM practice and in publications associated with ABM. (Heath et al. 2009, 4.11)

This conclusion needs a little further commentary. The figure of 65% of not completely validated simulations is an average value over the whole period of study. In the earlier years covered by the survey hardly any simulation was completely validated. Later the share of not completely validated simulations decreases, but the ratio of completely validated simulation studies remains constant at less than 45% during the last 4 years of the period covered (Heath et al. 2009, 3.10).

Furthermore it needs to be qualified what Heath, Hill and Ciarallo mean when they speak of complete validation. The authors make a distinction between conceptual validation and operational validation. Conceptual validation concerns the question whether the mechanisms built into the model represent the mechanisms that drive the modeled real system. An “invalid conceptual model indicates the model may not be an appropriate representation of reality.” Operational validation then “validates results of the simulation against results from the real system.” (Heath et al. 2009, 2.13). The demand for complete validation is well motivated: “If a model is only conceptually validated, then it [is] unknown if that model will produce correct output results.” (Heath et al. 2009, 4.12). For even if the driving mechanisms of the real system are represented in the model, it remains – without operational validation – unclear whether the representation is good enough to produce correct output results. On the other hand, a model that has been operationally validated only, may be based on a false or unrealistic mechanism and thus fail to explain the simulated phenomenon, even if the data matches. Heath, Hill and Ciarallo do not go into much detail concerning how exactly conceptual and operational validation are done in practice and under what conditions a validation attempt is to be considered as successful or as a failure.

But do all simulations really need to be validated both conceptually and operationally, as Heath, Hill and Ciarallo demand? After all, some simulations may, just like thought experiments, have been intended merely to prove conceptual possibilities. One would usually not demand an empirical (i.e. operational) validation from a thought experiment. Heath, Hill and Ciarallo themselves make a distinction between the generator, mediator and predictor role of a simulation (Heath et al. 2009, 2.16). In the generator role simulations are merely meant to generate hypotheses. Simulations in the mediator role “capture certain behaviors of the system and [..] characterize how the system may behave under certain scenarios” (Heath et al. 2009, 3.4), and only simulations in the predictor role are actually calculating a real system. All of the surveyed studies fall into the first two categories. Evidently, the authors require complete validation even from these types of simulations.

This can be disputed. As stated in the introduction, in order to be useful, a simulation study should make a contribution to answering some relevant question of empirical science. This contribution can be direct or indirect. The contribution is direct if the model can be applied to some empirical process and if it can be tested empirically whether the model is correct. The model’s contribution is indirect, if the model cannot be applied empirically, but if we can learn something from the model which helps us to answer an empirical question, the answer to which we would not have known otherwise. The latter kind of simulations can be said to function as thought experiments. It would be asking too much to demand complete empirical validation from a thought experiment.

But does this mean that the figures from Heath, Hill and Ciarallo concerning the validation of simulations need to be interpreted differently, taking into account that some simulations may not require complete validation in the first place? This objection would miss the point, because the scenario just discussed is the exception rather than the rule. Classical thought experiments like Schrödinger’s cat usually touch upon important theoretical disputes. However, as will become apparent from the discussion of simulations of the evolution of cooperation below, computer simulation studies all too easily lose contact with relevant scientific questions. We just do not need all those digital thought experiments on conceivable variants of one and the same game theoretical model of cooperation. And the same surely applies to many other traditions of social modeling as well. But if this is true, then the figure of 65% of not completely validated simulation studies in the field of agent-based simulations is alarming indeed.[3]

Given how important empirical validation is, “because it is the only means that provides some evidence that a model can be used for a particular purpose” (Heath et al. 2009, 4.11), it is surprising how little discussion this topic finds in the textbook literature on social simulations. Gilbert/Troitzsch (2005) mention validation as an important part of the activity of conducting computer simulations in the social sciences, but then they dedicate only a few pages to it (22-25). Salamon (2011, 98) also mentions it as an important question, but gives no satisfactory answer and does not provide readers with so much as a hint concerning how simulations must be constructed so that their validity can be empirically tested. Railsback/Grimm (2011) dedicate many pages to describing the ODD protocol, a protocol that is meant to standardize agent-based simulations and thus to facilitate their construction, comparison and evaluation. Arguably the most important topic, the empirical validation of agent-based simulations, is not an explicit part of this protocol. One could argue that this is simply a different matter, but then, given the importance of this topic, it is slightly disappointing that Railsback and Grimm do not treat it more explicitly in their book.

To sum up, the survey by Heath, Hill and Ciarallo shows that an increasingly important sub-discipline of social simulations, namely the field of agent-based simulations, faces the serious problem that a large part of its scientific literature consists of unvalidated and therefore most probably useless computer simulations. Moreover, considering the textbook literature on agent-based simulations, one can get the impression that the scientific community is not at all sufficiently aware of this problem.

[1] A possible exception concerns the frequent use of random numbers. As long as only pseudo-random numbers with the same random number generator and the same “seed” are used, the simulation is still completely deterministic. This is not to say that sticking to the same “seeds” is good practice other than for debugging.

[2] I am indebted to Paul Humphreys for pointing this out to me.

[3] For a detailed discussion of the cases in which even unvalidated simulations can be considered as useful, see Arnold (2013). There are such cases, but the conditions under which this is possible appear to be quite restrictive.
