Validation of Computer Simulations from a Kuhnian Perspective

by Eckhart Arnold

1 Introduction
2 Kuhn's philosophy of science
3 A revolution, but not a Kuhnian revolution: Computer simulations in science
4 Validation of Simulations from a Kuhnian perspective
    4.1 Do computer simulations require a new paradigm of validation?
    4.2 Validation of simulations and the Duhem-Quine-thesis
    4.3 Validation of social simulations
5 Summary and Conclusions

4.2 Validation of simulations and the Duhem-Quine-thesis

Another point frequently emphasized in the philosophy of simulation literature is that computer simulations can become highly complex. This is also one of the major differences between computer simulations and thought experiments, to which they are otherwise quite similar. At least in the natural sciences, computer simulations can often be based on comprehensive and well-tested theories, such as quantum mechanics, general relativity, Newton's theory of gravitation or - in engineering - the method of finite elements. But even in the natural sciences simulations cannot always be based on a single theory; they sometimes rely on different theories from different origins. Climate simulations are a well-known example of this. And even where simulations are based on a single theory, they usually also draw on various sorts of approximations, local models and computational techniques. None of these can be derived from theory, so they need independent credentials. This situation has been described in the philosophy of simulation literature as their being motley and partly autonomous (Winsberg 2003). This description echoes a recent trend in the philosophy of science which emphasizes the importance and relative independence of models from theory (Morgan/Morrison 1999, Cartwright 1983).

So, if simulations are knit together from many independent set pieces of theories, models, approximations, algorithmic optimizations etc., then the Duhem-Quine-thesis could point out a potential problem. A possible reading of the thesis assumes that if validation fails (for example, because an empirical prediction was made that turned out to be wrong), then one cannot know which part of the chain of theoretical reasoning leading to the empirical prediction has failed. In the case of computer simulations this means that one does not know whether the failure lies in the theory on which the simulation is based, in the simplifications that may have been made in the course of modeling or, finally, in the program code.

By the same token, if this reading of Duhem-Quine is accurate, simulation scientists would - for better or worse - enjoy a great freedom of choice concerning where to make adjustments if a simulation fails, i.e. if it leads to unexpected or obviously false results, or to no results at all. Some philosophers have even argued that scientists sometimes deliberately employ assumptions that are known to be false to make their simulations work. Among these are artificial viscosity (Winsberg 2015, sec. 8), or - another often cited example - “Arakawa's trick” (Lenhard 2007). Arakawa based a general circulation model of the world climate on physically false assumptions to make it work, which the scientific community accepted as a technical trick of the trade.

However, this reading of Duhem-Quine paints a somewhat unrealistic picture of scientific practice, because in case of failure there usually exist further contextual cues as to where the error causing the failure has most likely occurred. While in the abstract formal representation of theories that is sometimes used to explain Duhem-Quine the premises are represented as propositions with no further information, scientists usually have good reasons to consider the failure of some premises more likely than that of others. In science and engineering, the premises are usually ordered in a hierarchy that starts with the fundamental physical, chemical or biological theories, ranges over various steps of system description and approximation down to the computer algorithms and, ultimately, the program code. If a simulation fails, one would start to examine the premises in backward order. And this is only reasonable, because prima facie it is more likely that your own program code contains a bug than, say, that the theory of quantum mechanics is false or that some of the tried and tested approximation techniques are wrong. Though, of course, this is not completely out of the question either.[6] It should be understood that the credibility of the various premises occurring in this hierarchy does not follow their generality, but depends on their respective track record of successful applications in the past. It can safely be assumed that this situation is typical for normal science.[7]
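The backward examination of premises can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class `Premise`, the function `inspection_order`, the smoothing formula and all the counts are hypothetical and introduced here purely for illustration; they are not taken from any of the cited case studies. The sketch ranks premises by a crude estimate of their past failure rate and uses the position in the hierarchy only to break ties, which mirrors the point that credibility follows track record rather than generality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Premise:
    name: str
    level: int      # 0 = fundamental theory; larger = closer to the program code
    successes: int  # hypothetical number of past successful applications
    failures: int   # hypothetical number of past failures

    def failure_estimate(self) -> float:
        # Laplace-smoothed failure rate: a crude stand-in for "track record"
        return (self.failures + 1) / (self.successes + self.failures + 2)


def inspection_order(premises):
    # Most suspect premise first; ties are broken in backward order
    # of the hierarchy (program code before fundamental theory)
    return sorted(premises, key=lambda p: (-p.failure_estimate(), -p.level))


# Illustrative numbers only (assumed, not empirical data)
premises = [
    Premise("quantum mechanics", 0, 10000, 0),
    Premise("approximation techniques", 1, 500, 2),
    Premise("numerical algorithm", 2, 100, 3),
    Premise("own program code", 3, 5, 4),
]

order = [p.name for p in inspection_order(premises)]
```

With these assumed numbers the freshly written program code is inspected first and the fundamental theory last. A premise high up in the hierarchy with a poor track record would nevertheless move to the front of the list.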

It must be conceded, though, that during a scientific revolution or within cross-paradigm-discourse, there might be no hierarchy of premises to rely on, because some of the premises higher up in the hierarchy, like the fundamental theories, are not generally accepted any more. In this situation, there might, as Kuhn suggested, only be vague meta-principles left to rely on and we must face the possibility of not being able to resolve all conflicts of scientific opinion.

What about the conscious falsifications like artificial viscosity and “Arakawa's trick” that - according to some philosophers of science - are introduced by simulation scientists in order to make their simulations work? This reading has not gone unchallenged. It has been called into question whether the artificial viscosity that Winsberg mentions is more than just another harmless approximation (Peschard 2011b), and whether “Arakawa's trick” does not merely compensate for errors made elsewhere, which would make it an example of a simulation whose success is badly understood rather than one that is representative of simulation-based science (Beisbart 2011, 333f.). It seems that these examples, philosophically interesting as they certainly are, concern exceptions rather than the rule in scientific practice with simulations. For the time being, that is, because it is quite conceivable that such tricks will become more common in the future development of science.

To sum up, with respect to the Duhem-Quine-thesis there are neither additional challenges nor additional opportunities for the validation of simulations. Under the conditions of normal science the thesis does not play a role at all. Other than that, it merely reflects the greater methodological imponderabilities during a revolutionary phase or in an inter-paradigm context.

[6] See Arnold/Kaestner (2013, sec. 3.4) for a case-study containing a detailed description of this hierarchy of premises.

[7] But see Lenhard (2019), chapter 39 in this book, who paints a very different picture. I cannot resolve the differences here. In part they are due to Lenhard using examples where “due to interactivity, modularity does not break down a complex system into separately manageable pieces.” To me it seems that as far as software design goes, it is always possible - and in fact good practice - to design the system in such a way that each unit can be tested separately. As far as validation goes, I admit that this may not work as easily because of restrictions concerning the availability of empirical data.
