An interesting conversation over coffee, just had. The context was how one might go about constructing models of geomorphological systems, and how the qualitative descriptions of geomorphologists might be tested and informed by the interactive development of quantitative expressions of those descriptions.
Grep tells me that I didn't use the phrase exploratory modelling in my PhD thesis, which I thought I had. Let's post it here.
Google tells me that it's not new. Which is great; people to talk to. Why didn't I run this search before? Zarine Kemp looks to be doing some interesting work, and has some publications I need to follow up.
Biodiversity research is predicated on the sharing of evolving knowledge and data, coupling of models across disciplinary boundaries and networking at the semantic as well as syntactic levels. Recent advances in the extent and sophistication of data capture methods have resulted in the creation of large data repositories. Likewise, the widespread use of the Internet has enabled relatively convenient connectivity to distributed data. Challenges associated with this research include the development of structures and methods to facilitate the semantic and geometric integration of multi-source, multi-scale data and the development of a software architecture to support /exploratory modelling/ processes. Marine biodiversity modelling of the Dover straits has been carried out under a European Union funded project on Biodiversity and Cartography of the Dover Straits. [Zarine Kemp, my emphasis]
The dominant definition of /modelling/ is tied up in the traditional reductionist approach to science. First "understand" the physics, then describe that physics mathematically, then create a numerical approximation to those mathematics, then implement that approximation in computer code, and finally run simulations of particular physical systems using that implementation.
Exploratory modelling sets out with a different world view. Exploratory modelling assumes from the outset a complex, open physical system. It acknowledges that our understanding of that system can only ever be partial, and that our models of that system must make approximations in both the detail to which subsystems are represented, and the completeness with which all the various interacting parts of the system can be taken into account. It views the properties of the system to be reproduced as emergent properties of lower level processes, but does not blindly pursue those finer processes, for that way lies a downward spiral into computational intractability, futility, and madness; instead it decides which level of emergence to treat as basic, and judges the quality of that representation on how well higher level emergent features are replicated. It accepts that there are significant aspects of system behaviour — depending on the purpose, possibly the most significant aspects — which are not quantifiable by some "objective" function; that the human eye (or rather brain) is an excellent tool for recognising patterns. It views the computer as a prosthesis for the mind, not a device which drives the level of discourse of the human "operator" relentlessly down to that of a machine.
After all, how much time can a human spend converting data between file formats before going postal?
This different world view is conditioned by a different purpose. The purpose of modelling is the development of understanding by the interactive refinement of a model. The model is the apparatus for hypothesis testing. Hypotheses start vague, and get more detailed as the process of generation and testing proceeds. Sub-hypotheses emerge, are rejected or tentatively accepted, added to or further subdivided. Of course this process is never finished, but at each stage some more understanding emerges about the behaviour of the system.
This idea — of model development as hypothesis generation and testing — is given lip service often enough, but its implications are rarely explored. Which is why the paragraph from Zarine Kemp caught my eye: explicit acknowledgement that software architecture is important. Well-designed software must enable without getting in the way. I've talked about this before.
How can one really explore and test competing hypotheses when the hypothesis being tested is totally obscured in code at the level of C or FORTRAN instructions? When two processes, considered to be independent but interacting, are mixed up in the code, all embedded within an outer loop over time? Great hopes have been placed on object orientation in this regard, but object orientation ultimately provides a new way of arranging instructions at the same level. Worse, it entrenches a focus on state, where in environmental modelling process is king.
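To make the problem concrete, here is a caricature of that tangled style: a minimal Python sketch in which two conceptually independent processes (say water flow and sediment transport) are interleaved line by line inside the time loop. All the variables and coefficients are invented for illustration; the point is only that neither hypothesis can be read, swapped out, or tested on its own.

```python
import numpy as np

# Invented, purely illustrative setup: 100 cells, 1000 time steps.
n_cells, n_steps, dt = 100, 1000, 0.1
depth = np.ones(n_cells)      # water depth per cell
sediment = np.zeros(n_cells)  # sediment load per cell

for step in range(n_steps):            # the outer loop (time)
    for i in range(1, n_cells - 1):
        # The flow "hypothesis" and the transport "hypothesis",
        # interleaved statement by statement:
        flux = 0.5 * (depth[i - 1] - depth[i + 1])           # flow
        depth[i] += dt * flux                                # flow
        pickup = 0.01 * max(flux, 0.0)                       # transport
        sediment[i] += dt * (pickup - 0.005 * sediment[i])   # transport
    # ...boundary conditions, output and diagnostics typically mixed in here too
```

Testing a different transport hypothesis here means editing lines scattered through the loop body, with every change risking the flow code sitting beside it.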
[Aside: I started going on about this issue of the sub-optimality of representing processes in an object oriented language here, but it's a distraction. I've trimmed it and stored it as a draft to be explored later.]
A knowledge representation is a set of ontological commitments. [Randall Davis, Howard Shrobe, Peter Szolovits, What is a Knowledge Representation?]
This is becoming something of a mantra for me. A modelling tool (a tool for constructing models) is a knowledge representation tool. Every modelling tool embeds a meta-model, a model of models, a view of what constitutes a model. This view may be implicit, it may be ill-defined, it may be internally inconsistent, or all of the above, but it is there.
The commitments are in effect a strong pair of glasses that determine what we can see, bringing some part of the world into sharp focus, at the expense of blurring other parts. [Randall Davis, Howard Shrobe, Peter Szolovits, What is a Knowledge Representation?]
That meta-model, that model of models, can hinder the modeller in their exploration of the model space (or, in Keith Beven's language, the landscape to model space mapping), or it can help. It can help in two distinct ways; it can match the modeller's existing view of models, or it can influence the modeller's view of models in some way.
Again exercising restraint, and avoiding the temptation to pontificate on the specifics of possible appropriate meta-models, let's consider some important features of such a meta-model which might support the sort of exploratory modelling, the ongoing parallel refinement of hypothesis and implementation-as-apparatus, which is desirable. This list is in approximate order of implication, and is not intended to be complete (suggestions for additions greatly welcome); a rough sketch of how these features might fit together in code follows the list.
Transparency. It must be possible for the modeller to understand the model by exploring it. It should be (as near to) immediately (as possible) clear how a process at a given level is defined in terms of subprocesses.
Locality. It should be possible to describe a process as an independent "piece", a chunk which can be stored, sent by email, and so on. Such a chunk can rely on nothing outside of itself (other than basic framework services: the bus).
Pluggability. It should be possible to take such a piece, and connect it with other such pieces, thus forming a combined whole which itself describes a process; the interacting union of its subprocesses.
Self description. Each such piece must carry with it a full description of its interface with the outside world (about which it makes a minimum of assumptions; see locality). In a general modelling tool, we have no a priori knowledge of the sorts of models which will be included or of how they will be combined. We must be able to receive a piece in an email attachment, and immediately connect instances of it with other pieces in the modelling tool. This ability is sometimes known as reflection, the ability of a piece of software to examine itself at run time.
Hierarchy. Hierarchical composition is a standard technique for managing (intrinsic) complexity. A set of interacting subprocesses itself describes a process, and it should be possible to treat it as a single process for further composition. This implies a holonic architecture [Arthur Koestler, The Ghost in the Machine; the reviews of this book suggest strongly that I have never read it to its conclusion, I'd better take it out of the library again and do so], in which each "whole" can also be a part in a larger "whole".
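As promised, a rough Python sketch of how locality, pluggability, self description and hierarchy might combine. Everything here is hypothetical (the names Process, Composite, describe and wiring are mine, not an existing framework's): each piece declares its own interface, pieces plug together by name, and a composite of interacting pieces exposes the same interface as a single piece, the holonic property.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Process:
    """A local, self-describing piece: it depends on nothing outside itself."""
    name: str
    inputs: List[str]       # self description: the declared interface...
    outputs: List[str]      # ...is carried with the piece itself
    step: Callable[[Dict[str, float]], Dict[str, float]]

    def describe(self) -> str:
        # The reflection hook: the piece can report its own interface.
        return f"{self.name}: {self.inputs} -> {self.outputs}"

@dataclass
class Composite:
    """A set of plugged-together pieces, itself usable as a piece (a holon)."""
    name: str
    parts: List[Process]
    # Optional renaming of one piece's outputs to another's inputs.
    wiring: Dict[str, str] = field(default_factory=dict)

    @property
    def inputs(self) -> List[str]:
        # A holon's interface is derived (crudely, here) from its parts:
        # whatever the parts need but do not produce among themselves.
        produced = {o for p in self.parts for o in p.outputs}
        return [i for p in self.parts for i in p.inputs if i not in produced]

    @property
    def outputs(self) -> List[str]:
        return [self.wiring.get(o, o) for p in self.parts for o in p.outputs]

    def step(self, state: Dict[str, float]) -> Dict[str, float]:
        # Same step signature as Process, so (by duck typing) a Composite
        # can itself appear in the parts list of a larger Composite.
        for part in self.parts:
            produced = part.step({k: state[k] for k in part.inputs})
            for out, value in produced.items():
                state[self.wiring.get(out, out)] = value
        return state

# Two independent pieces, plugged together into a whole:
flow = Process("flow", ["depth"], ["flux"],
               lambda s: {"flux": 0.5 * s["depth"]})
transport = Process("transport", ["flux"], ["load"],
                    lambda s: {"load": 0.01 * s["flux"]})
catchment = Composite("catchment", [flow, transport])

print(flow.describe())  # flow: ['depth'] -> ['flux']
print(catchment.step({"depth": 2.0}))
```

Contrast this with the tangled loop earlier: swapping in a competing transport hypothesis now means replacing one self-contained piece, and the composition can be read off the Composite rather than reverse-engineered from loop bodies.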
There's more to this than software architecture, of course. User interface for an obvious start. I had a conversation with Jon the other day, in which he asked about developing a graphical interface for the prototype Model Description Framework (I think I started calling it the Model Definition Framework; I think I prefer Description). I had dismissed this as, at this stage, a frivolous distraction, and Jon pointed out the marketing value of it. This needs some more thought, but I remain worried that if people get excited by a demonstration of a graphical interface, they may well be excited by entirely the wrong thing, especially with an interface which has been knocked together as a demonstration.
Achieving a true exploratory modelling environment requires, I believe, some considerable research into effective user interfaces, taking account of the meta-model expressed in the architecture and the sorts of interactions with the developing model which need to be supported. I believe it was in a paper entitled Hydroinformatics at Delft Hydraulics [van Zuylen et al.] that I came across the idea that all software is embedded; it is embedded in the human–computer system. Introducing a new piece of software modifies that system, and thus modifies our understanding of what is needed; software must therefore be designed in an iterative cycle which takes account of users' behaviour when using it.
Well, this quick record of some ideas developed over coffee has taken me to seven o'clock. Now is not the time to move on to related thoughts, but at some point I'd like to externalise some thoughts about:
TODO: The sub-optimality of representing processes in an object oriented language.
TODO: More detail on the sorts of architecture which fulfil the requirements presented here.