Capturing process: The promise and challenges of connecting experimental records to the semantic web
The promise of the semantic web, in its broadest terms, to integrate structured data, and in particular scientific data, is breathtaking. Yet our ability to deliver on that promise remains severely limited. On one side, battles are fought over vocabularies and structured description languages, while on the other, computer scientists argue over design and architecture. In the middle, the experimental scientist is usually left bemused as to what all the fuss is about and how it can have any relevance to their research, which is naturally far too specialised and ground-breaking to fit into any framework that an outsider could ever provide. The challenge lies in finding a pathway that moves towards satisfying the needs and interests of these disparate groups while providing both value and a usable interface to the end user.
We have adopted a blog-based approach to capturing the record of research in an experimental laboratory. The use of a free-form framework along with templates encourages the user to structure the description of their work in a way that matches their needs. Cycles of template design and optimisation lead to structures with strong similarities to, and occasional important differences from, designed ontologies and vocabularies. The challenge for the future lies in determining how best to guide the actions of users towards the use of appropriate controlled vocabularies and structured descriptions, while using their choices to inform the design of these systems. The potential for Google Wave to provide a framework to enable local structuring as well as integration with global structured descriptions will be explored.
Some interesting ideas, as abused by my memory:
- People tell stories, they don't organise stuff the way it needs to be organised for machines to process. A big part of the disconnect between scientists and the folks building "e-science" tools is that these tools don't help people tell stories.
- So we need to start capturing structure through storytelling. That is, allow people to work in a natural way and pick up the structure from what they do.
- You can create a virtuous circle between metadata, which a user would normally not bother filling in or would fill in incorrectly, and the user interface. Arrange things so that the metadata populates the UI: the user wants that UI to make their life easier, and they start to notice that getting the metadata right is what makes that happen.
- (Lab) instruments as bloggers. Have them push results straight into the blog. Take the human out of the loop (with the qualification, which I think was intended: where they add no value).
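To make the "instruments as bloggers" idea concrete, here is a minimal sketch of what an instrument-side client might look like. Everything here is my own illustration: the endpoint URL, payload fields, and metadata keys are assumptions, not any real blog system's API.

```python
# Sketch: a lab instrument formats each run as a blog post and pushes it
# to a (hypothetical) blog API. All field names are illustrative.
import json
from datetime import datetime, timezone

def result_to_post(instrument_id, sample_id, readings):
    """Wrap raw instrument readings in a blog-post payload with metadata."""
    return {
        "title": f"{instrument_id}: run on sample {sample_id}",
        "tags": ["raw-data", instrument_id],
        "metadata": {
            "instrument": instrument_id,
            "sample": sample_id,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        },
        # The body carries the readings verbatim, so a human can read the
        # post while a machine can re-parse the JSON block.
        "body": json.dumps(readings, indent=2),
    }

def push_post(post, endpoint="https://lab-blog.example/api/posts"):
    # In a real deployment this would be an authenticated HTTP POST
    # (urllib.request, requests, ...); here we just return what would
    # be sent, to keep the sketch self-contained.
    return endpoint, json.dumps(post)

post = result_to_post("uv-vis-01", "S-042",
                      {"wavelength_nm": [260, 280], "absorbance": [0.91, 0.47]})
endpoint, payload = push_post(post)
```

The point is less the plumbing than the shape: the metadata block is filled in by the machine that already knows it, which is exactly the "take the human out of the loop where they add no value" move.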
Cameron has had early access to Google Wave, it seems, and is impressed with the potential. He had some cool examples of Wave "robots" going off and filling in extra information, drawing graphs, creating links, tidying up references.
He was less impressed by the Google client: too much stuff going on in a single UI, too non-standard. He also expressed concern about users understanding the model. It's the eternal problem of geeks developing stuff for "normal people" again. We're geeks because we handle abstraction with ease.
There was a lot that I recognised, and some new thoughts as well. I'm all about computational analysis in engineering decision making rather than lab-based science, so there are some differences, but there's a lot of overlap too. In theory we should have an easier time here, since so much of our "stuff" comes in the form of information artefacts, and our "experiments" take place in silicon. There are instruments, and the process of getting data from them into our analyses is currently extremely painful, but most of that pain is "upstream" of the analysis itself.
I liked the picture of a lab process generating artefacts and narrative, and then a separate process of arranging these things into a different narrative for publication. The story we tell when we want to explain our results to others isn't the story of the trials and tribulations of getting to those results; it's a story that gets the reader as smoothly as possible from their current knowledge to where we want them to be. That's as true of engineering analysis as of science. And in both cases, we'd be in a much better place if, given that latter story, we could easily investigate the artefacts referred to: look at them, do our own visualisations or further processing, and follow the trail back to see exactly how they were generated.
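That last idea, following the trail back from a published artefact to how it was generated, can be sketched very simply. The record just needs each artefact to carry a pointer to the step and the upstream artefacts that produced it. The names and structure below are my own illustration, not anything from the talk:

```python
# Sketch: artefacts that record their own provenance, so a reader of the
# polished narrative can walk back from a figure to the raw data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Artefact:
    name: str
    step: str           # the process step that generated this artefact
    inputs: tuple = ()  # upstream artefacts it was derived from

def trail(artefact):
    """Depth-first walk from a published artefact back to its raw sources,
    returned oldest-first as 'step -> artefact' lines."""
    seen, order = set(), []
    def visit(a):
        if a.name in seen:
            return
        seen.add(a.name)
        for parent in a.inputs:
            visit(parent)
        order.append(f"{a.step} -> {a.name}")
    visit(artefact)
    return order

# Hypothetical chain: raw instrument output, cleaned data, published figure.
raw = Artefact("raw_counts.csv", "instrument run")
clean = Artefact("clean_counts.csv", "outlier filtering", (raw,))
figure = Artefact("figure_3.png", "plotting", (clean,))
```

Calling `trail(figure)` reconstructs the whole chain back to the instrument run. In practice the hard part is capture, getting this recorded as a side effect of normal work rather than as extra bookkeeping, which is exactly what the blog-as-record approach is aiming at.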