Filling in metadata

Earl Mardle comments on David Weinberger's blogging of the "TTI Vanguard conference" (I can't see where it's called that on the linked page, looks like "Knowledge Management Comes of Age" to me).

Sez Weinberger:

Bobby Kishore of Microsoft is explaining how to create a KM system without relying on explicit metadata. People don't like filling out forms and entering metadata explicitly. So, a KM system ought to mine content for metadata.

One aspect of the environmental modelling tool that Jim Hall and I have been sketching out in conversations over the last few months bears directly on this, and it has two facets.

  1. We need to store our data and definitions of our models in such a way that structured metadata can be attached, and however we do this, we must do it in such a way that the metadata schema is open ended. I don't yet know what metadata will prove necessary next year. This is another of the reasons behind my switch to RDF.
  2. We need to build into our model development and exploration tools mechanisms for harvesting valuable information. The discussions so far have mostly been about collecting information on the sensitivity of models to parameters, for example the details of sensitivity studies, which tend to use very many realisations (realisation = model + particular parameter set). There are other, more mundane things to collect, though. Who modified a model, and when? What processing has been applied to data, and where is the raw data it was applied to?
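
To make the second facet a little more concrete, here is a minimal Python sketch of harvesting as a side effect of running a model. Everything in it is an assumption made for illustration: the `TRIPLES` list stands in for a real RDF store, the `ex:` predicate names are invented, and `toy_runoff` is a stand-in for an actual model. The point is only that the statements accumulate without anyone filling in a form, and that a new kind of statement needs no schema change.

```python
import datetime
import functools
import getpass
import uuid

# Hypothetical in-memory store of (subject, predicate, object) statements.
# Predicates are plain strings, so the "schema" stays open ended.
TRIPLES = []


def record(subject, predicate, obj):
    """Append one statement to the store."""
    TRIPLES.append((subject, predicate, obj))


def current_user():
    """Best-effort user name; falls back when the platform cannot tell us."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def harvested(func):
    """Wrap a model run so that its provenance is logged transparently."""
    @functools.wraps(func)
    def wrapper(**params):
        run_id = "run:" + uuid.uuid4().hex
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        record(run_id, "ex:model", func.__name__)
        record(run_id, "ex:runBy", current_user())
        record(run_id, "ex:runAt", now)
        for name, value in params.items():
            record(run_id, "ex:param/" + name, value)
        result = func(**params)
        record(run_id, "ex:result", result)
        return result
    return wrapper


@harvested
def toy_runoff(rainfall_mm, runoff_coefficient):
    # Invented stand-in model: one realisation = model + parameter set.
    return rainfall_mm * runoff_coefficient


# Two realisations of the same model, as in a tiny sensitivity study.
toy_runoff(rainfall_mm=10.0, runoff_coefficient=0.5)
toy_runoff(rainfall_mm=10.0, runoff_coefficient=0.25)

for triple in TRIPLES:
    print(triple)
```

Each realisation leaves six statements behind (model, user, time, two parameters, result), and the person exploring the model never leaves the loop to write any of them.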

People in the hydroinformatics world (and, I am sure, in environmental modelling fields generally) have been talking about needing this sort of information, but I haven't heard anyone talk about practical systems which make its accumulation transparent. Any system which does not is not practical.

Weinberger:

That's certainly right. But here's a related question: Why don't people enter metadata? It's not simply because it's a pain in the butt: individual pain for organizational benefit. It's also because filling in metadata makes us pull back from the world, an attitude that goes against our biology. In fact, it's desire itself that draws us into the world and makes us shudder as we draw back from it.

This is very, very true with modelling. When exploring possibilities and options, one works in a tight loop with the model, modifying this, tweaking that, watching various measures of performance, some of which will be "objective" (repeatable) and some of which will not. Stopping to annotate the results breaks the flow, so it doesn't get done.

There's another aspect to this besides the need to mine for metadata and not ask people to behave like machines. It's bad enough to ask people to be as pedantic and structured as a machine needs to be in order that the machine can "mitkommen" (follow, keep up). It's far worse to ask people to do that same bit of work twice, three times, many times over. Consider this: I receive emails, dozens of them a month, from journal publishers. These emails contain the tables of contents of the latest journal issues. One paper in three emails, at a rough guess, excites enough interest for me to follow the link and have a look at the paper. There are others whose references I would like to squirrel away, because I have a feeling I'll want to read them later.

I also have a bibliographic database, in which I have, in structured form, information about all the references which were listed in the bibliography of my PhD thesis. But very little else.

That database would be the obvious place to store information about papers, whether I also printed them out to read or not. But in order to do that, I have to cut and paste from the email (and probably the web page too, since the email is unlikely to contain all of the necessary info) into the bibliographic database management tool. So I don't do it. It takes too long, it makes my wrists hurt, and I have to disconnect from the world and commune at an absurdly low level with a machine.

There's a hell of a lot of structured metadata out there, but it's all carefully filtered through a structure-removing mesh before it's let loose into the world. Flat web pages, flat emails, even flat RSS channels. There is occasional cause for hope. Elsevier are working with RSS 1.0, adding a module for PRISM which will allow the syndication of full tables of contents using RSS 1.0. Let's hope the journal publishers see the value, and soon.
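
As a rough illustration of what that would buy, here is a Python sketch that turns an RSS 1.0 item carrying PRISM elements into a record that could go straight into a bibliographic database, with no cutting and pasting. The feed below is invented for the example, the `items_to_records` helper and the record field names are hypothetical, and the PRISM namespace URI and element names are assumptions that may differ between PRISM versions and publishers.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the PRISM one in particular is an assumption.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Invented sample feed, not taken from any real publisher.
SAMPLE_FEED = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <item rdf:about="http://example.org/papers/1">
    <title>An invented paper title</title>
    <link>http://example.org/papers/1</link>
    <dc:creator>A. Author</dc:creator>
    <dc:creator>B. Author</dc:creator>
    <prism:publicationName>Example Journal</prism:publicationName>
    <prism:volume>12</prism:volume>
    <prism:number>3</prism:number>
    <prism:startingPage>45</prism:startingPage>
  </item>
</rdf:RDF>
"""


def items_to_records(feed_xml):
    """Return one dict per <item>, keeping the structure the feed carries."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.findall("rss:item", NS):
        records.append({
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "url": item.findtext("rss:link", default="", namespaces=NS),
            "authors": [c.text for c in item.findall("dc:creator", NS)],
            "journal": item.findtext(
                "prism:publicationName", default="", namespaces=NS),
            "volume": item.findtext(
                "prism:volume", default="", namespaces=NS),
            "issue": item.findtext(
                "prism:number", default="", namespaces=NS),
            "first_page": item.findtext(
                "prism:startingPage", default="", namespaces=NS),
        })
    return records


for rec in items_to_records(SAMPLE_FEED):
    print(rec)
```

With a feed like this, squirrelling away a reference could be a single click rather than a session of retyping.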

