As readers may well know, if there’s one thing I really care about, it’s digital preservation. Actually no, delete that. I don’t care about digital preservation AT ALL. If there’s one thing I really care about, it’s trying to ensure that people in the future can read and use important digital objects from their (and our own) past. That’s why I’m really pumped up about Jason Scott’s call that we make November 2012 “SOLVE THE FILE FORMAT PROBLEM MONTH”. It’s a great post, go read it…
You’re back? Good. Remember, this is a problem that we “digital preservation experts” (and I think I can still just about include myself in that category) have been waffling on about for years. It is a real problem, although I have argued that it is not as bad as we used to think. David Rosenthal has called format obsolescence “the Prostate Cancer of Preservation”, not because it is suddenly and swiftly deadly, but because it is widespread and in most cases has fairly mild effects (my half of the human race is much more likely to die with prostate cancer than because of it). He also points out that many of the proposed approaches to format obsolescence risk causing more damage than benign neglect would have done. You read David’s post? Great to see you back again…
There have been various approaches to aspects of this problem. One is to gather authoritative information about the various file formats in registries; PRONOM is one example, and the proposed Global Digital Format Registry (GDFR) is another. A second, linked approach is to provide tools that help identify the format of a particular file found “in the wild”, based on various clues around and within the file. DROID is one example. Still another approach is to check that files claiming a particular format are well-formed by validating them; JHOVE is such a tool.
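To make “clues within the file” a bit more concrete: many formats begin with a fixed byte sequence (a “magic number”), and identification tools can match those bytes against a table of known signatures. The sketch below is a minimal illustration under that assumption only. It is not how DROID or PRONOM actually work internally (their signatures can include offsets, end-of-file patterns and variable sequences), and the signature table and function names are hypothetical, covering just a handful of well-known formats.

```python
# Hypothetical, tiny signature table: leading bytes ("magic numbers") -> format name.
# Real registries such as PRONOM hold far richer signatures than this.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image (87a)"),
    (b"GIF89a", "GIF image (89a)"),
    (b"\xff\xd8\xff", "JPEG image"),
    # ZIP is a container: DOCX, ODT, EPUB etc. all start this way too,
    # so a real tool would need to look inside to tell them apart.
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return the first format whose signature matches the start of data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read just the first few bytes of a file and try to identify it."""
    with open(path, "rb") as f:
        # 16 bytes is enough for every signature in the table above.
        return identify(f.read(16))


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"just some plain text"))
```

Even this toy version shows why the problem is harder than it looks: a ZIP signature says nothing about whether the file is a word-processing document or an e-book, and plenty of older formats have no magic number at all.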
One problem with these approaches has been the demand for an authoritative view. They are the results of collaborations among insiders, and they ignore the vast amount of information held by people outside that circle. Was it Bill Joy of Sun who said “most of the smartest people work for somebody else”? Well, most of the information about file formats is known by people other than the insiders and experts. Gamers, hobbyists and the back-bedroom crew fascinated by old stuff know far more about many more formats than the insiders and experts do.
That’s what is so refreshing about Jason Scott’s call to arms. He doesn’t care about insiders and experts. I have no idea what Jason Scott thinks about OAIS or the subtleties of representation information, but I’d be willing to bet that it’s way more “expletive deleted” than my own jaundiced view. Getting smart people to work together to do stuff: that’s a positive and valuable attitude.
But… what is the file format problem that we need to solve? It could be:
- lists of file formats with useful information about them
- file format specifications (and variations)
- tools to identify file formats, given files (cf DROID etc)
- tools to validate file formats (cf JHOVE)
- tools to migrate file formats from obsolescent to more modern forms (also known as “Save As…”)
- tools to emulate older environments to allow obsolescent file formats to be handled
- tools to process obsolescent file formats in current environments (related to “migration on demand”)
- all of the above… bringing together all available information about file formats and the tools for handling them.
… or other things I haven’t thought of!
I don’t know quite what Jason has in mind but if he gets even part of that done it’s likely to be something useful. My guess is he’ll be looking for maximum impact rather than maximum polish or maximum “correctness”. I’d like to join in!