PRONOM: what would success look like?

28 Nov

Today there is a meeting (I think run by the Digital Preservation Coalition) at The National Archives as part of their consultation on PRONOM and DROID. I was invited but unfortunately couldn’t make it (Kew is not the easiest place to get to by 10 am).

I’m personally much more interested in PRONOM than in DROID. I guess that’s because I’m not engaged in running tools to identify the file types of ingested material. I know what my file types are; what I want is more information that would be useful in thinking about their preservation. Nevertheless, I suspect my question about PRONOM applies equally to DROID: what would success look like?

Part of the answer must depend on who PRONOM is for. To the extent that PRONOM is for The National Archives, it is successful if it meets their needs. If they don’t really need PRONOM (as opposed to DROID) except as a place to keep file type signatures, then the fact that it consists almost entirely of near-empty entries does not matter to its success (for them).

However, TNA advertise PRONOM as:

“The online registry of technical information. PRONOM is a resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.”

That sounds like a resource for the rest of us. So success would have to mean completed entries for most file types (since collectively we will be exposed to almost all of them).

I saw a tweet from Kevin Ashley recently reminding someone that there are around 5,000 types of graphics file alone. There are probably another dozen or so genres of file type, although most are not so richly populated. So it’s certainly a big job. I think it’s too big for TNA to undertake alone. Indeed, I think it’s too big for any coalition of digital preservation archives to do alone (although this may be a bit more controversial). My belief is that you can only achieve this sort of completeness if you can find a way of crowd-sourcing the information.

I’ve already noted some of the deficiencies in PRONOM, so on 5 August I supplied some information about the .xlsx and .docx formats. I got a nice email from someone at TNA on 25 August (a bit outside their 10-day target) that said, amongst other things:

“Thanks for this information. We’ll look at including a link to these specifications in the next release. I hope we can provide a better model for including this information in Pronom with the new development.”

As of today, 28 November, the entry for fmt/214 still doesn’t show the information I provided, and its last update is shown as 28 October 2009. I believe there has been at least one PRONOM release since then.

Personally, I think completeness should be a PRONOM success factor. That completeness is not achievable by TNA alone, or even in coalition; it requires the participation of the public. The architecture of PRONOM must therefore include an effective mechanism for crowd-sourced input.

That said, I do wish TNA and PRONOM very well.