The main text of this blog post is a letter I have sent to Tony Hey of Microsoft, asking him to use his influence to get specifications for older obsolete file formats published on Microsoft’s Open Specifications Page. If you support this, please leave me a comment below endorsing the letter (note, the spam filter may delete or refer for moderation any comments containing URLs).
Open Letter on specifications for obsolete file formats
I am writing to you, as the most senior person I know in Microsoft, to ask you to use your influence to ensure Microsoft adds specifications for older Office (and other) file formats to the Microsoft Open Specifications page. I have put this Open Letter on my blog (http://unsustainableideas.wordpress.com/), and (if you agree) would like to put any reply from you on that blog as well. I will also solicit further support for this letter on that blog, in the form of comments of endorsement.
Microsoft’s Open Specifications page and the accompanying Open Specification Promise were both very welcome developments, for which Microsoft is rightly applauded. However, the Specifications only go back to Office 97-2003 formats. I have some MS Word and MS Excel documents from earlier versions of Office that seem to open well in more modern software, so perhaps their file formats are compatible, at least to some extent. However, PowerPoint 4.0 files do not open at all in modern MS Office applications, and the file format is understood to be very different.
I have been attempting to convert some 50 or so PowerPoint 4.0 files to more modern formats (to migrate them, in digital preservation parlance), and have documented the process in a series of posts on my blog. The post at http://unsustainableideas.wordpress.com/2012/10/02/powerpoint-4-0-story-so-far/ sums up the exercise, and there is one further post about a small company that has succeeded in converting some files for me. At present there appear to be only two routes for migration: one relies on technology preservation (or emulation) in the form of systems that can (and are licensed to) run a sufficiently early version of MS PowerPoint, and the second is via this small company, Zamzar. Neither of these solutions can be relied on for the long term.
While my main focus in this letter is on older formats within the basic Office set, the specifications for related software such as Microsoft Works and early versions of Microsoft Access would also be helpful for preservation purposes.
You might ask: why should Microsoft put effort today into making these specifications available? I believe Microsoft’s software tools are not merely temporary mechanisms for profit in the marketplace, but (by dint of their flexibility and success) tools that the wider world has used to create billions of cultural artefacts that may be of lasting value. By declining to help make these obsolete file formats accessible, Microsoft is locking up this cultural content, and will eventually throw away the key.
Andrew Jackson of the British Library (who helped me with my initial attempts to convert my PowerPoint 4.0 files) has studied the population of older file formats in a dataset of 2.5B web resources from the UK Web Archive. He found that PowerPoint 4.0 has been persisting on the UK web until fairly recently. For ALL PowerPoint files with identifiable versions created from 1996 to 2010, PowerPoint 4.0 and PowerPoint 95 represent around 2.5%, and for PowerPoints created up to 2002 the proportion of the older formats was 27%. We can be confident that many, many more such resources will exist in private file systems.
Why should Microsoft act now? First, because the number of people within Microsoft who understand these formats must be declining. Second, because the specifications themselves (to the extent that they exist as simple documents) must also be at risk of loss through accident or some grand tidy-up process that discounts older material as irrelevant. Third, because many of the early adopters who used these products in the 1990s are, like me, coming up to or past retirement. I believe there will be an increasing swell of documents from some of these people flowing into archives for preservation over the next several years. Many of these will be documents from people of much greater cultural and scientific importance than me, but who have less time and/or ability to pursue possible solutions to an obsolescence problem. Fourth, because I think this is consistent with the direction you have helped Microsoft to take since joining them.
I’m also motivated by another factor: Jason Scott’s call for action to “Solve the File Format Problem”, scheduled for this November (original post here and wiki page here http://www.archiveteam.org/index.php?title=Just_Solve_the_Problem_2012). Jason is a member of the Archive Team of “rogue archivists”, which attempts to save disappearing web sites, and he is seeking a crowd-sourced solution to the lack of information on obsolete file formats. It would be wonderful if Microsoft could add to that information by making these specifications available in November.
What would this cost Microsoft? On the face of it, simply the staff effort to gather the relevant specifications and make them available. Of course, the documents may not exist as well-written specifications, in which case I would urge Microsoft to make as much information available as possible, allowing others to make sense of them against the “ground truth” of existing files. It would be wonderful if Microsoft could make available a migration tool, but this would obviously be a larger effort with longer-term implications. Indeed, in the long run it might be more cost effective to support an open migration tool.
The benefit to Microsoft in doing this would be in enhancing its reputation as a responsible company that understands and acts on the implications of its past work.
Possible outcomes include input filters for open software such as OpenOffice or LibreOffice, input converters for SlideShare and others, and possibly Microsoft or commercial third-party migration tools.
The societal benefits of this would include better preservation of a subset of cultural artefacts and a better understanding of the content of early presentations, which may document discoveries or encapsulate persuasive arguments for significant change programs. Ultimately, this is about a richer cultural heritage. My own presentations in PowerPoint 4.0 date from the time when I was Director of the JISC Electronic Libraries Programme, and document how we sought to persuade the community to go forward with that campaign, and some of the adjustments that were made to it.
I have found a previous Open Letter on a similar subject, from Rick Jelliffe on the O’Reilly XML blog, at http://www.oreillynet.com/xml/blog/2008/03/an_open_letter_to_microsoft_ib.html.
I would really appreciate your views on whether this might be possible.