Open letter to Microsoft on specs for obsolete file formats

22 Oct

The main text of this blog post is a letter I have sent to Tony Hey of Microsoft, asking him to use his influence to get specifications for older obsolete file formats published on Microsoft’s Open Specifications Page. If you support this, please leave me a comment below endorsing the letter (note, the spam filter may delete or refer for moderation any comments containing URLs).

Dear Tony,

Open Letter on specifications for obsolete file formats

I am writing to you, as the most senior person I know in Microsoft, to ask you to use your influence to ensure Microsoft adds specifications for older Office (and other) file formats to the Microsoft Open Specifications page. I have put this Open Letter on my blog (https://unsustainableideas.wordpress.com/), and (if you agree) would like to put any reply from you on that blog as well. I will also solicit further support for this letter on that blog, in the form of comments of endorsement.

Microsoft’s Open Specifications page and the accompanying Open Specification Promise were both very welcome developments, for which Microsoft is rightly applauded. However, the Specifications only go back to Office 97-2003 formats. I have some MS Word and MS Excel documents from earlier versions of Office that seem to open well in more modern software, so perhaps their file formats are compatible, at least to some extent. However, PowerPoint 4.0 files do not open at all in modern MS Office applications, and the file format is understood to be very different.

I have been attempting to convert some 50 PowerPoint 4.0 files to more modern formats (to migrate them, in digital preservation parlance), and have documented the process in a series of posts on my blog. The post at https://unsustainableideas.wordpress.com/2012/10/02/powerpoint-4-0-story-so-far/ sums up the exercise, and one further post describes a small company that has succeeded in converting some files for me. At present there appear to be only two routes for migration. The first relies on technology preservation (or emulation), in the form of systems that can (and are licensed to) run a sufficiently early version of MS PowerPoint. The second is via this small company, Zamzar. Neither of these solutions can be relied on for the long term.

While my main focus in this letter is on older formats within the basic Office set, the specifications for related software such as Microsoft Works and early versions of Microsoft Access would also be helpful for preservation purposes.

You might ask: why should Microsoft put effort today into making these specifications available? I believe Microsoft’s software tools are not merely temporary mechanisms for profit in the marketplace, but (by dint of their flexibility and success) tools that the wider world has used to create billions of cultural artefacts that may be of lasting value. By declining to help make these obsolete file formats accessible, Microsoft is locking up this cultural content, and will eventually throw away the key.

Andrew Jackson of the British Library (who helped me with my initial attempts to convert my PowerPoint 4.0 files) has studied the population of older file formats in a dataset of 2.5 billion web resources from the UK Web Archive. He found that PowerPoint 4.0 files persisted on the UK web until fairly recently. For ALL PowerPoint files with identifiable versions created from 1996 to 2010, PowerPoint 4.0 and PowerPoint 95 represent around 2.5%, and for PowerPoint files created up to 2002 the proportion of these older formats was 27%. We can be confident that many, many more such resources exist in private file systems.

Why should Microsoft act now? First, because the number of people within Microsoft who understand these formats must be declining. Second, the specifications themselves (to the extent that they exist as simple documents) must also be at risk of loss through accident or some grand tidy-up process that discounts older material as irrelevant. Third, because many of the early adopters who used these products in the 1990s are, like me, coming up to or past retirement. I believe there will be an increasing swell of documents from some of these people flowing into archives for preservation over the next several years. Many of these will be documents from people of much greater cultural and scientific importance than me, but who have less time and/or ability to pursue possible solutions to an obsolescence problem. Fourth, I think this is consistent with the direction you have helped Microsoft to take since joining them.

I’m also motivated by another factor: Jason Scott’s call for action to “Solve the File Format Problem”, scheduled for this November (original post here and wiki page here http://www.archiveteam.org/index.php?title=Just_Solve_the_Problem_2012). Jason is a member of the Archive Team, a group of “rogue archivists” that attempts to save disappearing websites, and he is seeking a crowd-sourced solution to the lack of information on obsolete file formats. It would be wonderful if Microsoft could add to that information by making these specifications available in November.

What would this cost Microsoft? On the face of it, simply the staff effort to gather the relevant specifications and make them available. Of course, the documents may not exist as well-written specifications, in which case I would urge Microsoft to make as much information available as possible, allowing others to make sense of it against the “ground truth” of existing files. It would be wonderful if Microsoft could make available a migration tool, but this would obviously be a larger effort with longer-term implications. Indeed, in the long run it might be more cost-effective to support an open migration tool.

The benefit to Microsoft in doing this would be in enhancing its reputation as a responsible company that understands and acts on the implications of its past work.

Possible outcomes could include input filters for open software such as OpenOffice or LibreOffice, input converters for SlideShare and others, and possible Microsoft or commercial third-party migration tools.

The societal benefits of this would include better preservation of a subset of cultural artefacts, and a better understanding of the content of early presentations, which may document discoveries or capture the arguments used to persuade people to undertake significant change programs. Ultimately, this is about a richer cultural heritage. My own presentations in PowerPoint 4.0 date from the time when I was Director of the JISC Electronic Libraries Programme, and document how we sought to persuade the community to go forward with that campaign, and some of the adjustments that were made to it.

I have found a previous Open Letter on a similar subject, from Rick Jelliffe on the O’Reilly XML blog, at http://www.oreillynet.com/xml/blog/2008/03/an_open_letter_to_microsoft_ib.html.

I would really appreciate your views on whether this might be possible.

Yours, Chris


103 Responses to “Open letter to Microsoft on specs for obsolete file formats”

  1. Andrew Treloar 23 October, 2012 at 22:09

    As someone who carries their computing life back to the early 80s on their laptop, I certainly support this!

  2. Tuomas J. Alaterä 23 October, 2012 at 22:18

    I strongly support this idea, and truly hope to see a positive response from Microsoft. Access to specifications of older formats is crucial to digital preservation, and surely that is in Microsoft’s interests as well. However, I want to second Euan’s comment on emulation as a path of digital preservation. To achieve that, access to software, not only specifications, will be necessary. In some cases a working emulation platform may be the only way for a file rescue mission to overcome the holes in specifications.

  3. dgm 23 October, 2012 at 22:42

    Having spent a good part of my professional career dealing with data curation and data conversion issues, I am strongly of the opinion that having legacy formats documented in an open and accessible manner can only be a good thing, and I am happy to support Chris’s open letter to Microsoft.

  4. Jeff Meyer 23 October, 2012 at 22:46

    Chris – I’m interested in figuring out a way to help, but am confused by many of the comments on this thread (maybe people aren’t reading the other comments?). Is the goal to have the specification for its own sake, or to reveal the content of potentially unreadable files? If the former, that may be searching for the nonexistent. If the latter, then you do not need an explicit specification document in order to do that. You just need working software that implements the specification.

    • tjowens 23 October, 2012 at 22:58

      Happy to support the essence of this too. As de facto formats for documents, a range of MS formats are going to dominate the historical record. Anything Microsoft can share in terms of documentation and information about them would be a boon for preservation.

    • Evelyn McLellan 25 October, 2012 at 18:14

      The existence of open specifications allows for the development of open-source tools to read older files and to convert them to standardized, preservation-friendly formats such as Open Document Format or PDF/A (or others that might appear down the road). Without the specs, such conversions are based on reverse-engineering the format and are often unreliable. As others point out, open specs also support other preservation strategies such as emulation.

      I would like to add my support to this initiative. I fully support Chris’ request and look forward to seeing a response from Microsoft to it.

  5. Michael Carden 23 October, 2012 at 22:57

    A laudable goal, but I doubt that Microsoft has these specs available internally any more, let alone in a form that could be released. Best of luck with it.

  6. David Groenewegen (ANDS) 23 October, 2012 at 23:53

    An excellent initiative.

  7. Gail Steinhart 24 October, 2012 at 00:25

    This is a great idea and I can’t see any downside to this proposal. This is a nice opportunity for Microsoft to do the right thing.

  8. Mark 24 October, 2012 at 00:35

    +1 This would be a responsible thing for Microsoft to do.

  9. MacKenzie Smith 24 October, 2012 at 01:38

    Please, Tony, do your best to make this happen. It would be great PR, nothing lost and much gained. What’s more, researchers will expect it!

  10. Libor Coufal NLA 24 October, 2012 at 01:42

    Like many other memory institutions, the National Library of Australia has a load of files in legacy MS formats. Just a quick search in our small testing sample of files returned several PowerPoint 4 files which can’t be opened with the current PowerPoint version. Any initiative which would help to solve this problem is very welcome.

    Jeff: Yes, the ultimate goal is not only to reveal the content but, more importantly, to save it in a newer, working version. If you have access to software which can do it, then you’re saved. But what if you don’t? How much trouble (and expense) do you have to go to in order to get there? And is this a viable long-term solution? I guess having the specifications available would give everyone greater confidence that such a solution can be developed, not only now but at any time in the future. Having said that, I perfectly understand that it may not be viable either, but if it is, it would definitely be very much appreciated (as you can see from the comments).

  11. Hye-Ran Suh 24 October, 2012 at 02:54

    Thank you Chris for your great effort. I support you with all my heart.

  12. David Kay, MLS 24 October, 2012 at 03:16

    I strongly support this initiative. It’s time to address long-term sustainability issues for born-digital materials, and to articulate problems and challenges related to vendor lock-in, software obsolescence, broken promises of backwards compatibility, and proprietary/open standards. Let’s start with Microsoft, whose Office software has dominated the modern office for the last two decades. Good luck, Chris!

  13. Kai Naumann 24 October, 2012 at 07:35

    I strongly agree, too. This is easy, cheap, and no threat to Microsoft’s business model.

  14. Panyarak Ngamsritragul 24 October, 2012 at 08:33

    Great initiative!

  15. Panyarak Ngamsritragul 24 October, 2012 at 08:35

    Great move!

  16. Paddy McCann (@paddymcc) 24 October, 2012 at 10:45

    I’d like to add my support for this request.

  17. Matt J 24 October, 2012 at 12:07

    I also support this initiative – good luck!

  18. Andrea Goethals 24 October, 2012 at 13:41

    We need access to old specifications like this to preserve our cultural heritage. I fully support this!

  19. Danianne Mizzy 24 October, 2012 at 15:35

    +1. Much needed!

  20. John Doyle (@seanoduill) 24 October, 2012 at 15:35

    +1

  21. Margot Caldbick 24 October, 2012 at 23:51

    I think it would be a good public image building exercise in which Microsoft might show some corporate responsibility. That’s never a bad thing.

  22. Alan Langley 25 October, 2012 at 01:24

    Great idea – at least it would give the open source community the opportunity to work around software obsolescence, and backwards compatibility is not always available with Microsoft’s software.

    • Susan Thomas 25 October, 2012 at 09:17

      If Microsoft were to do this, it would be a great help to those of us working to preserve our digital heritage. Strongly support this initiative and hope Microsoft see the benefit in responding positively.

  23. John Salter 25 October, 2012 at 10:19

    Sounds like an excellent plan. Are there other companies that *should* be making old format specs available openly (I’m guessing that MS aren’t the only culprits!)?

  24. Seth Shaw (@seth_e_shaw) 25 October, 2012 at 13:39

    I fully endorse this letter and hope that Microsoft fulfills the request. It would be a great show of leadership that hopefully others would follow.

  25. Susan Miller 25 October, 2012 at 14:17

    I absolutely agree. This is an important issue, especially to those of us working to preserve older electronic records. Thank you!

  26. kshawkinn Hawkins 25 October, 2012 at 14:27

    I fully support this.

    • Maura Keleher 25 October, 2012 at 15:12

      This will not only help archivists, but historians in the future. Good idea!

  27. Nat Wilson 25 October, 2012 at 15:24

    I too fully support this proposal and hope that Microsoft will make it easier for archivists to preserve information stored in their older formats.

  28. Amanda Norman 25 October, 2012 at 16:41

    I support this initiative. Thanks for taking the time to write this letter.

  29. Matthew McKinley 25 October, 2012 at 17:18

    I fully support this as well, and agree that the cooperation of a “major player” in digital formats will bring positive attention to both Microsoft and the digital preservation community.

  30. Matt 25 October, 2012 at 18:14

    I FULLY support this. There are probably BILLIONS of documents worldwide in older MS file formats. It would be highly irresponsible of Microsoft to end their support for them. We are talking about the possible loss of major portions of our historical and cultural record.

    • Peter Van Garderen 25 October, 2012 at 18:39

      I wholeheartedly agree with Chris’ statement that “Microsoft’s software tools are not merely temporary mechanisms for profit in the marketplace, but (by dint of their flexibility and success) tools that the wider world has used to create billions of cultural artefacts that may be of lasting value. By declining to help make these obsolete file formats accessible, Microsoft is locking up this cultural content, and will eventually throw away the key.”

      I urge Microsoft to release whatever specification documents may still exist under a public license. Given the likelihood that much of this documentation is no longer accessible (which only underlines the need for improved systems to ensure digital longevity), I’d also urge Microsoft to establish a policy for issuing public licenses for its library of legacy applications and operating systems (wherever MS decides to draw this line, e.g. 10 years after release?) to allow for legal implementation of emulation strategies.

  31. Kara Van Malssen (@kvanmalssen) 25 October, 2012 at 19:21

    Very much in support of this. I hope Microsoft sets an example that other companies (ahem, Apple) might follow. Not only is this important for cultural heritage and memory institutions, but perhaps even more so for corporate assets in legacy formats that have business or legal reasons for preservation.

  32. Richard Ovenden 25 October, 2012 at 20:18

    I am very much in support of this. Microsoft would make a great contribution to digital preservation, cultural heritage, and a whole variety of scholarly disciplines. It’s in their interest too, surely?

  33. Wendy Kozlowski 26 October, 2012 at 16:15

    I wholeheartedly support this initiative and endorse your request. It would be so useful to users and in addition to advancing preservation and digital scholarship, it would be a great PR move on Microsoft’s part.

  34. Stephen Abrams 26 October, 2012 at 22:39

    Thanks, Chris, for initiating this outreach to Microsoft. As you have identified, open access to the older Office specifications would greatly facilitate efforts at maintaining the long-term viability and usability of many important digital assets.

  35. Paul N. Courant 27 October, 2012 at 20:31

    Chris’s request is reasonable and timely.

  36. Peter Suber 27 October, 2012 at 21:17

    A very good idea, for Microsoft and every other software company.

  37. Armin Straube 29 October, 2012 at 13:38

    I endorse this letter to Microsoft as a representative of nestor (German competence network for digital preservation).

    It is important to stress that Microsoft could benefit, with little effort, from a policy that makes outdated file format specifications available. Microsoft would gain trust amongst the users of its file formats if migration and preservation became easier.

  38. John Kunze 30 October, 2012 at 18:55

    I fully support this. It would be very important to have older file format specifications made available via the Microsoft Open Specifications page.

  39. Matthew Woollard 12 November, 2012 at 09:41

    Chris, I fully support this. It’s increasingly important to ensure that older file format specifications are made available. Microsoft owe it to their customers, and the users of the future.

Trackbacks/Pingbacks

  1. Comments from some Responses to “Open letter to Microsoft on specs for obsolete file formats” « Unsustainable Ideas - 31 October, 2012

    […] have been rather overwhelmed by the wonderful response to my Open Letter. There are many excellent comments among the 100 or so. Most comments are simple words of support […]

  2. Response to the Open Letter on obsolete Microsoft file formats « Unsustainable Ideas - 26 November, 2012

    […] may remember the Open Letter I sent to Tony Hey of Microsoft and published a few weeks back (https://unsustainableideas.wordpress.com/2012/10/22/open-letter-ms-obsolete-formats/). Well, I’m pleased to say that Tony has responded. I’ve included the full text of his response […]

  3. Bring out your dead! « York Digital Library - 30 January, 2013

    […] years ago on an old Mac, to a current format (he found a company that could do it) and also his open letter to Microsoft about publishing the specifications for old versions of their file formats. The latter had […]
