Response to the Open Letter on obsolete Microsoft file formats

26 Nov

You may remember the Open Letter I sent to Tony Hey of Microsoft and published a few weeks back. Well, I’m pleased to say that Tony has responded. I’ve included the full text of his response below.


I have a reply from Jim Thatcher in the Office team:

1) We do not currently have specifications for these older file formats.

2) It is likely that those employees who had significant knowledge of these formats are no longer with Microsoft.

3) We can look into creating new licensing options including virtual machine images of older operating systems and old Office software images licensed for the sole purpose of rendering and/or converting legacy files.

4) One approach we could consider is for Microsoft to participate in a “crowd source” project working with archivists to create a public spec of these old file formats.

I think it would be sensible for you to talk directly with Jim – and Natasa in the UK – to see if there are some creative options that Microsoft could pursue with the archivist community.


Now it’s worth pointing out that this is a response from a couple of individuals to my Open Letter, not a formal commitment by Microsoft. But it’s a good start, and we need to work to make more good come from it.

The first two points are more or less as expected, and as suggested by several commenters (some picked out in my blog post of selected comments).

The later points are very welcome, and I would be very happy to see them taken forward.

Is there an appropriate group to work with Microsoft on suitable licence terms for older software to render or migrate legacy files?

Is there an appropriate group to coordinate crowd-sourcing interaction with Microsoft? I can see at least 4 approaches:

a) Create a set of sample files from all obsolete versions, available under CC0, e.g. the public format corpus from the CURATEcamp 24 hour worldwide file id hackathon (#fileidhack).

b) Work on a complete set of format identifying signatures, so that an unknown file can be properly identified.

c) Work as suggested on the specs of some of these older formats. Based on a quick look at an old MS Word file, some of these older formats are not that complicated.

d) Work to include these formats in Open Source Office suites, so we can migrate files into the future at no cost to Microsoft.

All of these would need a little leadership so that Microsoft didn’t get bogged down with interaction costs. Some could take place without funding and with little more than leadership (as in File Formats November or #fileidhack, for example). Some might need more resources and a bit of funding.
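To make approach (b) a little more concrete, here is a minimal sketch of signature-based identification in Python. It assumes a tiny, illustrative signature table (three well-known leading byte patterns, not any of the obsolete formats themselves), and the names `SIGNATURES`, `identify` and `identify_file` are hypothetical, made up for this example. A real signature set for the older formats would have to be worked out from ground truth files.

```python
from typing import Optional

# Illustrative signature table: (offset, magic bytes, label).
# These three patterns are well known; signatures for the truly
# obsolete formats are exactly what a crowd-sourced effort would add.
SIGNATURES = [
    (0, bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (e.g. Word 97-2003 .doc)"),
    (0, b"{\\rtf", "Rich Text Format"),
    (0, b"PK\x03\x04", "ZIP container (e.g. Office Open XML)"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first signature matching the data, or None."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 512 bytes of a file and try to identify its format."""
    with open(path, "rb") as fh:
        return identify(fh.read(512))
```

Calling `identify_file` on an unknown file returns one of the labels above, or `None` when nothing matches. A leading-bytes match like this only narrows things down to a container or family; telling one obsolete version from another would need more detailed signatures.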

I think Microsoft has returned service. What next, folks?


13 Responses to “Response to the Open Letter on obsolete Microsoft file formats”

  1. Henk Koning 26 November, 2012 at 10:54 #

    Congratulations, Chris!

    I hope you can rally support from the archivist community to take this up with Microsoft.

  2. Patricia Galloway 26 November, 2012 at 16:22 #

    Congratulations, Chris. I can’t speak for the American archival educator community as a whole, but I know this advance will be well-received by it and especially by our students. We are learning every day that many more digital objects are surviving on supposedly short-lived media than was thought before people began trying to read said media, so the need for these older formats and environments is real and will continue to be so all the more as we recover more older digital objects.

  3. Libbie Stephenson 26 November, 2012 at 19:42 #

    Hi Chris,
    One thing that would be really useful would be to have some guidelines on how to begin when one wants to figure out an older format. Are there some techniques or hacks one can try? Are there ways to get at the code that might be less obvious to the novice? Can we build a community of people who have some experience and who can share their approaches?

  4. Leslie Johnston (@lljohnston) 26 November, 2012 at 20:21 #

    You know, Chris, I think the folks at Microsoft could stand to do a little more research of their own. I know they have an excellent corporate archive (the late Lee Dirks was their archivist at one time before moving on to other roles at Microsoft Research). I find it difficult to believe that they have no records.

    • Chris Rusbridge 27 November, 2012 at 15:02 #

      To be fair, I don’t think they say they have no records, only that they have no specifications. Add qualifiers like “complete”, “accurate”, “matching the actual files” and “that we wouldn’t be ashamed of”, and you can see why. Someone commented on the Open Letter that the only specs were likely the code.

      However, if MS is willing to participate in crowd-sourced work, they should be willing to provide such documentation as they have (possibly short of the actual source code), that could be worked up into some kind of post facto spec. Especially if we have some ground truth files to compare against!

      But for me, the ultimate goal is open-sourced tested and working code in tools that will process, render and migrate documents. That’s worth much more than specs!

  5. dgm 26 November, 2012 at 21:19 #

    An alternative strategy is to track down the code (or the developers) of conversion products of the time, e.g. Word for Word, or whoever wrote Quattro Pro’s import and export features …

    • Chris Rusbridge 27 November, 2012 at 15:04 #

      Doug, yes persuading such people to participate in any crowd-sourced project would be really valuable. We did get an offer of help from an “ex-Microsoftie” on the original post…

  6. Tom DePlonty 27 November, 2012 at 21:38 #

    I don’t know if this helps at all, but a product called Quick View Plus (now sold by a company called AvantStar) reads many of the legacy MS formats. It was originally (i.e., in the 90s) sold as an add-on utility for email clients, to allow people to read file attachments without having the applications they were created with.

    These days it seems to be marketed mostly for computer forensics. I knew the people who developed the underlying technology, and who put huge amounts of effort into reverse-engineering MS file formats. If AvantStar was willing, they might be able to provide a lot of assistance – but of course, they are selling products and solutions based on the technology, and may not be interested in providing such assistance for free.

  7. yuhong 30 May, 2013 at 05:40 #

    For the PowerPoint 4.0 and 95 formats, how about shipping PP4X32 and PP7X32 from Office 2003 (the last version that had them) separately and writing a wrapper EXE that calls the conversion DLLs?


  1. Response to the Open Letter on obsolete Microsoft file formats | Digital Continuity | - 27 November, 2012

    […] You may remember the Open Letter I sent to Tony Hey of Microsoft and published a few weeks back ( Well I’m pleas…  […]

  2. More back from Microsoft « Unsustainable Ideas - 28 November, 2012

    […] my posting of the initial response to my Open Letter, Jim Thatcher wrote back to […]

  3. Observations and Survey Results – ARMA IM Days – Ottawa | Candy Strategies - 2 December, 2012

    […] hitting the concrete wall with some file formats from the last couple of decades – even older Microsoft formats are not immune from the growing threat of format rot. Organizations that are not thinking about […]

  4. Lettre aux amis de la police (et de la gendarmerie) 2013/01 | Le blog Criminocorpus - 9 January, 2013

    […] and because Microsoft is a friend who wishes us well, especially when one has files in versions of Word, for example, that have become obsolete: […]

Comments always welcome, will be treated as CC-BY
