Recently I identified 50 or so files that were created by PowerPoint 4.0 on my old Mac (running System 7.5.x, I think) in the late 1990s. These files are proving rather hard to open, since Microsoft has removed support for all formats before Office 97/98. There used to be an option pack for Office 2000 (or thereabouts) on Windows that would let you open these old files and save them in more current formats (a process known in the digital preservation community as “migration”). However, that option does not seem to be available in current versions of MS Office.
My edited re-post of an older post about legacy document formats was inspired by this problem, and also by the upcoming November month of action on file formats. Microsoft has actually been rather good at making most of its document formats publicly available; however, this does not include any formats prior to Office 98.
Before I go banging on Microsoft’s door (whether or not that is likely to be productive), it’s worth asking: would the PowerPoint 4.0 specifications help me if I had them? In digital preservation circles, the information needed to interpret an arbitrary file is known as Representation Information (RI), and file format specifications are often quoted as one class of useful RI (here’s an older post about it, and there’s a further comment in the PS at the end).
So, if Microsoft digs in its closet and posts the PowerPoint 4.0 specs tomorrow, what can I do with them? Well, I guess I could download the specs, open one of my files in a binary editor, and start going through the file… no, that wouldn’t work. If I want to dig out the contents the crude way, I can open the file in MS Word and ask it to recover any text. Then I go through that by hand, delete the 95% of it that is formatting garbage, and take what’s left and re-format it as I like. Text only, of course. The trouble is, I wouldn’t have a great deal of confidence in the result… I’m not sure, for example, how much deleted text remains in the file, and text from hidden slides would look just like text from slides that were actually displayed.
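For the curious, the crude approach boils down to something like the rough Python sketch below, similar in spirit to the Unix `strings` utility. It is not how Word’s text recovery actually works internally; it just pulls out runs of printable ASCII bytes. It assumes the text is stored as plain single-byte characters (so accented MacRoman characters will simply be dropped), and the function name `extract_text_runs` and the 4-byte minimum run length are illustrative choices, not anything from a PowerPoint spec. It also shows the confidence problem neatly: nothing in it can tell live text from deleted fragments or hidden slides.

```python
import string
import sys
from typing import List

# Bytes treated as "text": printable ASCII letters, digits, punctuation and space.
# Assumption: slide text is stored as plain single-byte characters.
PRINTABLE = set(bytes(string.ascii_letters + string.digits + string.punctuation + " ", "ascii"))


def extract_text_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII bytes in data.

    This cannot distinguish visible text from deleted text or hidden slides;
    it just reports whatever printable runs happen to be in the file.
    """
    runs = []
    current = bytearray()
    for b in data:
        if b in PRINTABLE:
            current.append(b)
        else:
            # A non-printable byte ends the current run; keep it if long enough.
            if len(current) >= min_len:
                runs.append(current.decode("ascii"))
            current.clear()
    # Flush a run that reaches the end of the data.
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs


if __name__ == "__main__":
    # Usage (hypothetical filename): python extract_text.py old_presentation.ppt
    with open(sys.argv[1], "rb") as f:
        for run in extract_text_runs(f.read()):
            print(run)
```

The output still has to be weeded by hand, exactly as with Word’s recovered text.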
So maybe I could open the spec, grab my trusty programming manual, and start to write a decoder… no, that wouldn’t work either; I rather strongly doubt that my Pascal is up to the job ;-)!
Actually, the specs are of very little direct use to me at all. But there ARE people who would find them interesting and useful. Some of those might be in a position to code a new input filter for OpenOffice or LibreOffice. That’s a one-time, sustainable piece of work that would provide a migration route for anyone, anywhere in the world, who has this or a related problem. I’ve written about this before, here.
You may chuckle and think that only retired idiots have a bunch of PowerPoint 4.0 files that they didn’t migrate in time. But my files are a small exemplar of the avalanche of files from the 1990s that is going to arrive in archives over the next few decades. Quite a lot of people used early versions of MS Office and its components, so if we could build a sustainable migration tool, that would be a Good Thing! I think that if the specs WERE available, the effort to do something useful could be found somewhere.
PS: There’s a new version of the Open Archival Information System digital preservation standard out (and the old one has vanished from the CCSDS website, natch!). I looked to see whether the definition of Representation Information had changed significantly. There is a change marked, though I’m not sure how significant it is. Representation Information is defined as:
“The information that maps a Data Object into more meaningful concepts. An example of Representation Information for a bit sequence which is a FITS file might consist of the FITS standard which defines the format plus a dictionary which defines the meaning in the file of keywords which are not part of the standard.
Another example is JPEG software which is used to render a JPEG file; rendering the JPEG file as bits is not very meaningful to humans but the software, which embodies an understanding of the JPEG standard, maps the bits into pixels which can then be rendered as an image for human viewing.”
It’s nice, and important, that the example includes software. In my case, the only useful tools will be software tools built around the specs. So my answer to my own question is: on their own, those specs would be of no direct use to me in interpreting my files. But with a little help from my friends, my problem could be solved!