Part of the point of my attempt to access my old PowerPoint 4 files (see here and here for the latest state) was to see what I could learn from the exercise (with half an eye on the Jason Scott November month of action on file formats; see also planning here). So, what did I learn that might be of more general interest than the specific case?
I guess the first thing is: the Internet is your friend! I knew that, and so do you, but I continue to be amazed at the extent to which people will go out of their way to help if you ask (not always, but often). Of course, this is partly due to the particular set of people who know (of) me…
Faced with a set of files that you know little about, and that don’t “automatically” open (based on defaults in your file system and operating system), the first thing you will need is some mechanism to identify the file format. This may apply even if the file system does appear to know about the file format. The files in this case were a mixture:
- a) some had a .ppt file type and opened correctly
- b) some had a .ppt file type but PowerPoint 2004 refused to open them
- c) some had no file type extension (files migrated from Macintosh System 7 OS without the resource fork).
We can ignore group (a). Group (b) files were of two types, it turned out: some were PowerPoint 4.0 files, but some were Word 6.0 files that had wrongly had a .ppt extension added (by me, usually due to some clue in the file name like “slides”, forgetting that prior to using PPT 4 I made slides in Word and printed them to transparencies). Group (c) files were a mixture of PPT 4 and Word 6 files (with an odd Macintosh Write file thrown in for good measure).
In order to work out what to do, you really do have to know what you have got. So, the first thing you need is a pointer to a set of tools that can identify file formats. Those tools may require a set of signatures to help them. But an amateur like me probably doesn’t want to run a professional-level tool; a simple procedure that helps is more useful. In my case, this was opening each file with Word 2004 using the “recover any text” option, and looking for some characteristic content after the main bulk of the slide content. There was some text characteristic of PowerPoint, like:
“dRClick to edit Master text styles”
In addition there was some information that I recognised as relating to the original Mac directory structure for the files, and a few instances of the text “PowerPoint 4.0”. Note, in this case it is clearly important to know the particular version of the file format that you have; PowerPoint is NOT sufficient!
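For anyone with more than a handful of files, the same eyeball check could be roughed out in a few lines of Python. This is only a sketch of the idea, not a proper identification tool: it looks in the raw bytes for the two strings I spotted above, and it assumes (I have not verified this against any format specification) that they are stored as plain single-byte text. The function names `guess_format` and `survey` are my own inventions for illustration.

```python
from pathlib import Path

# Marker strings are the ones seen in the recovered text in this exercise.
# Assumption: they appear in the file as plain ASCII bytes; this is not a
# documented signature for the format.
VERSION_MARKER = b"PowerPoint 4.0"
GENERIC_MARKER = b"Click to edit Master text styles"


def guess_format(data: bytes) -> str:
    """Return a rough label for raw file content, based only on marker strings."""
    if VERSION_MARKER in data:
        return "PowerPoint 4.0 (version string found)"
    if GENERIC_MARKER in data:
        # PowerPoint-like text, but nothing that pins down the version.
        return "PowerPoint (version unknown)"
    return "unidentified"


def survey(folder: str) -> dict:
    """Map each file name in a folder to its guessed format, ignoring extensions."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            results[path.name] = guess_format(path.read_bytes())
    return results
```

Calling `survey` on a folder returns a name-to-label dictionary that can be scanned by eye. Anything that comes back as "unidentified" still needs the manual treatment, and a file labelled "version unknown" is exactly the case where "PowerPoint" alone is not enough.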
[As an aside, David Rosenthal has this to say on file format identification tools:
“Several people responded to my criticism of format identification tools. Matt said:
‘I do agree that identification of textual formats is increasingly important, and further efforts are probably needed in this area.’
“I don’t agree and have said so in the past. As regards Web formats, to the extent to which format identification tools agree with the code in the browsers they don’t tell us anything useful, and to the extent to which they disagree with the code in the browsers they are simply wrong. Applying these tools as part of a Web preservation pipeline is at best a waste of resources and at worst actively harmful.”
Now it’s always worth reading David’s text carefully, and clearly he’s referring to objects that form part of displayed web pages. However, these are not the only kinds of files on the web; many people make other kinds of files available via the web, and for some of these, in my opinion, David is wrong.]
The beauty of using Word 2004 for this job was that the files that were really Word files mis-identified as “.ppt” opened flawlessly in Word and were clearly different!
Once you know what you’ve got, you probably want information on the risks (to you) for content in that format. What problems are you likely to have “rendering” the files (i.e. causing them to display their content as they should)? What problems are likely if you try to migrate the files (i.e. open and “save as” some more modern format)? Is older software available to you that could open the files? Are older computer systems available to you? You could sum these up by asking: how obsolete are the files? Finally, you need some hints as to the action window that you have available. I could have converted these files some years ago via a colleague who had software that would open them, but her machine has been updated and the newer version has lost this option (thanks for nothing, Microsoft).
It’s worth noting that a lot of the “official” advice on obsolescence that you might find is useless. Various sites will classify formats as obsolete that are still perfectly easy to open and migrate from. Indeed, I suspect that there’s no really helpful way to classify obsolescence (I tried and failed). And it changes… before this exercise I would have classified PPT 4 as pretty high on any obsolescence scale, but now we have a simple and free migration option (or low cost, if you expect to have a lot of these).
So now you know what formats your files are in, and you have some idea of the risks to those files. Perhaps it’s time to take action! Now you’ll need to know:
- d) What software is available to render, and preferably to migrate (save as) the files? And which of these options is free or cheap enough for you?
- e) What services are available to render, and if possible to migrate the files to a newer format? And which of these options is free or cheap enough for you?
- f) What older technology routes do you or your contacts have access to, that might help you to render and/or migrate the files? This might involve getting access to software licences that are no longer commonly available.
- g) What older environments could you or your contacts emulate? Again, this might need access to software licences that are no longer commonly available.
For cases (f) and (g), PowerPoint 4.0 licences would not be appropriate, as they would probably not give me a useful “save as” option (although they might help me to triage the content by being able to view the slides as I intended them, and there might be an indirect route such as “Print to PDF”). I’d need a licence that was newer than PowerPoint 4.0 but older than PowerPoint 2004 (which no longer supports the old file format).
In this case, the answer to (d: software that could render or migrate) was: none. I could not find software available to me that could even render the files reasonably well.
The answer to (e: services that could render or migrate) was initially: none. But Zamzar came good when I asked them and sent them some examples. In a later case, I was having problems migrating some newer PPT files that had embedded objects (graphs from Excel), and Zamzar managed to convert these as well, and suggested they might add this to their standard option, too.
The answer to (f: older technology) for me was: none. But through my contact network I did find someone who had access to an appropriate Mac with the intermediate software. This was OK for proof of concept, but would not have been suitable for converting all my 50-odd files. It’s possible that I could have bought a licence for that software and run it on my Mac, but I didn’t want to spend that much on an option that still might not work. I suppose I could have grabbed a version off a torrent somewhere, but I do try to stay legal!
The answer to (g: emulation options) for me was: none that were feasible. There were emulation options suggested, but they still needed the intermediate software (see above).
To summarise: faced with older files that you cannot open, I think you need the following information, in roughly this order:
- information to help identify the formats at an appropriate level of precision,
- information on risks to your content, once the format has been identified, and
- information on routes that will allow you to render (not least for triage purposes) and possibly migrate the files to a more modern format.
I realise there are use cases where it is essential that the file be presented in its original format, but these use cases are of little interest to me. I want to read Sir Walter Scott’s works in a modern edition, not the original, but I’m not a Scott scholar! Likewise, I’m interested in my older content for its use to me, not to study how it looked in the old days.