The PowerPoint 4.0 adventure: what did I learn?

15 Oct

Part of the point of my attempt to access my old PowerPoint 4 files (see here and here for the latest state) was to see what I could learn from it, with half an eye on the Jason Scott November month of action on file formats (see also planning here). So, what did I learn that might be of more general interest than the specific case?

I guess the first thing is: the Internet is your friend! I knew that, and so do you, but I continue to be amazed at the extent to which people will go out of their way to help you if you ask (not always, but often). Of course, this is partly due to the particular set of people who know (of) me…

Faced with a set of files that you know little about, and that don’t “automatically” open (based on defaults in your file system and operating system), the first thing you will need is some mechanism to identify the file format. This may apply even if the file system does appear to know about the file format. The files in this case were a mixture:

  • a) some had a .ppt file type and opened correctly
  • b) some had a .ppt file type but PowerPoint 2004 refused to open them
  • c) some had no file type extension (files migrated from Macintosh System 7 OS without the resource fork).

We can ignore group (a). Group (b) files were of two types, it turned out: some were PowerPoint 4.0 files, but some were Word 6.0 files that had wrongly had a .ppt extension added (by me, usually due to some clue in the file name like “slides”, forgetting that prior to using PPT 4 I made slides in Word and printed them to transparencies). Group (c) files were a mixture of PPT 4 and Word 6 files (with an odd Macintosh Write file thrown in for good measure).

In order to work out what to do, you really do have to know what you have got. So, the first thing you need is a pointer to a set of tools that can identify file formats. Those tools may require a set of signatures to help them. But an amateur like me probably doesn’t want to run a professional-level tool; what’s needed is a simple procedure that will help. For me, this was opening each file with Word 2004 using the “recover any text” option, and looking for some characteristic content after the main bulk of the slide content. In these files there was some text characteristic of PowerPoint, like:

“dRClick to edit Master text styles

Second Level

Third Level

Fourth Level

Fifth Level”

In addition there was some information that I recognised as relating to the original Mac directory structure for the files, and a few instances of the text “PowerPoint 4.0”. Note, in this case it is clearly important to know the particular version of the file format that you have; PowerPoint is NOT sufficient!
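For anyone with more than a handful of files, the same "look for characteristic text" check could be scripted rather than done by hand in Word. What follows is only a minimal sketch, not a real format identification tool: the two marker strings are simply the ones I saw in the recovered text above, and the function names, the "ppt4?" label and the assumption that the strings are stored as plain single-byte text in the file are all mine.

```python
from pathlib import Path

# Marker strings copied from the recovered text described in this post.
# Assumption: they appear in the file as plain single-byte (ASCII) text.
# This is a rough heuristic, not a signature from any format specification.
PPT4_MARKERS = (b"PowerPoint 4.0", b"Click to edit Master text styles")


def looks_like_ppt4(data: bytes) -> bool:
    """Return True if any of the characteristic strings occurs in the bytes."""
    return any(marker in data for marker in PPT4_MARKERS)


def triage(folder: str) -> dict:
    """Map each file name in `folder` to a rough guess: 'ppt4?' or 'unknown'."""
    guesses = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            hit = looks_like_ppt4(path.read_bytes())
            guesses[path.name] = "ppt4?" if hit else "unknown"
    return guesses
```

Calling triage on a folder of old files would give a first rough sort, whatever the file extension says. Anything marked "unknown" (Word 6.0 files, for instance) would still need a look by hand, and a "ppt4?" guess is a hint to follow up, not a confirmed identification.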

[As an aside, David Rosenthal has this to say on file format identification tools:

"Several people responded to my criticism of format identification tools. Matt said:
'I do agree that identification of textual formats is increasingly important, and further efforts are probably needed in this area.'
"I don't agree and have said so in the past. As regards Web formats, to the extent to which format identification tools agree with the code in the browsers they don't tell us anything useful, and to the extent to which they disagree with the code in the browsers they are simply wrong. Applying these tools as part of a Web preservation pipeline is at best a waste of resources and at worst actively harmful."

Now it's always worth reading David's text carefully, and clearly he's referring to objects that form part of displayed web pages. However, these are not the only kinds of files on the web; many make other files available via the web, and for some of these in my opinion David is wrong.]

The beauty of using Word 2004 for this job was that the files that were really Word files mis-identified as “.ppt” opened flawlessly in Word and were clearly different!

Once you know what you’ve got, you probably want information on the risks (to you) for content in that format. What problems are you likely to have “rendering” the files (ie causing them to display their content as they should)? What problems are likely if you try to migrate the files (ie open and “save as” some more modern format)? Is older software available to you that could open the files? Are older computer systems available to you? You could sum these up by asking: what is the degree of obsolescence of the files? Finally, you need some hints as to the action window that you have available. I could have converted these files some years ago via a colleague who had software that would open them, but her machine has been updated and the newer version has lost this option (thanks for nothing, Microsoft).

It’s worth noting that a lot of the “official” advice on obsolescence that you might find is useless. Various sites will classify formats as obsolete that are still perfectly easy to open and migrate from. Indeed, I suspect that there’s no really helpful way to classify obsolescence (I tried and failed). And it changes… before this exercise I would have classified PPT 4 as pretty high on any obsolescence scale, but now we have a simple and free migration option (or low cost, if you expect to have a lot of these).

So now you know what formats your files are in, and you have some idea of the risks to those files. Perhaps it’s time to take action! Now you’ll need to know:

  • d) What software is available to render, and preferably to migrate (save as) the files? And which of these options is free or cheap enough for you?
  • e) What services are available to render, and if possible to migrate the files to a newer format? And which of these options is free or cheap enough for you?
  • f) What older technology routes do you or your contacts have access to, that might help you to render and/or migrate the files? This might involve getting access to software licences that are no longer commonly available.
  • g) What older environments could you or your contacts emulate? Again, this might need access to software licences that are no longer commonly available.

For cases (f) and (g), PowerPoint 4.0 licences would not be appropriate, as they would probably not give me a useful “save as” option (although they might help me to triage the content by being able to view the slides as I intended them, and there might be an indirect route such as “Print to PDF”). I’d need a licence that was newer than PowerPoint 4.0 but older than PowerPoint 2004 (which no longer supports the old file format).

In this case, the answer to (d- software that could render or migrate) was: none. I could not find software available to me that could even render the files reasonably well.

The answer to (e- services that could render or migrate) was initially: none. But Zamzar came good when I asked them and sent them some examples. In a later case, I was having problems migrating some newer PPT files that had embedded objects (graphs from Excel), and Zamzar managed to convert these as well, and suggested they might add this to their standard option, too.

The answer to (f- older technology) for me was: none. But through my contact network I did find someone who had access to an appropriate Mac with the intermediate software. This was OK for proof of concept, but would not have been suitable for converting all my 50-odd files. It’s possible that I could have bought a licence for that software and run it on my Mac, but I didn’t want to spend that much on an option that still might not work. I suppose I could have grabbed a version off a torrent somewhere, but I do try to stay legal!

The answer to (g- emulation options) for me was: none that were feasible. There were emulation options suggested, but they still needed the intermediate software (see above).

To summarise: faced with older files that you cannot open, I think you need the following information, in roughly this order:

  1. information to help identify the formats at an appropriate level of precision,
  2. information on risks to your content, once the format has been identified, and
  3. information on routes that will allow you to render (not least for triage purposes) and possibly migrate the files to a more modern format.

I realise there are use cases where it is essential that the file be presented in its original format, but these use cases are of little interest to me. I want to read Sir Walter Scott’s works in a modern edition, not the original, but I’m not a Scott scholar! Likewise, I’m interested in my older content for its use to me, not to study how it looked in the old days.


6 Responses to “The PowerPoint 4.0 adventure: what did I learn?”

  1. Euan Cochrane (@euanc) 15 October, 2012 at 17:06 #

    Very interesting write-up thanks Chris.

    I have one “small” question:
    You say
    “I’m interested in my older content for its use to me, not to study how it looked in the old days.”.
    The question I have is how did you confirm that the content you ended up with was the content that was there originally, or that it was the content you wanted to preserve? Did you try to check this in any way? I assume you might possibly remember what the content was like. This would not be possible in many other contexts where the preservation practitioner would probably not have any solid idea of what the content was meant to include.
    That, to me, is the reason why in almost all cases digital objects need to optionally be able to be presented without alteration (i.e. by rendering the original files with the original environment). Doing so enables the user to confirm which content was there originally.
    This should, I believe, have been one of your first steps, i.e. you should have identified what content was to be preserved so that you could confirm, after the preservation actions, that it had been preserved. Without doing this, how do you know whether you were successful or not?

    “You don’t know what you don’t know”

    in this context becomes:

    “You don’t know which content you didn’t preserve if you don’t know which content was there originally”.

  2. Chris Rusbridge 15 October, 2012 at 17:19 #

    Fair point, Euan. Remember, my context is as “amateur” rather than professional; I’m trying to represent someone who’s dealing with his own stuff. In this case, I had no easy way of undertaking your “first step”, as I couldn’t open the files at all. One of the main aims of this was to triage the files to see which of them were worth putting up on SlideShare.

    I had however already “seen” some of the text in the slides through my use of Word’s “Recover any text” capability. When the slides were finally made available, there were plenty of indications of reasonable authenticity: my name as author, often the event at which I was speaking and a title. The slide template was one I remembered (though I had completely forgotten it until I saw the converted file). Most of the sentiments expressed were ones I would have expected from me at that time.

    One non-authentic feature was the conversion date appearing in some slide footers. However, that was because I used a poor feature of PowerPoint 4.0 that I’ve since deprecated.

    Let’s also remember, the first conversion was done using MS software. That’s no guarantee of authenticity, but it’s a start. I was able to compare the version migrated by Zamzar with the version from the MS software.

    • Euan Cochrane (@euanc) 15 October, 2012 at 17:43 #

      Thanks for the (quick) reply Chris, again very interesting. I really appreciate the detail you’ve included in this post series.

      A few more comments/replies:

      1. As you say “Let’s also remember, the first conversion was done using MS software. That’s no guarantee of authenticity, but it’s a start.”

      I added the bold as I’d like to confirm your suspicion. MS Office seems to be generally better at rendering objects whose primary file was created by a previous version of the MS Office suite. But it still has issues and can’t be depended upon to be reliable (some comments on this are included here, I believe).

      2. As I mention here, your issue with dates can be fixed when using a properly configured emulator:
      “The date of last modification would be used when configuring the rendering environment to ensure active date fields were contemporaneous with the file (i.e. the emulated environment would have the system date set to the date the file was last modified).”

      I have tested this approach and partly automated it myself using qemu.

      3. The tests you mention that you used to confirm the integrity of the content post preservation are unfortunately probably not automatable and at least one is slightly circular/an infinite regress (using a preservation option “text recovery” to confirm another preservation option).
      Even though you class yourself as an amateur in this context, this example case is likely to be one of the rare and therefore interesting ones in which the creator is there and able to confirm the success of the preservation action. So while you may be an amateur your experience is still very relevant and does not bode well for professional archivists who need to apply these techniques on a large scale and therefore need to be able to automate them if they want to be able to afford them.

  3. Chris Rusbridge 15 October, 2012 at 19:33 #

    “So while you may be an amateur your experience is still very relevant and does not bode well for professional archivists who need to apply these techniques on a large scale and therefore need to be able to automate them if they want to be able to afford them.”

    The automation of any of this is left as an exercise for the reader!


Comments always welcome, will be treated as CC-BY
