Against semantic markup

@siracusa tweeted this a little while ago:

Hypercritical #33 correction: <em> and <strong> … not <b> and <i>. Apologies to @gruber and semantic markup sticklers.

(Hypercritical is a podcast he does with Dan Benjamin at 5×5.com; go listen, but there’s nothing in the podcast relevant to what follows here.)

The idea here is that the <b> & <i> tags (bold & italic) are typographical, or display, instructions, and as such should be left up to the page designer. We should supply semantic markup instead to give the designer enough information about what we want displayed that the italic or bold typeface can be chosen as appropriate. For our purposes, those tags are <em> and <strong>, short for “stress emphasis” and “strong importance”. <strong> can be nested to indicate stronger and stronger importance.

This kind of semantic markup is fine in its place, but HTML isn’t the place to enforce it. A sufficient reason is that HTML doesn’t have a rich enough set of tags to do the work. The APA Style Manual lists seven reasons to use italics:

  • Titles of books, periodicals, and microfilm publications
  • Genera, species and varieties
  • Introduction of a new, technical, or key term
  • Emphasis
  • A letter, word, or phrase referred to as such
  • Letters used as statistical symbols or algebraic variables
  • Anchors of a scale

Sure, “emphasis” is on the list…along with six others that HTML has no tag for. And that’s not an exhaustive list.

One of the WordPress themes I use oddly inverts the representation of em/strong from i/b to b/i. It must have seemed like a good idea to someone at some time, but the only way I could use it on my site was to “fix” the CSS, which fortunately I was in a position to do. The thing is, there’s nothing technically wrong with what the theme did: “emphasis” is nowhere defined as “italics”.

So (except for cases where you’ve already taken care of things via CSS and classes), if you want italics, go ahead and use <i>. Ditto <b> for bold. And don’t apologize for it.

And now for a slight digression. HTML5 adds a bunch of new “semantic tags”, like <header> and <section>. Notice that “semantics” ends up referring to at least two rather distinct categories. The new HTML5 tags describe document structure, a kind of containerization where the container names aren’t all “div”. But the kind of semantic reference we’re talking about in the above list of reasons to italicize has nothing to do with document structure; it has to do with the connection between the pieces of the document and the great outside world: movie names, species, name-vs-use.

I mention this as an introduction to an oldish essay by John Allsopp, Semantics in HTML5. It’s the kind of thing that’s just as well to keep in the back of your mind when you start creating The Semantic Web.

Oh, the title. I’m not against semantic markup. Really. Just against using em/strong as fancified ways of saying italic/bold and then calling it “semantic markup”.

Time Machine is not version control

If your response to the title is, “Well, duh!”, you may stop reading here.

If you’re wondering, “What’s Time Machine?”, it’s OS X’s built-in automatic backup capability. You can pretend the title is “Periodic backup is not version control”.

If you’re wondering, “What’s version control?”, it’s a mechanism, formal or informal, that preserves copies of earlier versions of a document, with an eye to being able to undo changes if necessary, or at least to go back and see the history of changes to a document. Programmers will think of version control systems like Git or Mercurial or Subversion. If you keep backup copies of your important Word documents at various stages in their life, that’s informal version control. OS X 10.7 (Lion) has a form of built-in version control for some applications.
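For the informal kind, here is a minimal sketch in Python of what “keeping copies at various stages” might look like. The function name save_version, the versions folder, and the timestamp naming scheme are all made up for illustration; this is not how Git, Lion, or any other tool does it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_version(document, versions_dir, now=None):
    """Copy `document` into `versions_dir` under a timestamped name.

    Hypothetical helper: the naming scheme (stem-YYYYMMDD-HHMMSS.suffix)
    is an assumption made for this sketch, not any tool's convention.
    """
    document = Path(document)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = versions_dir / f"{document.stem}-{stamp}{document.suffix}"

    # copy2 copies the contents and tries to preserve file timestamps too
    shutil.copy2(document, target)
    return target
```

Call it whenever the document reaches a stage you might want to return to, e.g. save_version("thesis.doc", "thesis-versions"). Note that the versions folder is itself just a set of ordinary files, so it needs a backup like anything else.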

Time Machine effectively backs up your entire system once an hour. If you mess up a document, it’s possible to go back to a previous version and restore it to its previous state. This capability makes Time Machine temptingly resemble version control. But treating it as such is hazardous (which is not to deny that it can be very handy, even a life saver, when it works). Why?

A secondary reason first. Time Machine does a backup every hour, but it doesn’t save all of those backups. It saves the hourly backups for the past 24 hours, daily backups for the past month, and weekly backups for everything older than a month. So it’s entirely likely that the versions of your document that Time Machine has available are not the ones you’re interested in.
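To see how much gets thinned out, here is a rough Python sketch of the retention schedule just described. It is not Time Machine’s actual algorithm: the function name surviving_snapshots is invented, “the past month” is assumed to mean 30 days, and which snapshot survives within a day or week (here, the earliest) is a guess made for illustration.

```python
from datetime import datetime, timedelta


def surviving_snapshots(snapshots, now):
    """Return the snapshots kept under an hourly/daily/weekly schedule.

    Assumptions (not taken from Time Machine itself): a month is 30 days,
    and the earliest snapshot in each day or ISO week is the one kept.
    """
    kept = {}
    for snap in sorted(snapshots):
        age = now - snap
        if age <= timedelta(hours=24):
            key = ("hourly", snap)            # keep every recent snapshot
        elif age <= timedelta(days=30):
            key = ("daily", snap.date())      # one per calendar day
        else:
            iso = snap.isocalendar()
            key = ("weekly", iso[0], iso[1])  # one per ISO year and week
        kept.setdefault(key, snap)            # first (earliest) one wins
    return sorted(kept.values())


# Sixty days of hourly backups, then see what is left.
now = datetime(2011, 9, 1, 12, 0)
snapshots = [now - timedelta(hours=h) for h in range(24 * 60)]
kept = surviving_snapshots(snapshots, now)
print(len(snapshots), "snapshots taken;", len(kept), "still available")
```

Under these assumptions only a small fraction of the hourly snapshots survive, so a version you saved at 3 p.m. three days ago has most likely been folded into that day’s single surviving backup.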

The primary reason is this. A document’s previous versions are themselves documents, and potentially important ones. Important documents need to be backed up, which is to say that you need at least one redundant copy. But if you’re relying on the Time Machine copy (or any backup, for that matter), you have only one copy of the historical version of the document: the one on the backup disk. If that disk fails, the old versions are simply gone, because they never had a backup of their own.

So keep using Time Machine as a safety net. But if the thought of all those old versions disappearing completely makes you nervous, start thinking about some other means of version control, one in which the old versions are backed up.