DOI Enabling Joi Ito’s weblog, Phase I

As Joi wrote, we “DOI Enabled” his blog, meaning we made it possible for some—not all—of his posts to be registered with DOIs via Crossref.

In the above-linked post, Joi goes through some of his needs and desires, as well as some of the issues around doing this. There’s more to say about all that, but here I want to lightly document just what we’ve actually done.

Firstly, it needs to be understood that Joi’s website is a bit of a living organism. As an underlying platform for content management, we still use Movable Type. Over years of experimentation, I’ve built a framework which allows me to leverage MT’s static file building infrastructure, while dynamically layering over it pretty much anything we want to do in terms of aggregation, processing, caching and re-publishing.

It also bears saying (and this is something I personally value greatly about our working style) that Joi is usually quite comfortable with rough edges, sometimes having to do things manually and sometimes relying on a bit of smoke and mirrors. Despite this, I try to automate as much as makes sense for our bespoke needs, and wrap it up in a minimalist, functional UX. This is why there are no CMS plugins or code up on GitHub: it’s all totally custom integrations using whatever existing tools I can get my hands on and hack together. This is also why I won’t go into details of how anything was done: it’s all hacked together. It works, here, for our needs, and to demonstrate our ideas.

So, in this first case, the task was to:

  1. Define the DOI ID pattern we would use.
  2. Generate that ID consistently, only for selected posts.
  3. Display the DOI on the selected posts.
  4. Generate the XML for submission to Crossref.

1. Define the DOI Pattern

This was actually quite an important and interesting question: What should we base the unique identifier for blog posts on? I was very keen to make sure it was somehow meaningful and humanly intelligible, as well as easily generated without any risk of duplicates or corruption. Crossref also provides this handy advice for how to construct your DOI identifiers, which helped a lot.

For old-hand webloggers like us, it was mostly obvious that something based on the post’s date and time, and maybe title, would be the way to go. Now, trying to sanitize a title down into a unique, readable and meaningful short string is basically not going to happen. DOIs need to be short and easy to make note of—remember, students doing research and compiling citations. That means random character patterns—like hashes (e.g.: “DED95EF56A8A8354F00C384B45D86B29”)—or meaningless CMS post IDs (e.g.: “5668”) were less than ideal.

The clinching argument for me was when I asked myself: “what is one of the medium-defining characteristics of blogging?” and the answer came back loud and clear—and perhaps somewhat controversially: chronology. A post exists when it is first published, and within a blogger’s own namespace, and unless they are inhuman, only one post can exist in a minute of the timeline.

For our purposes, and I would propose for the purposes of any human-scale blogging, the date-time of publication would serve as our DOI identifier: YYYYMMDD.HHmm (e.g.: 20180822.2140)
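To make the pattern concrete, here is a minimal sketch in Python of how such an identifier could be derived from a publication date-time. This is not the template we actually use; the function names doi_suffix and find_collisions are made up for illustration, and the collision check simply encodes the assumption that only one post can exist in any given minute.

```python
from datetime import datetime


def doi_suffix(published: datetime) -> str:
    """Format a publication date-time as YYYYMMDD.HHmm (seconds are dropped)."""
    return published.strftime("%Y%m%d.%H%M")


def find_collisions(timestamps):
    """Return the suffixes shared by more than one post, i.e. posts
    published within the same minute. An empty list means the
    one-post-per-minute assumption holds for this set of posts."""
    seen = set()
    dupes = set()
    for ts in timestamps:
        suffix = doi_suffix(ts)
        if suffix in seen:
            dupes.add(suffix)
        seen.add(suffix)
    return sorted(dupes)


if __name__ == "__main__":
    print(doi_suffix(datetime(2018, 8, 22, 21, 40)))  # 20180822.2140
```

Running a check like find_collisions over an existing archive before committing to the pattern would show whether any two posts ever landed in the same minute.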

Joi agreed, and we proceeded.

2, 3 & 4. Standardize the generation of our DOI Identifier and publish it

All I needed here was a tiny custom template which, when included, generates the ID string from the post’s publication date. Easy as pie. This template is included anywhere a DOI’ed post is published: the homepage index, any archive indexes, the post itself and, importantly, the XML submission file.
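The real thing is a Movable Type template, which I am not reproducing here, but the idea can be sketched in Python under a few loud assumptions: the prefix "10.XXXX" is a placeholder and not our actual DOI prefix, and the doi_enabled flag and the post dictionary shape are hypothetical stand-ins for however a post gets selected. The point is only that one small function is the single source of the ID string, and every place that shows the DOI calls it.

```python
from datetime import datetime
from typing import Optional

# Placeholder only: a real prefix is assigned to the registrant.
DOI_PREFIX = "10.XXXX"


def doi_for_post(post: dict) -> Optional[str]:
    """Return the full DOI for a selected post, or None for posts
    that are not flagged for registration (hypothetical flag name)."""
    if not post.get("doi_enabled"):
        return None
    suffix = post["published"].strftime("%Y%m%d.%H%M")
    return f"{DOI_PREFIX}/{suffix}"


def doi_link(post: dict) -> str:
    """Return a resolvable doi.org URL for display, or an empty
    string when the post has no DOI."""
    doi = doi_for_post(post)
    return f"https://doi.org/{doi}" if doi else ""
```

Because the index pages, the post page and the submission file would all call the same function, the ID cannot drift between them.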

The XML submission file is an example of the smoke and mirrors mentioned above. A knowledgeable reader will have noticed already that I did not say “and then we submit the post metadata to Crossref on publication for registration”. We haven’t done that, and may never, because for our “proof of concept” needs, it is not worth the investment. Until we get to Phase II, any time we want to register a new DOI, we publish the post and email a lovely person at MIT Press, who then manipulates the levers at Crossref to make the submission. This takes days in some cases. It isn’t perfect, but streamlining that process is not a top priority now. All this means is that for freshly published posts, there may be a few days’ lag before the linked DOI actually resolves.

One more thing

As Joi mentions, Crossref doesn’t explicitly support DOI’ing weblogs, only because there is no schema for the submission of blog post metadata. The suggestion came back to use “dataset”, which made both Joi and me chuckle as well as wrinkle up our noses. OK, for now. Maybe we’ll use their “journal” schema, which fits a bit better form-wise… but ideally we’ll help establish a blog metadata schema.

But that’s for Phase III.
We still need to do Phase II: take the DOI submission process into our own hands.