DOI Enabling Joi Ito’s weblog, Phase II

After figuring out our blog post DOI identifier pattern, deploying it on Joi’s site and generating the registration XML in Phase I, I needed some time to get familiar with CrossRef’s DOI metadata submission process and mechanisms (Step 3 in CrossRef’s “Content Registration Guide”).

The academic publishing space does software development, UX and DX in a very no-frills way. No API, just a Web UI and an HTTP POST end-point with some basic documentation. Trial and error with real live data, bare-knuckle debugging, UI label tea-leaf reading… it was loads of fun, but also took a lot of time. As someone who’s been hacking stuff together with web APIs for years, I enjoyed engaging with something so raw. 🍖🤠

As mentioned in the Phase I writeup, over the years I’ve built up a whole infrastructure for Joi’s site to do aggregation and caching and republishing and such. But the needs—both code and UI—for submitting an XML file via HTTP POST were not something we had yet, so I needed to build that up. We didn’t go the “automatic submission on publication” route for now as it represented a deeper technical footprint.

So, I made a Big Red Button for Joi to press anytime he wanted to update his weblog DOI registrations.

When pressed, the DOI submission XML—which is created by MT on publication—is POSTed to the CrossRef endpoint, and any Error or Success response is displayed.
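What the button does behind the scenes can be sketched roughly as follows. This is a minimal Python illustration, not the code running on Joi’s site: the endpoint URL, the form field names (operation, login_id, login_passwd, fname) and the function names are assumptions to check against Crossref’s current deposit documentation, and the credentials are placeholders.

```python
import uuid
import urllib.request

# Assumed deposit URL; confirm against Crossref's current documentation.
DEPOSIT_URL = "https://doi.crossref.org/servlet/deposit"


def build_deposit_request(xml_bytes, filename, login_id, login_passwd, url=DEPOSIT_URL):
    """Build (but do not send) a multipart/form-data POST carrying the XML file."""
    boundary = uuid.uuid4().hex
    # Assumed form fields for a metadata upload.
    fields = {
        "operation": "doMDUpload",
        "login_id": login_id,
        "login_passwd": login_passwd,
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The XML file itself goes in as a file upload part.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="fname"; '
            f'filename="{filename}"\r\nContent-Type: application/xml\r\n\r\n'
        ).encode("utf-8")
        + xml_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")


def submit(request):
    """Send the request and return (ok, text) for display next to the button."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return True, response.read().decode("utf-8", errors="replace")
    except OSError as error:  # URLError, HTTPError and timeouts are OSError subclasses
        return False, str(error)
```

Keeping the request building separate from the sending makes it possible to inspect exactly what would be POSTed before pressing the button for real.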

Possible future directions

  • Separate utility to submit only new posts
  • Maintain local database of metadata and registration status
  • Capture Crossref API response email and work into this process
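The first two ideas could fit together along these lines. This is only a sketch of one possible shape, assuming a small SQLite table keyed by DOI; the table layout, status values and function names are all hypothetical, and nothing like this exists on the site yet.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a hypothetical local registry of DOI registrations."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registrations ("
        " doi TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def record_post(conn, doi, title):
    """Remember a DOI'ed post; posts already known are left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO registrations (doi, title) VALUES (?, ?)",
        (doi, title),
    )
    conn.commit()


def mark_submitted(conn, doi):
    """Flag a post as submitted once the POST has gone through."""
    conn.execute("UPDATE registrations SET status = 'submitted' WHERE doi = ?", (doi,))
    conn.commit()


def pending_posts(conn):
    """Return the DOIs that still need submitting, i.e. only the new posts."""
    rows = conn.execute(
        "SELECT doi FROM registrations WHERE status = 'pending' ORDER BY doi"
    )
    return [row[0] for row in rows]
```

A submission utility would then only build XML for whatever pending_posts returns, instead of resubmitting everything on each press.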

Joi Ito’s (Academic) Readings

Took a short side path from the work I am doing with Joi and MIT’s Knowledge Futures Group around academic blogging, and put together a listing and widget of Joi’s academic reading.

I know the feeling of wanting to share one’s reading habits from using GoodReads and my own personal ebook library, so when Joi mentioned he’d like to share the scholarly works he reads, I understood completely. The things we read become part of our minds, and having some way of sharing that is another way we share of ourselves and help others to understand where we’re coming from in our own words and works.

Joi already uses Zotero to track his scholarly reading, which we discovered can provide an RSS Atom feed (OMG!) and an “extra” field the user can fill. As limited as this is—why not provide unlimited user-customizable key=value pairs, Zotero?—it is good enough for now.

Most of the per-item metadata that Zotero’s RSS feed includes, however, is in an HTML <table> in the Atom <content> tag. Go figure.

Parsing the feed and the HTML DOM was easy enough of course. What took the better part of the day was the data itself. I needed to figure out what data we wanted to feature, which we could rely on being present (Title, Author, etc…), what other data was present that we’d like to display if it happens to be available, and what to do with the rest. Showing everything every time would have been easy but also very messy. I opted to limit what we show on our listing and widget, and for everything else, we link through to the Zotero record.
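The table-parsing and field-selection step might look something like this. It is a sketch in Python using only the standard library, assuming each table row is a th label followed by a td value; the optional labels ("Publication", "Date") are hypothetical stand-ins, and the real feed may use different ones.

```python
from html.parser import HTMLParser


class TableFields(HTMLParser):
    """Collect <th>label</th><td>value</td> rows into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None  # "th" or "td" while inside a cell, else None
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label, self._value = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        if self._cell == "th":
            self._label += data
        elif self._cell == "td":
            self._value += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._label.strip():
            # A repeated label keeps only its last value in this simple version.
            self.fields[self._label.strip()] = self._value.strip()


FEATURED = ("Title", "Author")      # fields we rely on being present
OPTIONAL = ("Publication", "Date")  # hypothetical "show if available" fields


def summarize(table_html):
    """Return only the fields chosen for display; the rest stays on Zotero."""
    parser = TableFields()
    parser.feed(table_html)
    parser.close()
    return {name: parser.fields[name] for name in FEATURED + OPTIONAL if name in parser.fields}
```

Anything not named in the two tuples is simply dropped from the listing, which matches the "link through to the Zotero record for everything else" approach.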

I mentioned the “extra” field. This is a good example of how sometimes Joi needs to do a bit of manual labor because something can’t be automated. We use the “extra” field for a “date read” date stamp. Joi needs to manually enter the date he wants to say he read the work in question (started or finished, it doesn’t really matter). This allows us to list the works in that true blogging reverse-chronological way.
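The ordering itself is simple once the date is there. A small sketch, assuming the “extra” field holds nothing but an ISO date such as 2019-03-01 (the actual format Joi types in is not stated here) and that each item is a dict with hypothetical "title" and "extra" keys:

```python
from datetime import date


def date_read(extra):
    """Parse the "extra" field as an ISO date; return None if it is not one."""
    try:
        return date.fromisoformat(extra.strip())
    except ValueError:
        return None


def reading_list(items):
    """Return items that have a usable "date read", newest first."""
    dated = [(date_read(item.get("extra", "")), item) for item in items]
    dated = [(when, item) for when, item in dated if when is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in dated]
```

Items with an empty or unparseable “extra” field are left out rather than guessed at, which keeps the manual step the single source of truth for the ordering.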

This development uses some of the core infrastructure I’ve built over the years for Joi’s site, with all aggregating, parsing, munging and caching using 3rd party libraries and functions I’ve assembled and have ready for exactly these sorts of projects.

DOI Enabling Joi Ito’s weblog, Phase I

As Joi wrote, we “DOI Enabled” his blog, meaning we made it possible for some—not all—of his posts to be registered with DOIs via Crossref.

In the above-linked post, Joi goes through some of his needs and desires, as well as some of the issues around doing this. There’s more to say about all that, but here I want to lightly document just what we’ve actually done.

Firstly, it needs to be understood that Joi’s website is a bit of a living organism. As an underlying platform for content management, we still use Movable Type. Over the years of experimentation, I’ve built a framework which allows me to leverage MT’s static file building infrastructure, while dynamically layering over it pretty much anything we want to do in terms of aggregation, processing, caching and re-publishing.

It also bears saying (and this is something I personally value greatly about our working style) that Joi is usually quite comfortable with rough edges, sometimes having to do things manually and sometimes relying on a bit of smoke and mirrors. Despite this, I try to automate as much as makes sense for our bespoke needs, and wrap it up in a minimalist, functional UX. This is why there are no CMS plugins or code up on GitHub: it’s all totally custom integrations using whatever existing tools I can get my hands on and hack together. This is also why I won’t go into details of how anything was done: it’s all hacked together. It works, here, for our needs, and to demonstrate our ideas.

So, in this first case, the task was to:

  1. Define the DOI ID pattern we would use.
  2. Generate that ID consistently, only for selected posts.
  3. Display the DOI on posts selected.
  4. Generate the XML for submission to Crossref.

1. Define the DOI Pattern

This was actually quite an important and interesting question: What should we base the unique identifier for blog posts on? I was very keen to make sure it was somehow meaningful and humanly intelligible, as well as easily generated without any risk of duplicates or corruption. Crossref also provides this handy advice for how to construct your DOI identifiers, which helped a lot.

For old-hand webloggers like us, it was mostly obvious that something based on the post’s date and time, and maybe title, would be the way to go. Now, trying to sanitize a title down into a unique, readable and meaningful short string is basically not going to happen. DOIs need to be short and easy to make note of—remember, students doing research and compiling citations. That means random character patterns—like hashes (e.g.: “DED95EF56A8A8354F00C384B45D86B29”)—or meaningless CMS post IDs (e.g.: “5668”) were less than ideal.

The clinching argument for me was when I asked myself: “what is one of the medium-defining characteristics of blogging?” and the answer came back loud and clear—and perhaps somewhat controversially: chronology. A post exists when it is first published, and within a blogger’s own namespace, and unless they are inhuman, only one post can exist in a minute of the timeline.

For our purposes, and I would propose that for any human-scale blogging’s purposes, the date-time of publication would serve as our DOI identifier: YYYYMMDD.HHmm (e.g.: 20180822.2140 )
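Expressed as code, the pattern is a single date format string. A minimal Python sketch, where the function names are made up for illustration and the prefix is a placeholder, not the one registered for Joi’s site; it also assumes the timestamp is already in whatever timezone the blog publishes in:

```python
from datetime import datetime

DOI_PREFIX = "10.5555"  # placeholder prefix for illustration only


def doi_suffix(published):
    """Format a publication timestamp as YYYYMMDD.HHmm."""
    return published.strftime("%Y%m%d.%H%M")


def doi_for(published, prefix=DOI_PREFIX):
    """Join the (placeholder) prefix and the date-based suffix."""
    return f"{prefix}/{doi_suffix(published)}"


print(doi_suffix(datetime(2018, 8, 22, 21, 40)))  # 20180822.2140
```

Because the suffix is derived purely from the publication time, it can be regenerated anywhere the post is rendered without storing anything extra.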

Joi agreed, and we proceeded.

2, 3 & 4. Standardize the generation of our DOI Identifier and publish it

All I needed here was a tiny custom template which, when included, generates the ID string from the post’s publication date. Easy as pie. This template is included anywhere where a DOI’ed post is published: the homepage index, any archive indexes, the post itself and, importantly, the XML submission file.

The XML submission file is an example of the smoke and mirrors mentioned above. A knowledgeable reader will have noticed already that I did not say “and then we submit the post metadata to CrossRef on publication for registration”. We haven’t done that, and may never, because for our “proof of concept” needs, it is not worth the investment. Until we get to Phase II, anytime we want to register a new DOI, we publish the post and email a lovely person at MIT Press, who then manipulates the levers at Crossref to make the submission. This takes days in some cases. It isn’t perfect, but streamlining that process is not a top priority now. All this means is that for freshly published posts, there may be a few days’ lag before the linked DOI actually resolves.

One more thing

As Joi mentions, CrossRef doesn’t explicitly support DOI’ing weblogs, only because there is no schema for the submission of blog post metadata. The suggestion came back to use “dataset”, which made both Joi and me chuckle as well as wrinkle up our noses. OK, for now. Maybe we’ll use their “journal” schema, which fits a bit better, form-wise… but ideally we’ll help establish a blog metadata schema.

But that’s for Phase III.
We still need to do Phase II: take the DOI submission process into our own hands.

Pre-formatted Citations for Joi Ito’s weblog posts

The path to bringing Joi’s blog posts closer to the academic record began with a straightforward template addition allowing easy citation of the posts.

I researched and compiled a number of the most used academic citation formats for citing blogs or online journal entries, designed a data structure to provide the necessary information and put the two together into a template component which is now deployed on each post on Joi’s weblog.
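The general idea of one data structure feeding several citation templates can be sketched like this. The two templates below are simplified, APA-like and MLA-like approximations written for illustration, not the formats actually deployed on the site or authoritative renderings of either style guide, and the Post fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


@dataclass
class Post:
    """The per-post information the citation templates draw on (hypothetical)."""
    author_last: str
    author_first: str
    title: str
    site: str
    published: date
    url: str


# Approximate, illustrative templates only.
FORMATS = {
    "APA-like": "{last}, {initial}. ({year}, {month} {day}). {title}. {site}. {url}",
    "MLA-like": '{last}, {first}. "{title}." {site}, {day} {month} {year}, {url}.',
}


def render_citations(post):
    """Render every format for one post, ready to list in the popup."""
    values = {
        "last": post.author_last,
        "first": post.author_first,
        "initial": post.author_first[:1],
        "title": post.title,
        "site": post.site,
        "year": post.published.year,
        "month": MONTHS[post.published.month - 1],
        "day": post.published.day,
        "url": post.url,
    }
    return {name: template.format(**values) for name, template in FORMATS.items()}
```

Adding another citation style then means adding one more template string, with no change to the post data.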

When someone wishes to cite any of Joi’s blog posts, they can simply click/tap on Cite in the footer of the post in question; a popup modal is opened with the list of citation formats already rendered. Clicking/tapping on any one of them automatically selects the whole citation text, ready for copy and pasting.