RFCs provide technical information about protocols and other such Internet things. They are fairly important: they are the Internet's blueprints and architectural plans. Yet trying to build tools around them is more painful than it should be.
RFCs primarily come as plain ASCII text, free of all formatting, which makes them very portable, although versions with some HTML formatting exist too. Behind the scenes, these documents are produced and stored in more machine-readable formats, such as XML or nroff/groff. With these machine-readable versions, you can develop better tools or run interesting analyses. (The example that led to this blog post: a digraph of all RFC obsoletions and updates.)
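As an illustration of the kind of analysis that machine-readable data makes easy, here is a minimal Python sketch that renders obsoletes/updates relations as a Graphviz DOT digraph. The input shape (a dict mapping each RFC number to the RFCs it obsoletes and updates) is an assumption; how the relations get extracted in the first place is left out.

```python
from typing import Dict, List

# Assumed input shape: for each RFC number, the RFCs it obsoletes and the
# RFCs it updates, already extracted from some metadata source.
Relations = Dict[int, Dict[str, List[int]]]


def to_dot(relations: Relations) -> str:
    """Render obsoletes/updates relations as a Graphviz digraph."""
    lines = ["digraph rfcs {"]
    for rfc in sorted(relations):
        rel = relations[rfc]
        # Solid edges for obsoletions, pointing from newer RFC to older.
        for old in rel.get("obsoletes", []):
            lines.append(f'  "RFC {rfc}" -> "RFC {old}";')
        # Dashed edges for updates, same direction.
        for old in rel.get("updates", []):
            lines.append(f'  "RFC {rfc}" -> "RFC {old}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_dot({5321: {"obsoletes": [2821]}, 2821: {"obsoletes": [821]}})` yields the SMTP lineage as two edges; the output can be rendered with `dot -Tsvg`.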
Mislav Marohnić built Pretty RFC to make RFCs easier to read and navigate. Just compare the official HTML version to Pretty RFC’s. It does a great job.
Unfortunately, Pretty RFC doesn’t cover everything. Look up the WebSockets RFC (RFC 6455) and you get only a link to the regular document. The reason is a lack of access to those machine-readable documents. Because of that, the process Pretty RFC uses to find an XML representation of a document is more complicated than it should be:
1. The fetcher tries to find the XML in http://xml.resource.org/public/rfc/xml/, where some RFCs in the 2000–53xx range can be found.
2. Failing that, it fetches the metadata for the RFC from http://datatracker.ietf.org/doc/
3. If the datatracker links to the XML, the fetcher uses that. There probably won’t be a link, though.
4. When there is no XML link, the fetcher looks up the draft name for the RFC and checks whether it can at least find the XML for that draft at http://www.ietf.org/id/
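The lookup above is a chain of fallbacks. Here is a minimal Python sketch of that chain; it is not Pretty RFC's actual code. The file-name patterns appended to the base URLs (`rfcNNNN.xml`, `rfcNNNN/`, `<draft>.xml`) and the `parse_metadata` helper, which pulls an XML link or draft name out of a datatracker page, are assumptions for illustration. Network access is injected as a `fetch` function so the logic can be tried offline.

```python
from typing import Callable, Optional

# Base URLs come from the steps above; the file-name patterns appended
# to them below are assumptions.
RESOURCE_ORG = "http://xml.resource.org/public/rfc/xml/"
DATATRACKER = "http://datatracker.ietf.org/doc/"
DRAFTS = "http://www.ietf.org/id/"

# fetch(url) returns the response body, or None if nothing is found there.
Fetch = Callable[[str], Optional[str]]


def find_rfc_xml(number: int, fetch: Fetch,
                 parse_metadata: Callable[[str], dict]) -> Optional[str]:
    """Return a URL for an XML source of RFC `number`, or None."""
    # Step 1: the static archive (some RFCs in the 2000-53xx range).
    url = f"{RESOURCE_ORG}rfc{number}.xml"
    if fetch(url) is not None:
        return url

    # Step 2: fetch the RFC's metadata page from the datatracker.
    page = fetch(f"{DATATRACKER}rfc{number}/")
    if page is None:
        return None
    # Hypothetical helper: returns e.g. {"xml_url": ..., "draft": ...}.
    meta = parse_metadata(page)

    # Step 3: use a direct XML link if the datatracker offers one (rare).
    if meta.get("xml_url"):
        return meta["xml_url"]

    # Step 4: fall back to the XML of the draft that became the RFC.
    draft = meta.get("draft")
    if draft:
        url = f"{DRAFTS}{draft}.xml"
        if fetch(url) is not None:
            return url
    return None
```

Passing a dictionary's `get` method as `fetch` is enough to exercise every branch without touching the network.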
Even after all those steps, the process only discovers XML sources for a small subset of RFCs. This is the biggest problem I have right now: the XML and nroff files in which RFCs were authored are usually not published. They are archived by rfc-editor.org and available on request by email.
This doesn’t seem right.
I don’t know who’s ultimately responsible for managing this kind of thing, but the job of RFC Editor is contracted out to Association Management Solutions, LLC, so I would assume they’re tasked with managing access to the data.
While there are likely other things to do (and there is probably a backlog, judging by the out-of-date “never issued” list and numbering gaps that have been unresolved since 2009), I think opening up the Internet’s blueprints is a worthwhile goal.