Over four years ago, four Google employees published a white paper, A Proposal for Shared Dictionary Compression over HTTP, describing SDCH, an extension to HTTP/1.1. The extension supports a new compression mechanism, announceable through the
Accept-Encoding request header, allowing a potentially significant reduction in HTML response sizes.
The basic setup from the paper, which is both short and very readable:
To cut down on all the repetition, just tell the browser which chunks of HTML appear frequently — these are the dictionaries — and instead of duplicating that same chunk in responses, refer to the dictionary held by the browser. Partial caching, in a sense.
For example, suppose every HTTP response contains this footer from a site-wide template:
<footer> <ul> <li>...</li> <li>...</li> <li>...</li> </ul> <!-- lots more HTML here --> </footer>
Call this whole chunk of HTML
X and have the browser remember it. From then on, replace any instance of this chunk in a HTTP response with a reference to
X. Saved bandwidth! Repeat for
This alone might do a lot, but treating the entire dictionary — the footer in the example — unusable just because of a few changed words is wasteful, so there’s a sensible next step: don’t restrict it to all-or-nothing chunks, instead provide deltas (or differences) relative to this stored dictionary, the browser can then compute the necessary result; if a single word changes, the difference between the dictionary and final result will still see a decent saving.
As if that wasn’t neat enough, SDCH uses VCDIFF, a simple delta encoding format described in RFC3284 that is completely independent of the algorithm used to compress the data and calculate the delta, for future proofing.
(As a total aside to all VCDIFF libraries: if you claim a VCDIFF implementation but you just compute deltas and decompress, as vcdiff.js does, you haven’t implemented VCDIFF. The encoding format is the part that counts, since VCDIFF is explicitly independent of the underlying compression algorithm.)
What happened? The
mod-sdch Apache module project is inactive. The SDCH Google group is all but silent, with a mere 2 topics in the whole of 2012, and no other mailing list discussion seems to exist. A Firefox implementation has no interest. Where’d it go?
The current version of Chrome includes
sdch in its
Accept-Encoding request headers, so the browser-side decompression support both exists and is available for over a third of users.
SDCH seems like a great idea with little downside, yet there’s a confusing lack of adoption. Are the gains negligible? Is transport-layer caching uninteresting compared to something like Turbolinks? Has SPDY — the basis of HTTP/2.0 — promised a brighter future with full-duplex communication and no FIFO semantics?