Friday, September 21, 2012
Hackday project: tarpio.us

I saw this tweet from @kellan that got me to thinking:

It would be useful if Tent.io could describe itself in the world of protocols I know, particularly PuSH, XMPP, and SMTP.

I think the notion of a distributed social network is an interesting one, and I've been pondering how it might work. I've got a scheme for one, which I'll outline in this post, and I suspect it wouldn't take that long to throw together with off-the-shelf open source and a dedicated hack day.

General Principles

Use existing, open standards. This increases the likelihood of finding good open source code to use when building software, ensures that a distributed marketplace of (interoperable!) clients and servers could arise, and increases the likelihood of avoiding platform specificity. Plus, giants have pretty high shoulders. I have limited hack time; I'm not into wheel reinvention.

Central service provider not required. This is pretty fundamental; it means there's no central organization that owns or has access to everything ("creepy alert!"), but also no central organization that can be subjected to lawsuits (e.g. Napster[1]).

Users have control over sharing. At least for the initial communication, users need to be able to say exactly who gets to see a piece of content. Of course, once you've shared something with someone else, there's no stopping that other person from sharing it further, but that's what social trust is all about.

Protocol Sketch

TL;DR. S/MIME, public key crypto, Atom.

To expand: everyone runs their own AtomPub server somewhere, or contracts with a service provider to do it on their behalf. Multiple PaaS and IaaS providers have a free tier of service, which is almost certainly more than enough for most users. I get whole tens of visits to my blog every day, and I am not a prolific sharer: it looks like I've averaged 1.1 tweets per day since starting to use Twitter. "Roflscale" is probably enough, to be honest. Easy recoverability is probably more important than high availability for a single-user server or hosted PaaS app, so I think this could be done for less than $10 a year (most of which is probably just buying a domain name). Given the relatively low traffic, actual friends could pretty easily share infrastructure.

Now, I'm the only one who posts to that Atom feed, and anyone can poll it at any time. The trick is that the content is encrypted, and you specify who gets access to the decryption key. Basically, you generate a random (symmetric) encryption key for the content, then public-key encrypt copies of the content key for all your recipients. Put it all together in one multipart package, upload to, say, S3, and then post a link to it on your Atom feed. Of interest is that neither S3 (the storage) nor the AtomPub server cares one whit about the encryption/decryption stuff.
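To make that concrete, here's a minimal Python sketch of the packaging step using the cryptography library. The function name and the AES-GCM/RSA-OAEP choices are my own illustrative assumptions, not part of any spec; a real deployment would presumably use S/MIME or GPG packaging instead.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def package_post(plaintext, recipient_public_keys):
    # One random symmetric key per post encrypts the actual content (bytes)...
    content_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(content_key).encrypt(nonce, plaintext, None)
    # ...and that key is wrapped once per recipient with their public key.
    wrapped_keys = [
        pub.encrypt(content_key, padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(), label=None))
        for pub in recipient_public_keys]
    # Bundle nonce, ciphertext, and wrapped keys into one multipart package,
    # upload it (e.g. to S3), and link to it from the Atom entry.
    return nonce, ciphertext, wrapped_keys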

Interestingly, following gets somewhat inverted. You can post content intended for someone who never subscribes to your feed (gee, I hope he notices me!). You can also subscribe to the feed of someone who never encrypts anything for you. But you can see anyone's public broadcasts, of course. Actual "friending" is just a GPG key exchange plus adding the key to the appropriate keyrings representing friend groups/circles.

On the client side, then, it's mostly about subscribing to people's Atom feeds, polling them periodically (for which HTTP caching will really help a lot), cracking open the content encryption keys with your own private key, then interleaving and displaying the content.
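A matching sketch of the client loop, assuming the feedparser library plus the hypothetical package layout above (fetch_package, unwrap_key, decrypt, and display are placeholders, not real APIs):

import feedparser

def poll_feed(feed_url, my_private_key, etag=None, modified=None):
    # feedparser issues a conditional GET when given the previous ETag/Last-Modified,
    # so unchanged feeds cost almost nothing to poll.
    d = feedparser.parse(feed_url, etag=etag, modified=modified)
    for entry in d.entries:
        package = fetch_package(entry.link)                # placeholder: GET the multipart bundle
        content_key = unwrap_key(package, my_private_key)  # try the wrapped keys with our private key
        if content_key is not None:                        # None means this post wasn't shared with us
            display(decrypt(package, content_key))
    return d.get("etag"), d.get("modified")                # remember these for the next poll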

That's about it. Definitely possible to layer PubSubHubbub on top to get more realtime exchange; no reason your free tier server couldn't host one of those for you too. Or perhaps just a feed aggregation proxy that would interleave all your subscriptions for you (without needing to understand who any of the recipients were, or even if you were one of the recipients!).

Hack Day Proposal

Roughly: get a simple AtomPub server stood up somewhere. Apache Abdera has a tutorial for doing exactly that, if you can implement the storage interface (MySQL/PostgreSQL/SimpleDB/S3 would all work, I think; it looks to be just a Map). Front it with Apache and some form of authentication module. Alternatively, grab Abdera, retrofit it onto Google App Engine (Java), and use Google's federated OpenID for login.

Client side, cobble together a UN*X command-line client out of bash/Python/Ruby/whatever, depending on what open source is out there. It'd be kind of neat to do this as individual commands on the command line, MH-style. Everybody's got an HTTP client library that can be made to convince the server you're you. AtomPub clients are available in Python (1,2), Java (3,4), Ruby (5,6). GnuPG for the public key implementation. Bouncycastle has a Java S/MIME implementation. Lots of S3 clients out there (7,8).

I think it's entirely possible to have a simple client and server up in a day with a smallish group of folks (5-6). If there are more folks we can also try multiple clients and servers.

Tentative Projectname

Since @kellan started this off by mentioning tent.io, and since this is an attempt to slap something together with off-the-shelf stuff, it felt more like a lean-to made out of a tarp than a true tent. So I've snarfed the domain name tarpio.us (the closest I could come to something meaning "like a tarp" that lined up with a cheap TLD). The cool part, of course, is that there's no reason tent.io and tarpio.us couldn't interoperate via protocol bridges someday.

If you're interested, hit me up on Twitter at @jon_moore or follow along on GitHub. I'll see when I can free up for a hackday soon!

Footnotes

  1. I'm not advocating violating copyrights here, merely noting that Napster was a P2P system with a single point of failure (an organization that could be sued).

Saturday, September 1, 2012
Resources and Query Parameters

[Editor's note: This is a cross-post from a Tumblr entry; I started to write it as a quick note because someone was wrong on the Internet, but by the time it was done, it was long enough to be a blog post in its own right.]

What kind of string is this?

http://example.com/path?query=foo

Well, it's a URI (Uniform Resource Identifier) as well as a URL (Uniform Resource Locator). But wait, that means the whole string (check RFC 2396) is a resource identifier, which means the whole thing identifies a resource (literally)--not just the "http://example.com/path" part.

I often run into folks that think this string identifies a "http://example.com/path" resource that happens to take a parameter, which is understandable, because this is how almost every web framework is set up to implement it (identify your "/path" route, set up your controller, pick off the parameters as arguments).

However, from an HTTP point of view, and especially from a hypermedia point of view, this isn’t right. HTTP (the protocol) treats everything from the path onward as an opaque string--it shows up as a single token on the Request Line. The whole thing (query included) is used as the key for an HTTP cache. In fact, the only difference between a URL like "http://example.com/path?query=foo" and one like "http://example.com/path/query/foo" is that the former is not cacheable by default if it comes from an HTTP/1.0 origin server. That's it. And even that can be overridden with explicit Cache-Control or Expires headers.
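You can see this right in the request line; as far as HTTP is concerned, these two hypothetical requests differ only in the spelling of one opaque token:

GET /path?query=foo HTTP/1.1
Host: example.com

GET /path/query/foo HTTP/1.1
Host: example.com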

From a hypermedia client point of view, you don't care which style of URL is used. Sure, you might have to construct your HTTP request slightly differently if there are query parameters involved, but that's mechanical--no semantics involved, just syntactically parsing the URL to figure out how to GET it. The only reason to prefer one over the other is purely stylistic; most modern web frameworks can pluck arguments out of a path as easily as they can out of query parameters.

Remember, a hypermedia client never constructs URLs on its own; besides a few well-known entry points (which it should treat opaquely), it only uses URLs directly fed to it by the server, or constructed according to recipes provided by the server (typically through forms or link templates). This is probably the main driver for which style you want to use: do you want to use HTML forms, which, for GET, use query parameters, or link templates, which stylistically tend to use path parameters (although they can support query parameters too)?
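As a sketch of those two recipe styles (markup and URLs invented for illustration), the server might hand the client either an HTML form or a URI Template (RFC 6570):

<form method="get" action="http://example.com/path">
  <input type="text" name="query"/>
  <input type="submit"/>
</form>

http://example.com/path/{query}

Either way, the client just fills in the blank it was given; it never invents the URL structure on its own.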

So in a hypermedia world, there’s really no such thing as a "RESTful" URL structure; a truly RESTful client--one which understands and uses hypermedia affordances--doesn't care.

Thursday, August 30, 2012
Hypermedia Programming: Lists

The humble list is one of our simplest and yet most powerful data structures--so much so that we will even routinely write them out by hand. We put them on sticky notes to help us remember things we need to do or things we need to buy from the grocery store. We even use them for entertainment. In this post I'll explain how to represent and manipulate lists using hypermedia techniques.

The most straightforward list representation actually doesn't look that different from a handwritten to-do list; the text/uri-list media type just consists of one URI per line. This makes the format incredibly concise, since there is very little syntactic structure (just interleaved line breaks!), while making it completely general through the use of globally unambiguous identifiers.
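For example, a small list of favorites as text/uri-list might look like this (the item URIs are made up; lines starting with "#" are comments per the media type's definition):

# my favorite things
http://example.com/items/raindrops-on-roses
http://example.com/items/whiskers-on-kittens
http://example.com/items/bright-copper-kettles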

Now let's talk about manipulating this list with HTTP. I would expect to be able to:

  • delete the list by issuing a DELETE to its URI
  • reorder the list by issuing a PUT to its URI with the complete, reordered list of URIs
  • insert new URIs or remove existing ones somewhere in the middle by issuing a PUT, again with a complete "new" version of the list
  • append to the list by issuing a POST with a text/uri-list body containing the new URI(s) to be added
The above assume, of course, that I am properly authenticated and authorized to modify the list. If the original list resource had an ETag or Last-Modified header, I would supply If-Match or If-Unmodified-Since headers on my modification request.

Once the list grows large, however, using PUTs to make what seem like "small" changes (removing an item, inserting an item) doesn't seem particularly efficient. For these types of changes, I'd like to be able to use PATCH and specify these small edits. Now, since text/uri-list is a text type, we ought to be able to borrow the output of the common 'diff' utility to specify the changes we want to make. [Unfortunately, it turns out the output of diff isn't registered as a standard media type, although I'm trying to rectify that as well in my not-so-copious spare time.]

This means, for example, we could see something like the following protocol exchange, starting with retrieving the initial list:
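Sketching it against a hypothetical /favorites list (and using an unregistered, purely illustrative text/x-diff type for the patch body):

GET /favorites HTTP/1.1
Host: www.example.com
Accept: text/uri-list

HTTP/1.1 200 OK
Content-Type: text/uri-list
ETag: "abc123"

http://example.com/items/raindrops-on-roses
http://example.com/items/whiskers-on-kittens
http://example.com/items/bright-copper-kettles

PATCH /favorites HTTP/1.1
Host: www.example.com
Content-Type: text/x-diff
If-Match: "abc123"

2d1
< http://example.com/items/whiskers-on-kittens

HTTP/1.1 204 No Content

The patch body is just the output of running diff between the old and new lists, here removing the second URI in place.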

Adding pagination

These general approaches work well even for relatively large lists, but at some point your list will get bigger than you are willing to serve in a single representation. Now it's time to add pagination!

The easiest way to do this on the server side is to provide Link headers (RFC5988) which tell you where you are in the list. In fact, there are registered link relations that are perfect for this already, in particular:

    first: Points to the first page of the list.
    last: Points to the last page of the list.
    next: Points to the next page in the list.
    prev (or previous): Points to the previous page in the list.

Now let's work through an example. Suppose you fetch a URL that you expect, from context, to be a list and these response headers come back:

HTTP/1.1 200 OK
Date: Fri, 31 Aug 2012 01:38:18 GMT
Content-Type: text/uri-list
Link: <./page/2>; rel="next last"
...
Now, you can infer a couple of things, namely, that this list spans multiple pages (due to the presence of the "next" link), but also that it has exactly two pages (because the "next" link is also the "last" link). We can also tell that this is the first page, because there isn't a "prev" link; we might also be able to infer that if the server additionally provided:
Link: <.>; rel="first"

Ok, that works well for paginated list retrieval. It's not too hard to look for these Link headers and traverse them to retrieve and/or iterate over the entire list. But now how about updates? There's actually an ambiguity problem here, because we followed a particular URL for the whole list but got back a representation for the first page of the list instead. If I DELETE that URL, does it:

  • delete the entire list; OR
  • delete all the entries on the first page only?
The short answer is: there's no way to tell. As a server implementor, though, if someone does a GET on a list I've decided to paginate, I might instead issue a 302 redirect to a different URL that explicitly represents the first page. For example:
GET /list HTTP/1.1
Host: www.example.com

HTTP/1.1 302 FOUND
Date: Fri, 31 Aug 2012 01:38:18 GMT
Location: http://www.example.com/list/page/1
Then I could treat PUTs, DELETEs, PATCHes and POSTs to /list as if they were targeting the entire list, and treat requests to /list/page/1 as if they were targeting just the first page.

But back to our client conundrum; perhaps our server doesn't adhere to this redirect convention--it's certainly not an official standard. How do we proceed? Well, if our goal (and writing "goal-oriented" clients is a good orientation for hypermedia clients) is to delete the whole list, then we can just alternate DELETEs with GETs until the whole thing is gone. Either the DELETE affects the whole list in the first shot, or it deletes just the first page. In the latter case, I've made progress and can repeatedly delete the first page until I'm done.
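Here's a sketch of that goal-oriented loop using Python's requests library; the termination condition (a 404 or 410 from a follow-up GET) is my assumption about how "gone" manifests:

import requests

def ensure_deleted(list_url):
    # Works whether DELETE removes the whole list or only the current first page:
    # keep deleting until a GET confirms the resource is gone.
    while True:
        requests.delete(list_url)
        if requests.get(list_url).status_code in (404, 410):
            return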

[Sidebar: avid readers of the HTTP spec will have spotted the trick question already here. DELETEs are supposed to be idempotent, but deleting just the first page of a list is not an idempotent operation because the second page becomes the first page, so repeated DELETEs to the same "first page" URL will continue deleting more items. Therefore the correct behavior for the server is to delete the entire list. However, if you meet a server that has decided on the second semantics, good luck waving standards documents around to get its implementor to change it.]
If our goal, however, is just to remove the first page, we probably want to PATCH it. However, that's not a commonly implemented HTTP verb (expect 501 NOT IMPLEMENTED or 405 METHOD NOT ALLOWED), and there isn't a standardized text diff media type yet anyway, so that might not work. In this case, our client may well have to be prepared to DELETE the entire list and then reconstruct just the desired parts with PUT and/or POST.

What's very interesting about this is that the client as we've described it actually implements a closed-loop control mechanism. It takes sensor readings or receives feedback via "examining" the system with GETs, and then takes further actions based on its current goal and the current state of the system. For a really good introduction to how this can lead to very robust systems, see "GN&C Fault Protection Fundamentals" by Robert Rasmussen; although the paper is about spacecraft guidance systems (cool!) its concepts are easily applicable to software systems in general.

Richer Representations

The text/uri-list format, while great at capturing list membership and order, doesn't tell a recipient anything about the actual members of the list. In that sense, it's all identifier and no data. For those list members that are URLs, we can attempt to GET them or check their HEAD or ask for their OPTIONS to get more information. For URIs that are not locators (and hence not dereferenceable), like URNs or Tag URIs, we'd have to consult a lookup service or have access to other out-of-band information. At any rate, if a client was looking for a particular member of a list, it might have to make several more requests before it could find the right one. In particular, a human looking at the list in a browser will likely have to do a bunch of cut-n-pasting to fully investigate the list contents.

What can we do about this? In "REST APIs must be hypertext-driven", Roy Fielding suggests the following pattern:

Query results are represented by a list of links with summary information, not by arrays of object representations.
In other words, along with the links, we want to provide a little extra contextual information to make it easier for the client to identify what they're looking for. The text/uri-list format lacks this extra context and assumes the recipient can find it elsewhere easily. Perhaps we should look for alternative formats that are nearly as concise but which also provide opportunity to supply the little bit of context Fielding describes. Two immediate options that spring to mind are Atom+XML and Collection+JSON, which are media types whose domains specifically include lists.

For example, here's our initial list of favorite items, represented in application/atom+xml.
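(A minimal sketch of what that feed might look like, reusing the made-up favorites from earlier:)

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Favorite Items</title>
  <id>http://example.com/favorites</id>
  <updated>2012-08-30T12:00:00Z</updated>
  <entry>
    <title>Raindrops on roses</title>
    <id>http://example.com/items/raindrops-on-roses</id>
    <link rel="alternate" href="http://example.com/items/raindrops-on-roses"/>
    <updated>2012-08-30T12:00:00Z</updated>
  </entry>
  <entry>
    <title>Whiskers on kittens</title>
    <id>http://example.com/items/whiskers-on-kittens</id>
    <link rel="alternate" href="http://example.com/items/whiskers-on-kittens"/>
    <updated>2012-08-30T12:00:00Z</updated>
  </entry>
</feed>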

Now all the same rules apply for this list, as far as what the methods mean (i.e. POST appends a new item to the list, etc.). This is an example of what folks mean by uniform interface. If a URL represents a list, then POSTing to it should append to the list, regardless of the media type used to serve the list or to enclose a new item to be appended. So long as the client and server commonly understand a sufficient set of media types, they can interoperate. In the case of the Atom-formatted list, I would probably expect to have to POST an <entry> containing my new item, as I have a strong hint that the server understands application/atom+xml. However, the server may also advertise additional formats with Link headers (Atom lets us do this with embedded <link> elements too):

Link: <.>; rel="alternate"; type="text/uri-list"
To take advantage of these I may need to adjust my client's Accept header to specify my preference for them. But at any rate, if the resource is a list, there's no reason a client couldn't GET it as Atom, and then POST a single URI onto it with text/uri-list, so long as the client and server understand both media types. If the server doesn't, it may well reply with a 415 UNSUPPORTED MEDIA TYPE and then the client may try again if it has another option.

Last but not least, since I like using HTML as the media type for my APIs, we should point out that this is also a fine and relatively compact way to represent a list:
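(Again just a sketch, reusing the same made-up items:)

<html>
  <body>
    <ol class="favorites">
      <li><a href="http://example.com/items/raindrops-on-roses">Raindrops on roses</a></li>
      <li><a href="http://example.com/items/whiskers-on-kittens">Whiskers on kittens</a></li>
    </ol>
  </body>
</html>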

Where to go next?

I've given you a brief tour about how to deal with hypermedia lists in a standardized way, relying on the documented semantics of HTTP and various media types. I believe it should be possible to construct relatively robust and general client libraries for dealing with lists in all their incarnations; that would be a great open source project...hint hint.

Tuesday, July 10, 2012
Using HTML as the Media Type for your API

There is an ongoing (and interesting) discussion on the API-craft mailing list revolving around designing new media types for enabling hypermedia APIs primarily for programmatic consumption. As some folks may know, I like to use HTML as the media type for my hypermedia APIs. Steven Willmott opined:

I think the problem isn't "why not HTML" it's "why HTML" - if you strip out all the parts of HTML which are to do with rendering things for presentation you're left with almost nothing at all:
  • <a>
  • <h1>, <h2>, <h3> ... (as nesting)
  • <ul><li>
  • <ol><li>
  • <p> maybe (as a kind of separator) - or <div> ...
and even some of these are marginal. There is useful stuff around encodings, meta-data etc. but pretty much everything else is redundant.

I thought this raised such an interesting implicit question, and I get asked about this enough that I thought it warranted a longer response. There are actually a variety of reasons I prefer using HTML:

  • rich semantics
  • hypermedia support
  • already standardized
  • tooling support

Rich Semantics

I've heard many folks say that HTML is primarily for presentation and not for conveying information, and hence it isn't suitable for API use. Hogwash, I say! There are many web experts (like Kimberly Blessing) who would insist that markup is exactly for conveying semantics and that presentation should be a CSS concern. People seem to forget that web sites actually worked before CSS or Javascript was invented! I rely on this heavily for my HTML APIs.

Now don't get me wrong--I'm not advocating a return to 1995; Javascript, CSS, and HTML advances have clearly afforded richer user experiences. But that doesn't mean your HTML API needs to serve up or depend on CSS or Javascript any more than clients need to execute it, necessarily. Just because the media type can express things you don't need or want doesn't make it a bad media type for your use--this is confusing the content for the media type.

So let's get to specifics. From a semantic and informational point of view, there are whole segments of the HTML spec that I've found useful for expressing data structures. We obviously have lists (<ol>), bags (<ul>), and maps (<dl>). Raw XML doesn't have any of these, and JSON can't distinguish lists from bags and is constrained to use strings as map keys. We get encapsulation or grouping via ancestor inclusion or explicitly with <div>. We get 2-dimensional data layouts via <table>, and in fact something even more general than a 2-dimensional array via @colspan and @rowspan.

But more powerfully, with the <a> tag, we have the ability to represent arbitrary data structures, even circular ones (which tree-structured media types like XML or JSON cannot represent). In fact, we can even represent distributed data structures (which, arguably, is what the Web as we know it is--a giant distributed data structure). This is amazingly powerful, and for comparable expressiveness in a different media type, you'll have to define conventions for all these things.
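A quick illustration with invented data: a list, a map, and a link that points back into the same document, all in stock HTML:

<ol id="favorites">
  <li>raindrops on roses</li>
  <li>whiskers on kittens</li>
</ol>
<dl class="contact">
  <dt>firstname</dt><dd>Jon</dd>
  <dt>lastname</dt><dd>Moore</dd>
  <dt>favorites</dt><dd><a href="#favorites">this person's favorites</a></dd>
</dl>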

Now, let me just take a run through the HTML5 spec and identify which elements are useful or not useful from an API point of view:

<html>
required, so moot
<head>
useful for overall representation metadata, especially via <link> and <meta>
<title>
if you have a string that could be construed as a name for the whole representation, why not put it here?
<base>
useful for unambiguously supporting relative links
<link>
one of the key hypermedia controls, see the "Hypermedia Support" section below
<meta>
useful for arbitrary data annotations
<style>
Okay, I'll grant that this one is not as important for machine-to-machine (m2m) consumption, but it comes into play under the "Tooling Support" section below.
<script> and <noscript>
Arguably, this is useful for implementing code-on-demand, but I'll grant that my current m2m use cases aren't this advanced yet.
<body>
necessary for separating metadata in <head> from actual data
<section>, <article>, <aside>, <h1>-<h6>, <hgroup>, <header>, <footer>, <blockquote>
These are primarily useful for describing content meant for human consumption, and while I have not had cause to use these myself, they would clearly have an important role to play if data payloads had this structure, e.g., in the API for a content management system (CMS). That said, I'm happy to lump these into "not useful" for the sake of argument.
<nav>
For m2m, I'm not sure there's much benefit to this over <link>s in the <head>, although there's room for more expressiveness here. Let's say, YAGNI here.
<address>
If you have data that's an address, why not mark it as such? Seems like a not-that-unusual circumstance.
<p>, <pre>, <span>
These are fine containers for arbitrary string data with slightly different semantics, particularly around whether whitespace is significant and whether content may reasonably be reflowed when presented in a UI. However, these also offer the ability to hold rich content if desired.
<ol>, <ul>, <dl>, <li>, <dt>, <dd>, <div>
As mentioned above, necessary for representing data structures.
<figure>, <figcaption>
Arguably not needed for m2m interactions.
Text-level semantics like <i>, <b>, etc.
Not useful immediately for m2m interactions, but rather to allow rich payloads. Arguably, a JSON-based media type could carry HTML markup in its strings, but then there is an impact on tooling and visibility, which we'll discuss in tooling support below.
<img>
I've seen many APIs that send around links to thumbnails, for example. Clearly useful.
<iframe>, <embed>, <object>, <canvas>, etc.
Similar to the discussion of <script> above, our m2m interactions are not advanced enough to take advantage of these (yet).
<audio>, <video>
Similar to images, allows for discussion of multimedia as first-class objects.
<form> et al.
Perhaps the single biggest reason to use HTML is its support for parameterized navigation via forms. See "Hypermedia Support" below.

Looking back across this list, sure, there's a lot of things that might not be immediately useful, but there's actually quite a large portion of HTML that offers semantics I'd immediately find useful in a programmatic API. We basically get to reap the benefit of many years of evolution in HTML, where its expressive power has grown and been refined over the years. You'll end up repeating most of the HTML standardization process to get a new media type up to the same level of expressiveness.

On top of that, however, are facilities for describing application-domain specific semantics, namely through the use of microdata and/or RDFa--all the "semantic web" stuff. I don't have to create a new semantic ontology for my application domain; I can leverage and/or enrich my markup with Dublin Core or Schema.org.

In short, from a data description point of view, HTML and its associated standards give me all the tools I need to describe almost anything I could imagine, and those facilities are all off-the-shelf from my perspective.

Hypermedia Support

HTML offers <a>, <link>, and <form> as obvious examples of hypermedia controls. In fact, the use of <form> to support parameterized navigation (where the client supplies some of the information needed to formulate a request) fairly well sets HTML apart from most existing standard media types (standard in the sense of being registered in the IANA standards tree). While this construct is currently not as powerful or expressive as it could be--forms only support GET and POST as methods, for example--it's actually enough to get by, and is certainly sufficient for a RESTful system (if you care about qualifying for the label). Furthermore, there are ongoing efforts within the HTML5 standards process to address this.
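For instance (markup invented for illustration), a server can advertise a parameterized "search" transition and let the client fill in the blanks at run time:

<form class="search" method="get" action="/contacts">
  <input type="text" name="lastname"/>
  <input type="submit"/>
</form>

Submitting this with lastname=Simpson produces GET /contacts?lastname=Simpson, and the client never had to know that URL structure in advance.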

(As an aside, it's worth noting that <audio>, <video>, <iframe>, and <img> are also hypermedia controls).

Already Standardized

HTML is shepherded by an existing open standards process and a large community of experts, which means it has all the social machinery for ongoing support and evolution. More than that, however, HTML has had the opportunity to be battle-hardened with real world use for decades, including the documentation that comprises its specification. This is huge, because in documentation I can talk about "following links" and "submitting forms" without getting into details about how to construct those HTTP requests, because someone has already taken the trouble of writing that all down, including all the nasty corner cases. I'm lazy--I don't want to define and write down a bunch of rules that solve the same problems reams of experienced people that came before me have already solved.

Furthermore, due to its ubiquity, EVERYONE AND THEIR BROTHER understands HTML and lots of those people can write valid markup without consulting the HTML5 spec (of course, there are also lots who only think they can write valid markup without looking at the spec!). While developers may not be used to using HTML to power APIs, they can nonetheless look at an API response and understand what's going on. This is a huge advantage.

More importantly, HTML is already all over the Web, and there are both human and machine participants consuming it. If I'm starting from an API, then it's entirely possible that someone from the "human-oriented" Web might link to my API, and presto, they can use it, because:

human + browser = client for my HTML API

Similarly, if I'm writing a client, and it can parse HTML (and especially if it can parse RDFa or microdata), then there's a chance it could be pointed at the human-oriented Web and find it can do something useful. But if that client can't parse HTML, then it has no hope of accessing all the existing HTML content on the web.

The phrase here is "serendipitous reuse". The human stumbling onto my API will likely not find it pretty or well-designed, but they may still be able to use it. The programmatic client trolling through web sites will likely ignore half the stuff it downloads, but it still may find something useful (obviously Google has been able to do this). If I find my API is being visited by humans, too, I can add a link to a stylesheet and perhaps download a javascript client, and present them a more usable interface without bothering my programmatic clients that much. Similarly, if my human-oriented website decides it wants to serve programmatic clients too, it can always add semantic tagging in the meantime, and evolve elsewhere.

Tooling Support

Before we get too far into this, let's talk for a minute about the relationship of HTML to XML. Both are flavors of SGML, although the sets of valid documents each can describe are overlapping but distinct. Specifically, there are valid HTML documents that aren't valid XML documents and vice versa, but there are documents that are both valid HTML and valid XML. Then there's XHTML, which is always valid XML but not always valid HTML (depending on the versions). Thus, the relationship is:

[Figure: Venn diagram showing the relationships of the sets of valid XML, HTML, and XHTML documents]

In particular, I find that I can often use markup for my API that actually sits in the intersection of all three. My programmatic clients can ask for application/xhtml+xml, and I can give it to them, and browsers can ask for text/html, and I can give them the exact same bytes with a different Content-Type. If my client wants to use the ubiquitous and available XML parsing and handling libraries out there, great! If they want to be more robust and parse full HTML (not just the XML-compatible subset), great! And yes, there are full HTML parsing libraries (not XML parsing libraries) in most programming languages, for example: Python, Ruby, Javascript, Perl, PHP, C, and Java.

Now, I will grant that most of these give you a DOM, and not much support above that, so you end up endlessly and tediously traversing descendants and siblings in for loops, examining attributes to find what you're looking for. We do have an example, though, that shows manipulating a DOM need not be hard or tedious, and that is likewise ubiquitous: JQuery. And indeed, you can use JQuery selector syntax in other languages, too, like Java or Python. So most of what you actually need for manipulating HTML programmatically in a client probably already exists.
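For example, Python's pyquery library lets you run JQuery-style selectors over a parsed document; assuming markup like the contacts example later in this post:

from pyquery import PyQuery as pq

doc = pq(html_body)  # html_body: the HTML response as a string
last_names = [el.text() for el in doc("ol.contacts .lastname").items()]
# last_names == ['Moore', 'Simpson'] for the sample contacts document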

On the server side, we are up to our ears in webserver frameworks that serve up HTML, and IDEs and practices that are set up to optimize developing, testing and debugging them. It's sure nice to load your API up in a browser and play with it. A human plus a browser is a fully-capable client of your HTML API, regardless of what programmatic clients you may be targeting. I can look at the requests and responses over the network and examine the markup in detail in Chrome's developer tools. Many frameworks written for compiled languages like Java can even hotload markup template changes on the fly without recompiling. Plus you can wave a stick and hit thousands (perhaps millions) of developers who are already familiar with all of these technologies.

But what about...?

Domain-specific media types. They're so concise! True; you'd have to work a little harder to represent a blog in HTML than in Atom or RSS, or to represent contact information in HTML rather than in vCard. If there's a domain-specific media type out there for what you're doing, great! Use it--that's what it's for! But I find I work in a world where the application domain is evolving rapidly with new concepts and new features, or where application domains are mixed and mashed up. Many domain-specific media types don't accommodate this well. Imagine trying to write a media type to document Facebook's functionality. You'd end up needing to change the spec daily! That defeats the purpose of having off-the-shelf libraries help you along for the parts that aren't changing much. Or wait--you could build a media type that was so flexible that it could express almost any application...oh.

Bloat. JSON is way more concise, and that really matters for mobile apps. I've heard this so many times that I'm going to have a hard time not being snarky here, so be warned. First off, if representation size or parsing speed is that critical, I'd suggest using a binary format instead, like Protocol Buffers or Avro. What's that? You don't want to use a binary format because it's not human readable? Ah, so you are willing to give up some efficiency to trade off for other things. I see.

But let's get down to some facts here. I often see the following argument presented:

"Here's my sweet JSON representation, only 122 bytes!"
{ "contacts" : [
  { "firstname" : "Jon", "lastname" : "Moore" },
  { "firstname" : "Homer", "lastname" : "Simpson" }
] }
"And here's the bad, old, ugly XML HTML representation. It's 266 bytes, 118% bigger!"
<html>
  <body>
    <ol class="contacts">
      <li><span class="firstname">Jon</span>
          <span class="lastname">Moore</span></li>
      <li><span class="firstname">Homer</span>
          <span class="lastname">Simpson</span></li>
    </ol>
  </body>
</html>
"Ergo, HTML is more bloated than JSON."

There are a couple of observations to make here. First, both of these would fit quite comfortably in a single TCP segment carried in one Ethernet frame (1500-byte MTU), unless you've got a LOT of headers, in which case, start looking there for bandwidth savings first! So you're not going to notice the difference in practice.

But we're building an HTTP-powered API, right? And we're using compression, right? If I gzip those two files, the gzipped JSON version is 103 bytes and the HTML version is 150 bytes. Now the HTML is only 45% bigger, not 118% bigger. But still bloated, right? Wait, there's more.

These are really small files. Compression algorithms like gzip's DEFLATE (LZ77 plus Huffman coding) exploit repeated strings of bytes, so the compression ratio depends on how big and how common those repeated strings are. Well, it turns out that what you call "bloat", gzip calls "compressible." The longer the document, the better it compresses, and the closer gzip will get to the information theoretic minimal representation. Let's see this in action, and with a real API, rather than a toy example. Here's a sample JSON response from the Twitter API, and here's an equivalent XML response, also from the Twitter API. Finally, here's turning it into an HTML-style response.

These samples are, respectively, 44265 bytes (JSON), 64493 bytes (XML), and 40252 bytes (HTML). Wait, what? The HTML representation is the smallest? How is that even possible? I did take the liberty of eliding blank properties, using HTML5 data attributes, and putting true boolean properties as @class values (and leaving off false boolean properties), which I assert are all common HTML idioms. But compare the source gists linked above and decide for yourself.

Now let's gzip them: 7366 bytes (gzipped JSON), 7855 (gzipped XML), 7287 (gzipped HTML). This is only a size difference of 7% from smallest to largest, and even if you don't consider my HTML version comparable, you can see that gzip compression is removing a lot of the differences.
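If you want to reproduce numbers like these, the measurement itself is a few lines of Python (the file names are placeholders for wherever you saved the samples):

import gzip

for name in ("timeline.json", "timeline.xml", "timeline.html"):
    with open(name, "rb") as f:
        data = f.read()
    print(name, len(data), "raw,", len(gzip.compress(data)), "gzipped")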

Now, don't get me wrong, JSON is a fine format, and I use it regularly. There are lots of good reasons to use it, but claiming that it is more economical on the wire, while possibly true, is probably not true by enough to make it a deciding factor (and if that really is a deciding factor, you probably want to go to binary formats anyway).

Summary

So what this all boils down to is that HTML offers me quite a lot of convenience as a hypermedia-aware, domain-agnostic media type. I have lots of off-the-shelf tooling, including getting my first client for free (the browser), and from a documentation point of view, between HTML and HTTP, there's a whole lot of mechanics I don't have to discuss. In fact, if I'm using microdata, I don't even necessarily need to write much down about the particular application domain, at least from a vocabulary point of view. It might even be sufficient to document an HTML API just by listing out:

  • URL of the entry point(s)
  • link relations used (with pointers to their definitions elsewhere!), plus the important <form> @class values and <input> @names (I think forms need parameterized link relations to do this a little more formally, but we don't quite have those yet)
  • pointers to the microdata definitions of importance (again, elsewhere).
That's not a lot to have to write down.