Saturday, September 1, 2012

Resources and Query Parameters

[Editor's note: This is a cross-post from a Tumblr entry; I started to write it as a quick note because someone was wrong on the Internet, but by the time it was done, it was long enough to be a blog post in its own right.]

What kind of string is this?

http://example.com/path?query=foo

Well, it's a URI (Uniform Resource Identifier) as well as a URL (Uniform Resource Locator). But wait: that means the whole string (check RFC 2396) is a resource identifier, which means the whole thing (literally) identifies a resource, not just the "http://example.com/path" part.
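To make that concrete, here is a minimal Python sketch using the standard library's urllib.parse. The variable names are mine, purely for illustration; the point is only that the query is one component of the identifier, and dropping it leaves you with a different identifier.

```python
from urllib.parse import urlsplit, urlunsplit

uri = "http://example.com/path?query=foo"

# Split the identifier into its components; the query is one of them.
parts = urlsplit(uri)
components = (parts.scheme, parts.netloc, parts.path, parts.query)

# Putting every component back together reproduces the original identifier.
round_tripped = urlunsplit(parts)

# Dropping the query yields a different string, i.e. a different identifier.
without_query = urlunsplit(parts._replace(query=""))
```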

I often run into folks who think this string identifies an "http://example.com/path" resource that happens to take a parameter, which is understandable, because this is how almost every web framework is set up to implement it (identify your "/path" route, set up your controller, pick off the parameters as arguments).

However, from an HTTP point of view, and especially from a hypermedia point of view, this isn’t right. HTTP (the protocol) treats everything from the path onward as an opaque string--it shows up as a single token on the Request Line. The whole thing (query included) is used as the key for an HTTP cache. In fact, the only difference between a URL like "http://example.com/path?query=foo" and one like "http://example.com/path/query/foo" is that the former is not cacheable by default if it comes from an HTTP/1.0 origin server. That's it. And even that can be overridden with explicit Cache-Control or Expires headers.
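Here is a rough sketch of both points in Python. The request_line and cached_get helpers and the in-memory dictionary are hypothetical and deliberately naive (no freshness rules, no headers); they only illustrate that the query travels inside the single request-target token and that a cache keyed on the whole URL sees two query values as two unrelated entries.

```python
from urllib.parse import urlsplit

def request_line(url, method="GET"):
    """Build the request line a client would send to the origin server."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # The query is simply part of the same token as the path.
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"

# Toy cache (an assumption for illustration): the key is the entire URL.
cache = {}

def cached_get(url, fetch):
    """Return the cached body for url, calling fetch(url) only on a miss."""
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]
```

Calling request_line on "http://example.com/path?query=foo" and on "http://example.com/path/query/foo" gives request lines of exactly the same shape: a method, one opaque token, and the protocol version.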

From a hypermedia client point of view, you don't care which style of URL is used. Sure, you might have to construct your HTTP request slightly differently if there are query parameters involved, but that's mechanical--no semantics involved, just syntactically parsing the URL to figure out how to GET it. Any reason to prefer one over the other is purely stylistic; most modern web frameworks can pluck arguments out of a path as easily as they can out of query parameters.

Remember, a hypermedia client never constructs URLs on its own; besides a few well-known entry points (which it should treat opaquely), it only uses URLs fed to it directly by the server, or constructed according to recipes provided by the server (typically through forms or link templates). That choice of recipe is probably the main driver for which style you want to use: do you want to use HTML forms, which, for GET, use query parameters, or do you want to use link templates, which stylistically tend to use path parameters (although they can support query parameters too)?
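The two kinds of recipe can be sketched in a few lines of Python. Both helpers below are hypothetical names of my own, and the template expander handles only plain {name} placeholders, a small subset of what a real link template syntax such as RFC 6570 URI Templates allows. In both cases the client mechanically fills in a recipe the server handed it and never decides the URL's shape itself.

```python
import re
from urllib.parse import quote, urlencode

def submit_get_form(action, fields):
    """Mimic an HTML form with method GET: fields become query parameters."""
    return action + "?" + urlencode(fields)

def expand_template(template, values):
    """Fill simple {name} placeholders, percent-encoding each value."""
    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

# Recipe 1: a form whose action and field name came from the server.
form_url = submit_get_form("http://example.com/path", {"query": "foo"})

# Recipe 2: a link template that also came from the server.
template_url = expand_template("http://example.com/path/query/{query}",
                               {"query": "foo"})
```

The client code that supplies the value "foo" is the same either way; only the server-provided recipe decides whether it lands in the query or in the path.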

So in a hypermedia world, there’s really no such thing as a "RESTful" URL structure; a truly RESTful client--one which understands and uses hypermedia affordances--doesn't care.

1 comment:

KevBurnsJr said...

In the real world your statement is even more true.
Responses to requests for URLs containing query parameters are technically specified in RFC 2616 as not cacheable by default, but in reality the vast majority of HTTP middleware caches them the same as any other URL.