Friday, September 21, 2012
Hackday project: tarpio.us

I saw this tweet from @kellan that got me to thinking:

It would be useful if Tent.io could describe itself in the world of protocols I know, particularly PuSH, XMPP, and SMTP.

The notion of a distributed social network is an interesting one, and I've been pondering how it might work. I think I've got a scheme for one, which I'll outline in this post, and it might not take that long to throw together with off-the-shelf open source and a dedicated hack day.

General Principles

Use existing, open standards. This increases the likelihood of finding good open source code to use when building software, ensures that a distributed marketplace of (interoperable!) clients and servers could arise, and increases the likelihood of avoiding platform specificity. Plus, giants have pretty high shoulders. I have limited hack time; I'm not into wheel reinvention.

Central service provider not required. This is pretty fundamental: it means there's no central organization that owns/has access to everything ("creepy alert!"), but also that there's no central organization that can be subjected to lawsuits (e.g. Napster[1]).

Users have control over sharing. At least for the initial communication, users need to be able to say exactly who gets to see a piece of content. Of course, once you've shared something with someone else, there's no stopping that other person from sharing it further, but that's what social trust is all about.

Protocol Sketch

TL;DR. S/MIME, public key crypto, Atom.

To expand: everyone runs their own AtomPub server somewhere, or contracts with a service provider to do it on their behalf. Multiple PaaS and IaaS providers have a free tier of service, which is almost certainly more than enough for most users. I get whole tens of visits to my blog every day. And I am not a prolific sharer: looks like I've averaged 1.1 tweets per day since starting to use Twitter. "Roflscale" is probably enough, to be honest. Easy recoverability is probably more important than high availability for a single-user server/hosted PaaS app, so I think this could be done for less than $10 a year (most of which is probably just buying a domain name). Given the relatively low traffic, actual friends could pretty easily share infrastructure.

Now, I'm the only one who posts to that Atom feed, and anyone can poll it at any time. The trick is that the content is encrypted, and you specify who gets access to the decryption key. Basically, you generate a random (symmetric) encryption key for the content, then public-key encrypt copies of the content key for all your recipients. Put it all together in one multipart package, upload to, say, S3, and then post a link to it on your Atom feed. Of interest is that neither S3 (the storage) nor the AtomPub server care one whit about the encryption/decryption stuff.
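To make the packaging step concrete, here is a minimal Python sketch of the flow described above: one random content key, one wrapped copy of it per recipient, and everything bundled into a single multipart package. It is an illustration under assumptions, not a real S/MIME implementation: the function names, the `X-Recipient-Key-Id` header, and the part layout are all hypothetical, and the actual encryption is delegated to two caller-supplied functions (which in practice might be backed by GnuPG or an S/MIME library).

```python
import secrets
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from typing import Callable, Mapping, Optional, Tuple


def build_package(
    plaintext: bytes,
    recipients: Mapping[str, bytes],
    encrypt_content: Callable[[bytes, bytes], bytes],
    wrap_key: Callable[[bytes, bytes], bytes],
) -> bytes:
    """Bundle one encrypted payload with a wrapped content key per recipient.

    encrypt_content(content_key, plaintext) and wrap_key(public_key, content_key)
    are supplied by the caller; this sketch does no cryptography itself.
    """
    # One fresh symmetric key per piece of content.
    content_key = secrets.token_bytes(32)
    package = MIMEMultipart("mixed")
    # One small part per recipient, holding that recipient's wrapped key.
    for key_id, public_key in recipients.items():
        part = MIMEApplication(wrap_key(public_key, content_key))
        part.add_header("X-Recipient-Key-Id", key_id)  # hypothetical header
        package.attach(part)
    # The content itself is encrypted once and attached last.
    body = MIMEApplication(encrypt_content(content_key, plaintext))
    body.add_header("Content-Disposition", "attachment", filename="content.enc")
    package.attach(body)
    return package.as_bytes()


def open_package(package: bytes, key_id: str) -> Optional[Tuple[bytes, bytes]]:
    """Return (wrapped_key, ciphertext) for key_id, or None if not a recipient."""
    parts = message_from_bytes(package).get_payload()
    ciphertext = parts[-1].get_payload(decode=True)
    for part in parts[:-1]:
        if part["X-Recipient-Key-Id"] == key_id:
            return part.get_payload(decode=True), ciphertext
    return None
```

The resulting bytes are what would get uploaded to storage and linked from the feed. Note that this layout lists recipient key IDs in the clear; a real design would need to decide whether that is acceptable.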

Interestingly, following gets somewhat inverted. You can post content intended for someone who never subscribes to your feed (gee, I hope he notices me!). You can also subscribe to the feed of someone who never encrypts stuff for you. But you can see anyone's public broadcasts, of course. Actual "friending" is just GPG key exchange and an addition of the key to appropriate keyrings to represent friend groups/circles.

On the client side, then, it's mostly about subscribing to people's Atom feeds, polling them periodically (for which HTTP caching will really help a lot), cracking open the content encryption keys with your own private key, then interleaving and displaying the content.
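The non-crypto half of that client loop can be sketched with nothing but the Python standard library. The sketch below assumes feeds arrive as Atom XML strings (fetching is left out), shows the conditional-request headers that let HTTP caching do its job, and interleaves entries from several feeds newest-first. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"


def conditional_headers(
    etag: Optional[str] = None, last_modified: Optional[str] = None
) -> Dict[str, str]:
    """Headers for a conditional GET, using validators saved from the last poll."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def parse_entries(feed_xml: str) -> List[Tuple[datetime, str, str]]:
    """Extract (updated, title, link href) from each entry of an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        updated = entry.findtext(f"{ATOM}updated")
        if updated is None:
            continue  # skip malformed entries rather than guess a timestamp
        link = entry.find(f"{ATOM}link")
        href = link.get("href", "") if link is not None else ""
        title = entry.findtext(f"{ATOM}title", default="")
        # Atom timestamps carry an offset; normalize a trailing "Z" for fromisoformat.
        when = datetime.fromisoformat(updated.strip().replace("Z", "+00:00"))
        entries.append((when, title, href))
    return entries


def interleave(feeds: Iterable[str]) -> List[Tuple[datetime, str, str]]:
    """Merge entries from several feeds into one timeline, newest first."""
    merged = [entry for feed_xml in feeds for entry in parse_entries(feed_xml)]
    return sorted(merged, key=lambda entry: entry[0], reverse=True)
```

A server that answers a conditional request with 304 Not Modified means there is nothing new to parse for that feed; the decryption step would then run only on the links of entries the client has not seen before.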

That's about it. Definitely possible to layer PubSubHubbub on top to get more realtime exchange; no reason your free tier server couldn't host one of those for you too. Or perhaps just a feed aggregation proxy that would interleave all your subscriptions for you (without needing to understand who any of the recipients were, or even if you were one of the recipients!).

Hack Day Proposal

Roughly, get a simple AtomPub server stood up somewhere. Apache Abdera has a tutorial for doing exactly that, if you can implement the storage interface (MySQL/PostgreSQL/SimpleDB/S3 I think would all work; looks to be just a Map). Front with Apache and some form of authentication module. Alternatively, grab Abdera and retrofit into Java Google App Engine and use Google federated OpenID for login.

Client side, cobble together a UN*X command-line client out of bash/Python/Ruby/whatever, depending on what open source is out there. It'd be kind of neat to perhaps do commands interleaved on the command line, in MH style. Everybody's got an HTTP client library that can be made to convince the server you're you. AtomPub clients are available in Python (1,2), Java (3,4), Ruby (5,6). GnuPG for the public key implementation. Bouncy Castle has a Java S/MIME implementation. Lots of S3 clients out there (7,8).

I think it's entirely possible to have a simple client and server up in a day with a smallish group of folks (5-6). If there are more folks we can also try multiple clients and servers.

Tentative Project Name

Since @kellan started this off by mentioning tent.io, and this is an attempt to slap something together with off-the-shelf stuff, it felt more like a lean-to made out of a tarp than a true tent, so I've snarfed the domain name tarpio.us (which is the closest I could come to something that meant "like a tarp" that lined up with a cheap TLD). Cool part, of course, is that there's no reason tent.io and tarpio.us couldn't interoperate with protocol bridges someday.

If you're interested, hit me up on Twitter at @jon_moore or follow along on GitHub. I'll see when I can free up for a hackday soon!

Footnotes

  1. I'm not advocating violating copyrights here, merely noting that Napster was a P2P system with a single point of failure (an organization that could be sued).

Saturday, September 1, 2012
Resources and Query Parameters

[Editor's note: This is a cross-post from a Tumblr entry; I started to write it as a quick note because someone was wrong on the Internet, but by the time it was done, it was long enough to be a blog post in its own right.]

What kind of string is this?

http://example.com/path?query=foo

Well, it's a URI (Uniform Resource Identifier) as well as a URL (Uniform Resource Locator). But wait, that means the whole string (check RFC 2396) is a resource identifier, which means the whole thing identifies a resource (literally). Not just the "http://example.com/path" part.

I often run into folks who think this string identifies a "http://example.com/path" resource that happens to take a parameter, which is understandable, because this is how almost every web framework is set up to implement it (identify your "/path" route, set up your controller, pick off the parameters as arguments).

However, from an HTTP point of view, and especially from a hypermedia point of view, this isn’t right. HTTP (the protocol) treats everything from the path onward as an opaque string--it shows up as a single token on the Request Line. The whole thing (query included) is used as the key for an HTTP cache. In fact, the only difference between a URL like "http://example.com/path?query=foo" and one like "http://example.com/path/query/foo" is that the former is not cacheable by default if it comes from an HTTP/1.0 origin server. That's it. And even that can be overridden with explicit Cache-Control or Expires headers.
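A small Python sketch makes the point visible. It builds the Request Line for both URL styles and uses a toy cache keyed on the full URL. The function names and the cache are hypothetical, and the cache is deliberately simplified (a real HTTP cache also considers the method, Vary headers, and freshness).

```python
from typing import Callable, Dict
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """The single token sent on the Request Line: path plus query, if any."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def request_line(url: str, method: str = "GET") -> str:
    """Format an HTTP/1.1 Request Line for the given URL."""
    return f"{method} {request_target(url)} HTTP/1.1"


class ToyCache:
    """Toy cache keyed on the whole URL, query string included."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        # Different query strings are different keys, i.e. different resources.
        if url not in self.store:
            self.store[url] = fetch(url)
        return self.store[url]


print(request_line("http://example.com/path?query=foo"))
print(request_line("http://example.com/path/query/foo"))
```

In both cases the server sees one opaque token after the method; nothing on the wire marks "query=foo" as a parameter of some other resource.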

From a hypermedia client point of view, you don't care which style of URL is used. Sure, you might have to construct your HTTP request slightly differently if there are query parameters involved, but that's mechanical--no semantics involved, just syntactically parsing the URL to figure out how to GET it. The only reason to prefer one over the other is purely stylistic; most modern web frameworks can pluck arguments out of a path as easily as they can out of query parameters.

Remember, a hypermedia client never constructs URLs on its own; besides a few well-known entry-points (which it should treat opaquely), it is only using URLs directly fed to it by the server, or constructed according to recipes provided by the server (typically through forms or link templates). This is probably the main driver for which style you want to use; do you want to use HTML forms, which, for GET, use query parameters, or do you want to use link templates, which tend to use path parameters, stylistically (although they can support query parameters too)?
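Here is a rough Python sketch of the two kinds of recipe. In both cases the client only fills in values; the shape of the URL comes from the server. The function names are hypothetical, the form helper assumes the action URL has no query string of its own, and the template helper handles only plain `{name}` placeholders (roughly the simplest level of URI Templates), not operators like `{?name}`.

```python
import re
from typing import Mapping
from urllib.parse import quote, urlencode


def submit_get_form(action: str, fields: Mapping[str, str]) -> str:
    """Build the URL a GET form submission produces: action plus encoded fields."""
    return f"{action}?{urlencode(fields)}"


def expand_template(template: str, values: Mapping[str, str]) -> str:
    """Fill {name} placeholders in a server-provided link template."""

    def substitute(match: "re.Match[str]") -> str:
        # Percent-encode everything except unreserved characters.
        return quote(str(values[match.group(1)]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# Form recipe: the server supplied the action URL and the field name.
print(submit_get_form("http://example.com/path", {"query": "foo"}))

# Link template recipe: the server supplied the whole template.
print(expand_template("http://example.com/path/query/{query}", {"query": "foo"}))
```

Either way, the client code never needs to know whether "foo" ends up in the path or the query string.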

So in a hypermedia world, there’s really no such thing as a "RESTful" URL structure; a truly RESTful client--one which understands and uses hypermedia affordances--doesn't care.