Saturday, January 24, 2009
Business Cases and Cloud Computing

I just read a very interesting article by Gregory Ness that talks about some of the technology trends behind cloud computing. One key quote:

Automation and control has been both a key driver and a barrier for the adoption of new technology as well as an enterprise’s ability to monetize past investments. Increasingly complex networks are requiring escalating rates of manual intervention. This dynamic will have more impact on IT spending over the next five years than the global recession, because automation is often the best answer to the productivity and expense challenge.

One other cited link is to an IDC study that includes the following graph:

Graph showing that 60% of the total cost of ownership (TCO) for a server over a 3 year lifetime comes from staffing costs.

Note that staffing accounts for 60% of the cost of maintaining a server over its lifetime. Cloud infrastructure services like Amazon EC2 would really only save an enterprise data center the hardware setup and software install costs, which probably represent a small fraction of the staff time spent on a given server. Administering the server once it is running is the bulk of the cost, and that won't go away on EC2 -- you'll still need operations staff to provision and image cloud infrastructure. EC2 makes sense if the economies of scale at AWS are such that they can achieve a lower operational cost for the other 40% than you can, or if there is a business / time-to-market value proposition in being able to provision hardware on EC2 more rapidly than you can acquire and install hardware yourself.

Given the huge economies of scale that the large cloud providers have (tens of thousands of servers), it is going to be hard to get your costs for that 40% lower than what they can achieve with their existing infrastructure automation and ability to purchase hardware in bulk, especially for a startup company whose hardware needs are initially modest. Let's guess that there's a 33% markup on cost for EC2, so when you are getting charged $0.10 per CPU-hour, it's really only costing them $0.075. Let's also assume a 75% experience curve on infrastructure (meaning that by the time you have doubled the number of servers you have deployed, the marginal server costs only 75% of what it did at the halfway point).

By one estimate, Amazon has 30,000 servers. Now let's work backward (1/0.75 ≈ 1.33): at 15,000 servers, their marginal cost was $0.075 * 1.33 ≈ $0.10. At 7,500 servers, it was $0.10 * 1.33 ≈ $0.13. In other words, you'd have to be planning to deploy on the order of 15,000 servers to have a hope of getting your marginal cost under what they'll charge you retail.
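
The backward walk up the experience curve can be sketched in a few lines of Python. The $0.075 marginal cost, 30,000-server base, and 75% curve are the assumptions above; the function name is my own invention:

```python
def marginal_cost_at(cost_now, servers_now, servers_then, curve=0.75):
    """Walk an experience curve backward: each halving of the deployed
    base raises the marginal cost by a factor of 1/curve (~1.33)."""
    cost = cost_now
    servers = servers_now
    while servers > servers_then:
        cost /= curve       # undo one doubling of experience
        servers //= 2
    return cost

# Amazon's assumed marginal cost today: $0.075/CPU-hour at 30,000 servers
print(round(marginal_cost_at(0.075, 30000, 15000), 3))  # 0.1
print(round(marginal_cost_at(0.075, 30000, 7500), 3))   # 0.133
```

At 15,000 servers the marginal cost is already at the $0.10 retail price, which is the crossover point in the argument above.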

(I think this is actually a conservative estimate: the experience/learning curve for infrastructure deployment is probably steeper than 75%, given established hierarchical deployment patterns and a product (provisioned servers) that lends itself well to automation. Also, because the number of servers you need to be competitive creates a high barrier to entry in cloud computing, they can probably get away with charging an even higher markup.)

One corollary of this is that if you are currently running a data center with far fewer servers (i.e. the hardware is a sunk cost), you might actually be better off turning your data center off and leasing from Amazon. Now of course, there are some things (customer credit card data, extremely sensitive business information) that you just wouldn't be willing to host somewhere outside your own data center. But that's probably a very specific set of data--host that stuff and lease the rest in the cloud, particularly if you can get adequate SLAs from your cloud vendor.

So that deals with the 40% of the TCO for a server that isn't staffing. How do you cut costs on the other 60%?

You won't really be able to make a dent in that 60% until you get not just to fully automated infrastructure provisioning, but to fully automated software deployment and provisioning. That isn't possible until you get to standardized, scale-on-demand computing platforms with specific functionality, like Akamai NetStorage, Amazon S3/EBS/SQS/SimpleDB, and Google App Engine. These are known as "Platform-as-a-Service" (PaaS) offerings.

There's a similar experience curve argument here: you could spend internal development time to set up some kind of application deployment framework, but you'd essentially have to be willing to build and deploy within an order of magnitude as many different apps as the Google App Engine team does in order to get your costs under what Google will charge you. Unless you are in the business of directly competing with them in the PaaS market, you might as well buy from them and focus your energy on providing your unique business value, not software or hardware infrastructure. [Editor's note: this was something Matt Stevens said to me a while ago, and it wasn't until I went through the mental exercise of writing this article that I actually got it.]

Yesterday I implemented (not prototyped) a service in Google App Engine in about 6 hours that would cost around $400 per month (according to their recent pricing announcements) if projected usage were more than double what it is now. I estimate this would require at least 10 database servers just to host the data in a scalable, performant fashion, never mind the REST data interface (webnodes) sitting in front of it. On Amazon EC2, that'd be $720 per month on small instances (assuming those were even beefy enough), and per the experience curve argument above, it's probably way more than that in your own data center. And that's not counting any of the reliability / load balancing infrastructure.
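
For the EC2 figure, the arithmetic is just instances times price times hours, assuming 2009-era small instances at $0.10 per instance-hour and a 720-hour (30-day) month:

```python
# Rough monthly bill for the hypothetical 10-server deployment above.
instances = 10           # estimated servers needed
price_per_hour = 0.10    # EC2 small instance, 2009 pricing
hours_per_month = 24 * 30
ec2_monthly = instances * price_per_hour * hours_per_month
print(ec2_monthly)  # 720.0
```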

So my open question is: how, as a software developer, can you justify not building your app in one of these cloud frameworks?

Monday, January 12, 2009
Websites are also RESTful Web Services

I have been reading the Richardson and Ruby book RESTful Web Services and recently had an epiphany: a well-designed RESTful web site is also a RESTful web API. In this post I'll show exactly how that works and how you can use it to rapidly build a prototype of a modern web application.

First of all, let's start with a very simple application concept and build it from the ground up: an application that lets you keep a list of favorite items. The resources we'll want to model are:

  • a favorite item
  • a list of a user's favorite items

We'll assign the following URLs:

  • /favorites/1234 for favorite item with primary key 1234
  • /favorites is the list of everyone's favorite items
  • /favorites/-/{owner} (or its URL-encoded equivalent) is the list of a particular user's favorite items, e.g. /favorites/-/joe for our friend Joe (a URL format inspired by Google's GData protocol).

Now, we'll support the following HTTP methods:

  • You can GET and DELETE an individual item (we could allow PUT if we wanted to allow editing, but we'll keep the example simple)
  • You can create a new favorite item with a POST to /favorites.
  • You can GET the list of a user's favorites.
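
Before reaching for a framework, the resource model above can be sketched as a plain routing table. This is a hypothetical, framework-agnostic sketch; all the names (dispatch, FAVORITES, and so on) are invented for illustration, and any web framework's router and data store stand in for them:

```python
import re

FAVORITES = {1234: "raindrops on roses"}   # toy in-memory store
NEXT_ID = [2345]                           # next primary key to hand out

def get_item(item_id):
    return 200, FAVORITES[int(item_id)]

def delete_item(item_id):
    del FAVORITES[int(item_id)]
    return 200, "deleted"

def create_item(body):
    item_id, NEXT_ID[0] = NEXT_ID[0], NEXT_ID[0] + 1
    FAVORITES[item_id] = body
    return 201, "/favorites/%d" % item_id   # 201 Created + new item's URL

def list_items():
    return 200, sorted(FAVORITES.values())

# (method, URL pattern) -> handler; exactly the mapping described above
ROUTES = [
    ("GET",    r"^/favorites/(\d+)$", get_item),
    ("DELETE", r"^/favorites/(\d+)$", delete_item),
    ("POST",   r"^/favorites$",       create_item),
    ("GET",    r"^/favorites$",       list_items),
]

def dispatch(method, url, body=None):
    """Route a (method, URL) pair to its handler."""
    for m, pattern, handler in ROUTES:
        match = re.match(pattern, url)
        if m == method and match:
            args = list(match.groups())
            if body is not None:
                args.append(body)
            return handler(*args)
    return 404, "no such resource"
```

Unsupported method/URL combinations (like PUT on an item) simply fall through to a 404, which is the simplification chosen above.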

Ok. Now we want to rapidly prototype this so we know whether we have the resources modelled correctly. Fire up your favorite web application framework (Ruby on Rails, Django, Spring, etc.) and map those URLs to controllers. Most of these frameworks let you fill in implementations for the various HTTP methods. We'll make one minor simplification and support "overloaded POST", where a POST can carry a URL parameter (e.g. "_method=DELETE") specifying PUT or DELETE. We'll still implement the proper HTTP methods, but we'll accept POST as a stand-in because browsers and some Javascript HTTP implementations can only do GET and POST.
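
The overloaded-POST convention amounts to a couple of lines run before dispatch. A hypothetical sketch (the function name is mine; Rails and several other frameworks do this for you):

```python
def effective_method(http_method, params):
    """Let a POST carry a _method=PUT/DELETE override, since browser
    forms can only submit GET and POST."""
    if http_method == "POST" and params.get("_method") in ("PUT", "DELETE"):
        return params["_method"]
    return http_method

print(effective_method("POST", {"_method": "DELETE"}))  # DELETE
print(effective_method("POST", {}))                     # POST
```

Note that the override is only honored on POST; a GET with a stray _method parameter stays a safe GET.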

Ok, now a funny thing happens: if you render an HTML response for everything, you can start playing with your API in your browser! In particular, when we render the list of items, we will naturally put the text of those items on the page, but we can also throw the following HTML snippet at the top of the page:

<p>Add a new favorite:</p>
<form action="/favorites" method="post">
  <input type="text" name="itemname"/>
  <input type="submit" value="Add"/>
</form>

We can also add the following form after each item's text:

<!-- use a specific item's URL for the action -->
<form action="/favorites/1234" method="post">
 <input type="hidden" name="_method" value="DELETE"/>
 <input type="submit" value="Delete"/>
</form>

which gives us this ugly beastie:

A Few of My Favorite Things:
Add a new favorite:
  • raindrops on roses
  • whiskers on kittens
  • bright copper kettles
  • warm woolen mittens
  • wild geese that fly with the moon on their wings

Now, one key item is that when you render the result page for adding an item, you can send a 201 (Created) response and say something like "item added", throwing in a link back to the list page. The whole HTML response might be nothing more than:

  <p>I created your item <a href="/favorites/2345">here</a>.</p>
  <p>All your items are <a href="/favorites/-/{owner}">here</a>.</p>

We similarly want to render a confirmation page after a DELETE. This makes for an awkward user experience ("add, ok, add, ok, ..."), but you'll notice that the back and forward buttons on your browser actually work without prompting you to re-POST anything.

[Side note: you could, instead of returning a success page, return a redirect (a 303 See Other is the status code designed for this) that lands you back on the list page, which maybe gets you closer to what you wanted from a user experience. This is the classic POST-redirect-GET pattern; in most browsers it keeps the POST out of the history, so the back button won't prompt you to re-POST either.]

Now you also have the interesting property that all the links on your site are safe GETs (no side effects), and all the buttons are potentially destructive (write operations of one sort or another). I say only "potentially" because you might have a search form with method="get" to do a query, and not all of your POST/PUT/DELETEs will actually change anything.

At any rate, at this point, you have a functionally working website that someone could use, if somewhat awkwardly. Plus, if you have my frontend aesthetic design sensibilities, your users will have the pleasure of suppressing a gag reflex while using your site.

So let's spruce this up a little bit. On the HTML page for the list of favorites, we can apply some Javascript. At first blush, we can hide the delete buttons until a mouseover, which cleans things up somewhat. But the real magic happens when we attach an AJAX event to the delete buttons. Now the script can do the very same POST that the form would have done, then check the HTTP status code and remove the item's text from the DOM on success.

Suddenly, the user never leaves that list page, and we haven't had to change any of the rest of the API -- just the HTML representation of that list page. The AJAX call doesn't care if it gets HTML back (in this case), it just cares about the response code. Now we have the nice AJAXy experience we would expect, but oddly enough you still have a website that will work for people with Javascript disabled.

The last step towards finishing out your API is probably simply to make structured versions of your representations available (e.g. JSON or XML formats like Atom) with an optional parameter like "?format=json". Now all of your client-side functions can call URLs with the appropriate format on them and get well-structured data, and everyone else gets HTML.
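
Dispatching on that format parameter is a one-branch affair. A sketch, with the HTML branch standing in for whatever template call you already have (render_favorites is an invented name):

```python
import json

def render_favorites(items, fmt="html"):
    """Return (content_type, body) for the list-of-favorites resource.
    fmt="json" serves structured data; anything else gets the HTML page."""
    if fmt == "json":
        return "application/json", json.dumps(items)
    html = "<ul>%s</ul>" % "".join("<li>%s</li>" % i for i in items)
    return "text/html", html

print(render_favorites(["raindrops on roses"], fmt="json"))
```

The resource, its URL, and its HTTP methods are unchanged; only the representation varies, which is the whole point of the exercise.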

Well, I guess that's the second to last step. You probably actually want to apply some graphic design and CSS to your site too...