Tuesday, July 10, 2012

Using HTML as the Media Type for your API

This content has moved permanently to:


Mike Kelly said...

I wrote a comment here that turned into a short blog post:


Jon Moore said...

@Mike: Thanks for reading! I commented on your response post.

Jon Moore said...

It strikes me I should document some assumptions I'm making:

1. I can't use a domain-specific media type because my API either spans domains or the domains are evolving too rapidly. So I need a domain-agnostic media type.

2. I want to build a hypermedia API, because I believe that helps me with evolvability. So I want the media type to support links and forms (or parameterized links) natively.

3. I want there to be serendipitous reuse of my API, so I want to primarily use formally standardized formats/protocols/techniques (also because I want to minimize what I have to document).

4. If possible, I prefer having representations that can serve both human and machine clients.
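To make assumptions 2 and 4 concrete, here is a minimal Python sketch (standard library only) of a client that pulls the hypermedia controls out of an HTML representation a browser could also render. The markup, the `rel` and `class` values, and the `ControlExtractor` class are hypothetical. The sketch also simplifies by treating each `rel` as a single token, even though HTML allows a space-separated list.

```python
from html.parser import HTMLParser

# Hypothetical representation: readable in a browser, parseable by a client.
DOC = """
<html><body>
  <a rel="next" href="/orders?page=2">Next page</a>
  <form class="search" action="/orders" method="get">
    <input name="q" type="text">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

class ControlExtractor(HTMLParser):
    """Collects links (A tags with @rel) and forms with their named inputs."""

    def __init__(self):
        super().__init__()
        self.links = {}    # rel value -> href
        self.forms = []    # one dict per FORM element
        self._form = None  # the FORM currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("rel") and a.get("href"):
            self.links[a["rel"]] = a["href"]
        elif tag == "form":
            self._form = {"class": a.get("class"), "action": a.get("action"),
                          "method": (a.get("method") or "get").lower(), "fields": []}
            self.forms.append(self._form)
        elif tag == "input" and self._form is not None and a.get("name"):
            # Unnamed inputs (like the submit button) are not submitted, so skip them.
            self._form["fields"].append(a["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None

parser = ControlExtractor()
parser.feed(DOC)
parser.close()
print(parser.links)  # {'next': '/orders?page=2'}
print(parser.forms)
```

The same document can be served to a browser unchanged. The client only cares about the controls it recognizes.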

kpobococ said...

Well, this article just added html to my mental list of available API formats =)

Anonymous said...

For mobile apps, every byte counts.

Yes, JSON is compact compared to HTML, but then most things are. However, JSON is still very wordy, since it's schemaless (well, sort of).

What you really want is a binary protocol.*

*Yes, really: it's not too hard to design one, and there are many prebuilt ones that are schemaless, if that floats your boat.

Unknown said...

I do tend to prefer JSON output as it is concise enough and human readable enough for most cases.

But this post does give me some things to consider, especially if the API is just returning content to be displayed.

What I like about JSON is the ability to de/serialize objects across programming languages. An HTML API would make that more difficult.

Bandwidth is a concern only for the largest of APIs with a huge user base. But cruft is cruft and should be cut.

Andrei Neculau said...

My 4 cents on this:

* http://blog.programmableweb.com/2011/11/18/rest-api-design-putting-the-type-in-content-type/

* no matter whether it's XML, JSON, HTML, or HAL, you are still left with mostly syntax: parse this text into a data structure of type integer, string, map, bag, "link class"

* different POVs, but HTML's semantics are mainly presentational: tags like EM, B, P, H1, etc. Whether there is CSS or not, the browser would take those semantics and act upon them from a presentation perspective

* "true" API semantics come not from what exists in the data that is sent, but from what it means when it exists. Compare looking at:
- a "text/html" document, in which I can see two ADDRESS tags (if I'm the browser, I know how to render them)
- an "application/vnd.shop+html" document, which I know is defined to have x, y, z, plus an ADDRESS tag with ID "physical_address", meaning the address where the shop runs the business, and another ADDRESS with ID "return_address", meaning the address where one can do returns (if I'm the API client and I have to do a return, I know that I will use the second one)
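As a rough illustration of Andrei's example, here is a client written against the hypothetical `application/vnd.shop+html` conventions. It reads every ADDRESS by its id and uses only `return_address`. The sketch uses Python's standard `html.parser`, and the markup and addresses are invented.

```python
from html.parser import HTMLParser

DOC = """
<html><body>
  <address id="physical_address">1 Main St, Springfield</address>
  <address id="return_address">99 Dock Rd, Springfield</address>
</body></html>
"""

class AddressById(HTMLParser):
    """Captures the text of every ADDRESS element that has an id, keyed by that id."""

    def __init__(self):
        super().__init__()
        self.addresses = {}
        self._current = None  # id of the ADDRESS we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self._current = dict(attrs).get("id")
            if self._current:
                self.addresses[self._current] = ""

    def handle_data(self, data):
        if self._current:
            self.addresses[self._current] += data

    def handle_endtag(self, tag):
        if tag == "address":
            self._current = None

p = AddressById()
p.feed(DOC)
p.close()
# The media type definition (not HTML itself) says what "return_address" means.
return_address = p.addresses.get("return_address", "").strip()
print(return_address)  # 99 Dock Rd, Springfield
```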

Jon Moore said...

@Anonymous: Re: binary protocols. I agree, which is why I reference ProtoBufs and Avro in the article--those are great options.

Jon Moore said...

@Andrei: Thanks for the link to your article--I enjoyed reading it. I've actually found that "typed" content-types, like XML schema or DTDs have their pros and cons. On the one hand, it makes it easier to write a client, but on the other hand, those clients are less adaptable to change.

Evolvability is all about reducing coupling; you can't break an assumption that isn't made in the first place. Forcing clients to do a little "spelunking" for what they're trying to find makes them far more robust and adaptive (particularly when written against servers operated by someone else).

Unknown said...

Great post Jon,

This reminds me of a couple times in the past where using a 3rd party's api was so awful that I found it easier to just scrape the information off their web site instead.

XPath targeting the CSS class names made this approach very straightforward, as each of these data elements had its own specific styling.

The data was easy to see on the page and therefore was (and should be) easy to access and work with.

Draws some interesting parallels to what you mentioned. I'll definitely be exploring your idea more.
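The class-name scraping described above can be sketched without XPath, using Python's lenient `html.parser`, which tolerates markup that isn't well-formed. The product list and class names are made up. The sketch matches a class as a whitespace-separated token and does not handle void elements (such as `img`) carrying the target class.

```python
from html.parser import HTMLParser

DOC = """
<ul>
  <li><span class="product-name">Widget</span> <span class="price">$10.00</span></li>
  <li><span class="product-name">Gadget</span> <span class="price sale">$7.50</span></li>
</ul>
"""

class ClassScraper(HTMLParser):
    """Collects the text of every element carrying a given CSS class token."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.found = []
        self._tag = None  # tag name of the element being captured
        self._depth = 0   # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._tag, self._depth = tag, 1
            self.found.append("")

    def handle_data(self, data):
        if self._tag is not None:
            self.found[-1] += data

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._tag = None

def scrape(html, css_class):
    s = ClassScraper(css_class)
    s.feed(html)
    s.close()
    return [text.strip() for text in s.found]

print(scrape(DOC, "price"))  # ['$10.00', '$7.50']
```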

Steven Willmott said...

I think this is a great debate, and also that it certainly isn't "wrong" to use HTML as an API media type (in a sense everything is right), but I still don't agree that it's a particularly good starting point. Most importantly:

- The semantics which you rightly mention are real, BUT they are the semantics of document structure and not of a domain, and in the long run it's domain semantics that will count. Granted, JSON isn't much better in this respect, but you say "Raw XML has nothing": sure, XML without a schema has nothing, but that's what XML Schemas are for. I'm not saying that's golden either, but ultimately this is where APIs will need to go to be truly machine-reusable.

- When your semantics are about document structure, it pushes you to do things which are likely not efficient for machine interactions. I'd rather the representation be a hotel directly than "a document" about a hotel.

- The serendipity of a human being able to click through an API is certainly nice, but again you're tied to a legacy interaction mode. Given that the human experience of traversing that data isn't likely to be great, there seems to be a fundamental tension in trying to set up a system that should be useful for humans as well as machines.

The tooling for HTML is attractive, but as I said on API Craft, it's primarily oriented toward creating good human user experiences, not good machine access to APIs. Or at least, it's not good at mixing those two things together.

Since I think that ultimately other layers are needed (e.g. proper schemas, descriptions, and domain semantics), I just don't see HTML as the best starting point for that.

Jon Moore said...

@Rich: That's exactly the right idea. A "robust" screen scraper needs to be generous about the markup it gets; same idea here, except that we're at least intentionally leaving hints/handles for the clients to find (like those CSS class names you mentioned).

In addition, beyond just looking for data, sometimes you use XPath to look for an A tag with a particular @rel, or a FORM tag with a particular @class.
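Here is a rough sketch of that kind of XPath lookup, using Python's `xml.etree.ElementTree`. ElementTree supports a limited XPath subset that includes `[@attr='value']` predicates. The sketch assumes the representation is well-formed XHTML (`application/xhtml+xml`); ordinary `text/html` would need an HTML-aware parser first. The markup, `rel`, and `class` values are hypothetical. The predicates are exact string matches, so `class="create-order primary"` would not match.

```python
import xml.etree.ElementTree as ET

XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <a rel="next" href="/orders?page=2">Next</a>
  <form class="create-order" action="/orders" method="post">
    <input name="sku" type="text"/>
    <input name="quantity" type="text"/>
  </form>
</body>
</html>"""

# XHTML elements live in this namespace, so queries need a prefix for it.
NS = {"h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# An A tag with a particular @rel ...
next_link = root.find(".//h:a[@rel='next']", NS)
# ... and a FORM tag with a particular @class.
form = root.find(".//h:form[@class='create-order']", NS)
field_names = [i.get("name") for i in form.findall(".//h:input", NS)]

print(next_link.get("href"))  # /orders?page=2
print(form.get("action"), form.get("method"), field_names)
```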

Andrei Neculau said...

@Jon: "spelunking" (had to google it) is great, and it doesn't go against a semantic media type: the API that I'm designing right now quite religiously follows the thoughts of https://secure.designinghypermediaapis.com/nodes/fdivisitjqwp

So from that perspective, I see things differently: the lack of a semantic media type, although it increases "spelunking", also increases assumptions and, in the end, coupling.

With a semantic media type, you would do a little "spelunking" to decrease coupling. E.g. does the response have property X? If it does, then I know (from the media type's definition), rather than assume, that it means Y, and that I can do this and that.

Anonymous said...

OK, HTML gives you a semantic context, but in the end it does not really help, because you don't have proper tool support.

I don't see the disadvantages of using the proper XML tools for this. As with HTML, you get parsing for free, but unlike with HTML, validation (schema) is free as well.

"Brittle", as you call it, means failing fast and reliably, so things can be fixed, as opposed to behaving unpredictably, even on schema-consistent changes.

The HTML approach makes the client programmer guess and assume; proper XML (in most cases) generates client models for free.

Now please sell us PHP as a good systems programming language.

I mean, seriously?

Jon Moore said...

@Andrei: It sounds like you might be talking about media type profiles like XMDP, which I think is compatible with the general idea here.

I'm trying to separate structure and hypermedia controls into one layer (HTML), and application semantics into another (RDFa/microformats/link relations/microdata), rather than lumping them into one semantic media type, like, say, vcard (which dictates structure and semantics).

I think if you had HTML for structure and hypermedia, and the HTML had a HEAD tag with @profile pointing to a specific collection of semantic conventions (i.e. your semantic "type") you get to approximately the same place.
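The HEAD/@profile idea can be sketched in Python as follows. The client reads the whitespace-separated profile URIs and applies a set of semantic conventions only if the document opts in. The `profile` attribute on HEAD comes from HTML 4.01 and was dropped in HTML5. The profile URI and the `profiles_of` helper are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical profile URI standing in for a semantic "type".
SHOP_PROFILE = "http://example.com/profiles/shop"

DOC = f"""<html><head profile="{SHOP_PROFILE}"><title>Shop</title></head>
<body><address id="return_address">99 Dock Rd</address></body></html>"""

class ProfileReader(HTMLParser):
    """Records the whitespace-separated URIs in HEAD/@profile."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.profiles = (dict(attrs).get("profile") or "").split()

def profiles_of(html):
    r = ProfileReader()
    r.feed(html)
    r.close()
    return r.profiles

# Interpret ids like "return_address" only if the document opts into the profile.
if SHOP_PROFILE in profiles_of(DOC):
    print("interpreting ids per the shop profile")
```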

Jon Moore said...

@Anonymous: Schemas work well and are useful for testing, right up until the point you find you need to change them, at which point you're faced with the choice I'm trying to avoid. This is why I say they are brittle--not to mention that at least one of my goals is perhaps to have clients interacting with the HTML on the web at large, which I guarantee is not going to be subject to schema. So I want to figure out how to write clients that are robust enough to not require the crutch of a schema. I think this is possible; we'll find out, I guess. :)

Please note that there isn't any guessing for HTML; it has perfectly clear rules for validation. "HTMLParser.parse(s)" is no harder than "XMLParser.parse(s)" or "JSONParser.parse(s)". You say that the HTML makes the client guess and assume, but I say a schema leads the client to make a more problematic assumption, which is that the schema will always govern those resources. I've run into enough rapid development situations where schema would not be appropriate, because the application domain is changing too rapidly.
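For a rough side-by-side of the three parsers mentioned above, here is a Python sketch. All three ship in Python's standard library. The main difference is that `html.parser` is event-driven (you subclass it and receive callbacks) rather than returning a tree, so the comparison is approximate. The sample strings are made up.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Trivial handler: counts start tags, just to show the parse ran."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1

html_parser = TagCounter()
html_parser.feed('<p>Hello <a href="/x">world</a></p>')
html_parser.close()

xml_root = ET.fromstring('<p>Hello <link href="/x">world</link></p>')
json_doc = json.loads('{"p": "Hello", "link": "/x"}')

print(html_parser.count, xml_root.tag, json_doc["link"])  # 2 p /x
```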

Anonymous said...

Schemas are in fact not that hard :)

"You say that the HTML makes the client guess and assume, but I say a schema leads the client to make a more problematic assumption, which is that the schema will always govern those resources."

HTML only gives you semantics and syntax, not structure: how does the consumer/client know what is where? The schema defines the structure, which otherwise has to be guessed or defined elsewhere.

The client will always have to react to changes in the structure.
Case A (change is valid in schema, but has not been used before):
non-schema client *maybe* still works or needs adaptation.
schema client just works.

Case B (change alters schema):
both clients have to be changed.

"Magic" clients that guess a lot may work in those scenarios, but they are highly incompatible and messy. They may also behave wrongly without any indication, possibly corrupting the data.

Also, nobody prevents you from making your schema XML look like a subset of HTML.

Jon Moore said...

@Anonymous: The processing model is slightly different, for sure. If there's a schema, then I agree it gets very easy for a client to find what it's looking for. With the approach I'm describing, the client behavior is somewhat less direct; it's more like "ok, let me digest this response entirely, and then decide what I want to do".

On the other hand, consider the "resource inlining" refactorings I showed in this talk. Would a pre-existing schema have accommodated this change? Maybe, maybe not (in my experience, most schemas wouldn't). If the schema did accommodate data being inlined or remote, then I'm suggesting the client behavior is actually pretty similar: "if I can find the data here then X else Y".
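The "if I can find the data here then X else Y" behavior can be sketched roughly in Python. The client first looks for inlined data. If it finds none, it follows a link and looks there. The `customer-name` class, the `customer` rel, and the `fetch` callable are all hypothetical; `fetch` stands in for an HTTP GET. The `hops` limit guards against following links forever.

```python
from html.parser import HTMLParser

class CustomerProbe(HTMLParser):
    """Looks for inlined data (class="customer-name") and for a rel="customer" link."""

    def __init__(self):
        super().__init__()
        self.name = None       # inlined customer name, if present
        self.link = None       # href of the rel="customer" link, if present
        self._name_tag = None  # tag currently holding the inlined name

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._name_tag is None and "customer-name" in (a.get("class") or "").split():
            self._name_tag = tag
            self.name = ""
        elif tag == "a" and a.get("rel") == "customer":
            self.link = a.get("href")

    def handle_data(self, data):
        if self._name_tag is not None:
            self.name += data

    def handle_endtag(self, tag):
        if tag == self._name_tag:
            self._name_tag = None

def customer_name(html, fetch, hops=1):
    """Use inlined data if present; otherwise follow the link (at most `hops` times)."""
    probe = CustomerProbe()
    probe.feed(html)
    probe.close()
    if probe.name is not None:
        return probe.name.strip()
    if probe.link is not None and hops > 0:
        return customer_name(fetch(probe.link), fetch, hops - 1)
    return None

# Usage: here `fetch` is just a dict lookup instead of a real HTTP GET.
INLINED = '<div><span class="customer-name">Ada</span></div>'
LINKED = '<div><a rel="customer" href="/customers/7">Customer</a></div>'
pages = {"/customers/7": '<h1 class="customer-name">Ada</h1>'}
print(customer_name(INLINED, pages.get))  # Ada
print(customer_name(LINKED, pages.get))   # Ada
```

The client handles both the inlined and the linked form of the resource, so the server can switch between them without breaking it.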

My experience with schemas has perhaps been in use cases where they aren't as useful. When the application domain changes rapidly and in unanticipated ways (common during user-facing feature innovation), I've always found the schemas get in the way, either by preventing a change to something we now realize would work better, or by having to be re-versioned, which then requires handling client migrations.

However, I think we basically agree. In this case, a schema makes clients easier to write, at the expense of tighter coupling (clients can't handle a schema-breaking change). A schema-free approach makes it harder to write clients--no argument--but by definition reduces the number of assumptions the client may make about server behavior.

Which tradeoff you want to make depends entirely on your problem setting and your goals.

Jon Moore said...

Also, I would not characterize the clients as "guessing"; "searching" or "exploring" would be more appropriate. The clients are still reacting to specifics provided by the server: microdata, link relations, semantic structure. They may not know ahead of time where these things will be found, but they can be recognized when they are encountered.
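Here is a loose sketch of "recognizing things when they are encountered" with microdata. The client collects `itemprop` values wherever they appear in the page, without knowing the layout in advance. This Python sketch is deliberately simplified. It ignores `itemscope` nesting and uses text content only, whereas the microdata spec takes values from attributes for some elements (e.g. `meta`'s `content`). The hotel markup is made up.

```python
from html.parser import HTMLParser

DOC = """
<div itemscope itemtype="http://schema.org/Hotel">
  <h1 itemprop="name">Harbor Inn</h1>
  <p>Call <span itemprop="telephone">555-0100</span> to book.</p>
</div>
"""

class ItempropReader(HTMLParser):
    """Maps itemprop names to their text, wherever in the page they occur."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._prop = None  # itemprop currently being captured
        self._tag = None   # tag name carrying that itemprop
        self._depth = 0    # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self._prop is not None:
            if tag == self._tag:
                self._depth += 1
            return
        prop = dict(attrs).get("itemprop")
        if prop:
            self._prop, self._tag, self._depth = prop, tag, 1
            self.props[prop] = ""

    def handle_data(self, data):
        if self._prop is not None:
            self.props[self._prop] += data

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._prop = None

def itemprops(html):
    r = ItempropReader()
    r.feed(html)
    r.close()
    return {k: v.strip() for k, v in r.props.items()}

print(itemprops(DOC))  # {'name': 'Harbor Inn', 'telephone': '555-0100'}
```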

Anonymous said...

I don't really understand the argument for HTML. You need a format for transferring data. Most likely, clients will be written to match whatever format you define - there aren't clients already written to your API. Any format for which parsers are readily available will do - XML and JSON included.

Using HTML will make parsing more tedious for web clients written in Javascript, which is why I prefer JSON. OTOH, if using HTML, you're always bound to a pretty strict and limited schema, whereas you don't have a schema restriction for JSON (unless you use JSON-Schema), and you can always extend XML schemas.

I also don't buy it that HTML is more expressive than JSON or XML. <a href="...">...</a> is IMO harder to read than <link uri="..." /> or <link>...</link>, or "link": "...". But in either XML or JSON you can represent things like "link": { "proto": "http", "host": "...", "port": ..., "path": "..." }, while in HTML you force the client to use an additional parser for this.

Putting <form></form> in whatever you transfer doesn't really make your clients support submitting forms, nor does it make this easier for them. In fact, relying on HTML's form mechanism you make it harder for clients, since submitting forms involves using yet another format - form contents aren't submitted as HTML. Whereas for XML-RPC, SOAP or JSON-RPC both the request and the response use the same format.
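On the structured-link point above: if a client does need the pieces listed (protocol, host, port, path), Python's standard library splits an href string into them with a single `urlsplit` call. This is a small sketch, and the URL is made up.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://api.example.com:8080/orders/42?expand=items")
# Rebuild the structured form from the comment out of the parsed href.
link = {"proto": parts.scheme, "host": parts.hostname,
        "port": parts.port, "path": parts.path}
print(link)
# {'proto': 'http', 'host': 'api.example.com', 'port': 8080, 'path': '/orders/42'}
```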

Jon Moore said...

@Anonymous: I think you raise some good points here. Yes, form-encoded inputs are a specific media type distinct from HTML, yet most HTTP libraries make form submission no harder than constructing a map of name-value pairs. This has not been a problem for me in practice in the clients I've written.
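The point that form submission amounts to building a map of name-value pairs can be sketched with Python's standard library. The sketch takes the action and method from a parsed FORM and encodes a dict of values as `application/x-www-form-urlencoded`. It then builds (but does not send) the request. `build_form_request` and the URLs are hypothetical. The sketch also simplifies GET handling by appending the query string rather than replacing any query already on the action.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

def build_form_request(base_url, action, method, fields):
    """Turns a FORM's action/method plus a dict of values into an HTTP request."""
    url = urljoin(base_url, action)  # resolve a relative action against the page URL
    body = urlencode(fields)
    if method.lower() == "get":
        # GET forms put the encoded fields in the query string.
        return Request(url + "?" + body, method="GET")
    # POST forms send the encoded fields as the request body.
    return Request(url, data=body.encode("ascii"), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

req = build_form_request("http://api.example.com/orders?page=1", "/orders", "post",
                         {"sku": "W-1", "quantity": 2})
print(req.get_method(), req.full_url, req.data)
# POST http://api.example.com/orders b'sku=W-1&quantity=2'
```

Sending it would then be one `urllib.request.urlopen(req)` call against a real server.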

I also want to be a bit of a stickler here about expressiveness. XML and JSON do not have standard ways of representing links and/or forms, and neither does SGML. You can define conventions for links and forms in XML and JSON, but then you're defining a new media type (e.g. HAL+JSON or, more pointedly, application/xhtml+xml). HTML is the only media type in the IANA standards tree with support for links AND forms, at the moment.

There are certainly efforts underway to standardize other media types, and I would be happy to entertain them when they arrive. Much of the argument presented in this article is a result of the point-in-time state of the various standards bodies.

Anonymous said...

Hi Jon,

Thanks for a fascinating article encouraging the use of an existing format we already know and have come to love, namely HTML. The argument that convinced me most is that HTML is already widely supported and works well for the things you emphasize. Thank you. Just so you know, I got to this article while reading another blog post on decoupling UI from HATEOAS, which referenced your post here as a reason to use HTML.

ref: http://smizell.com/weblog/2014/html-hypermedia-api-decoupled-ui

Thanks again. I'm going to try experimenting with what I've found in both this article and the other one. Have a good day!
