Wednesday, November 16, 2011

RESTful Design - Benefits, Patterns

This captures odds-n-ends around RESTful design - why bother, what are the benefits, what are some patterns, etc. This is written in a terse "talking points" style, with most content either paraphrased or explicitly copied from the footnoted links at end of post; I've added some of my thoughts here and there.
An opening thought around what might be the benefit to understanding and leveraging aspects of RESTful design: since REST describes the way the web works, and the web is the single most scalable application ever known, we might do well to understand and embrace aspects of RESTful style.
REST describes a Resource-Oriented Architecture (ROA): the web is based on resource exchange, not on sending commands.
Selected excerpts from Roy Fielding's thesis[1]:
REST provides a set of architectural constraints that, when applied as a whole, emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.
The central feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components. By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved.
What makes HTTP significantly different from RPC is that the requests are directed to resources using a generic interface with standard semantics that can be interpreted by intermediaries almost as well as by the machines that originate services. The result is an application that allows for layers of transformation and indirection that are independent of the information origin, which is very useful for an Internet-scale, multi-organization, anarchically scalable information system. RPC mechanisms, in contrast, are defined in terms of language APIs, not network-based applications.
HTTP is not designed to be a transport protocol. It is a transfer protocol in which the messages reflect the semantics of the Web architecture by performing actions on resources through the transfer and manipulation of representations of those resources. It is possible to achieve a wide range of functionality using this very simple interface, but following the interface is required in order for HTTP semantics to remain visible to intermediaries.
Principles
as per Tilkov[2]:
  • Everything has its own URI as an identifier
  • Link things together
  • Use a set of standard methods (in the manner they're intended)
  • Resources have multiple representations
  • Assume stateless communication
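To make the first two principles concrete (URIs as identifiers, linking things together), here is a minimal sketch that builds an RFC 5988 style Link header value for one page of a collection. The function name, the example.org URI and the `page`/`size` query parameters are all assumptions made for illustration, not taken from any of the footnoted sources.

```python
from urllib.parse import urlencode


def paging_link_header(base_uri: str, page: int, page_size: int, total: int) -> str:
    """Build a Link header value with prev/next relations for a paged collection."""
    # Ceiling division: number of pages needed to hold `total` items.
    last_page = max(1, -(-total // page_size))

    neighbours = []
    if page > 1:
        neighbours.append((page - 1, "prev"))
    if page < last_page:
        neighbours.append((page + 1, "next"))

    parts = []
    for target_page, rel in neighbours:
        # Hypothetical query parameter names; a real service picks its own.
        query = urlencode({"page": target_page, "size": page_size})
        parts.append(f'<{base_uri}?{query}>; rel="{rel}"')
    return ", ".join(parts)


print(paging_link_header("http://example.org/orders", page=2, page_size=20, total=95))
```

The client follows whatever URI sits behind `rel="next"` rather than constructing it, so the server stays free to change its URI layout.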
Benefits from Principles
  • Identifiers:
    • flexibility, extensibility: bookmarkable, pass between different apps, facilitate new mashups; support versioning (version # is part of URI)
    • lowered dev costs, extensibility: familiar programming model (à la browser); apply web-centric security constraints; leverage HTTP redirects; apply different rules to different URIs for logging, statistics, auditing, etc.
    • ease of evolution: retrofit things like auditing, diagnostics, recovery, undo operations, ...
  • Linking:
    • scalability: paging via links in the face of many results
    • lowered dev costs: symmetric, consistent, understandable, maintainable, extensible codebase; familiar programming model (as browser). Clients can "discover" the entire information space dynamically, no need to hardwire drill-down URLs that will later break.
    • lowered dev costs, extensibility: guide "next valid transitions"; encapsulate URI details via rel (relations) attribute, no need for out-of-band document (WADL, WSDL)
    • ease of evolution: self-describing server can evolve without breaking clients
  • Standard Verbs ("Uniform Interface"):
    • scalability: can leverage (and not get burned by) existing web infrastructure (proxies, gateways, crawlers, etc.)
    • reliability: GET and HEAD are safe; PUT and DELETE are idempotent - thus clients can resend requests as needed (except for POST - but see patterns below for workarounds). Cache intermediaries can "determine the cacheability of a response because the interface is generic rather than specific to each resource. By default, the response to a retrieval request is cacheable and the responses to other requests are non-cacheable." (Fielding, sec. 5.2.2)
    • lowered dev costs: no need to invent a new protocol for every application (à la WS).
    • extensibility: re-use of testing tools/techniques, interoperability between new apps and existing clients, etc.
    • value-add: facilitate intranet with searchable resources (i.e. crawlers can index GETs without deleting your database...).
  • Multiple Representations:
    • ease of evolution: support versioning for backwards compatibility (e.g. application-custom MIME types)
    • flexibility: client references are not coupled to a particular representation
  • Stateless:
    • scalability: facilitates load balancing, distributed caching, clustering, parallel processing and pipelining
    • reliability: failover
    • flexibility, extensibility: decoupled from clients
    • visibility: diagnostics are transparent since each request is self-contained
  • All of the above used together:
    • easier to combine different services (interop)
    • more consistent coding patterns, better redundancy, faster training/learning curves, faster evolution
    • symmetric, understandable, maintainable codebase
From Fielding, section 5.3.1: "(RESTful constraints) allow intermediaries - proxies, gateways, and firewalls - to be introduced at various points in the communication without changing the interfaces between (client and server), thus allowing them to assist in communication translation or improve performance via large-scale, shared caching. REST enables intermediate processing by constraining messages to be self-descriptive: interaction is stateless between requests, standard methods and media types are used to indicate semantics and exchange information, and responses explicitly indicate cacheability."
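The reliability point under Standard Verbs (safe and idempotent methods can be resent, POST cannot) can be sketched as a small client-side retry helper. This is only an illustration under stated assumptions: `send_with_retry`, the `send` callable and `FlakyTransport` are hypothetical names, and a real client would also want backoff and timeouts.

```python
SAFE_METHODS = {"GET", "HEAD"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def send_with_retry(send, method, uri, body=None, max_attempts=3):
    """Call send(method, uri, body); retry on ConnectionError only if idempotent."""
    # A non-idempotent method (e.g. POST) gets exactly one attempt, because
    # the client cannot tell whether a lost request was already applied.
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send(method, uri, body)
        except ConnectionError:
            if attempt == attempts:
                raise


class FlakyTransport:
    """Hypothetical stand-in for a network call that fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, method, uri, body):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated timeout")
        return 200


transport = FlakyTransport(failures=2)
status = send_with_retry(transport, "PUT", "/cart/1", b"{}")
print(status, transport.calls)
```

With the same flaky transport, a POST would surface the first ConnectionError to the caller instead of being resent.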

Patterns

Scalability
  • Caching - leverage validation, expiration, etc so response transfers data only when it's changed; server-side caching to minimize repeating expensive computations and/or to handle increased demand from multiple clients
  • Cache control - server specifies which responses are cacheable; client has option to re-use the cacheable data
  • Response code 409 (Conflict) - signals a conflicting concurrent update, which supports optimistic locking patterns
  • Async request pattern - POST a query that is costly on server side; client receives a "future" URI in the Location header, later does a GET to that URI (404 means not done yet...)
  • Header information - specify what encoding is acceptable, server can compress e.g. into gzip to save bandwidth
  • Instead of sessions (which hurt scalability), make the shopping cart a resource
  • Provide "collection" resources - coarse-grained interactions
  • Provide paging of large results - e.g. 20 at a time - with links on NEXT and PREVIOUS, etc.
    • provide this link in a header to enable linking from non-text media types, e.g. an image[7]
    • use this "header link" pattern for other linking needs, e.g. providing the "next valid state transitions" (i.e. what valid things can client do from here)
  • Transactional behavior
    • make the txn a resource. GET txn, do stuff to it, finally PUT it at the very end
    • use BASE and compensating txns (PUT, DELETE, POST) as needed
    • server provides links that facilitate compensating actions
  • Conditional GET: response has ETag and/or Last-Modified set; client later requests the same resource, sending the ETag value in If-None-Match and/or the Last-Modified value in If-Modified-Since; server then decides whether a resend is needed ("validation"). If not, response code is 304 (Not Modified).
  • If you must use cookies, store all app state on client side (i.e. don't use a session ID pointing to data on server) - else, you'll sacrifice scalability
  • Leverage intermediaries - using HTTP as intended in a RESTful style facilitates interoperation with network components that provide load balancing, caching, security policies, etc. As per Fielding: Within REST, intermediary components can actively transform the content of messages because the messages are self-descriptive and their semantics are visible to intermediaries.
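The Conditional GET bullet above can be sketched from the server's side. This is a minimal illustration, not a full implementation: the function names are made up, the ETag is derived from a hash of the body (one possible scheme among many), and weak validators (`W/` prefix) and If-Modified-Since are left out.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the representation bytes (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}

    if_none_match = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in if_none_match.split(",")]

    # Validation: the client's copy is still current, so send no body.
    if if_none_match.strip() == "*" or etag in candidates:
        return 304, headers, b""
    return 200, headers, body


# First request has no validator; the second replays the ETag it received;
# the third replays it after the resource has changed.
status1, headers1, _ = conditional_get(b"hello", {})
status2, _, _ = conditional_get(b"hello", {"If-None-Match": headers1["ETag"]})
status3, _, _ = conditional_get(b"hello, world", {"If-None-Match": headers1["ETag"]})
print(status1, status2, status3)  # 200 304 200
```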
Design/Coding
  • New URIs are created by the server and returned to the client via the Location header after a POST creates the resource
  • Version the service with URI - /v1/service/resource, or even as part of the host - v1.myservice.twc.com, v2.myservice.twc.com, etc.
  • Keep in mind that HTML forms support only GET and POST (PUT/DELETE support was dropped from the HTML5 drafts), and not all firewalls allow PUT/DELETE through...so, tunnel the method using header or hidden form field, or just use XHR
  • Provide canonical representations (text/plain, HTML) - supports easier debugging, scraping
  • Get around non-idempotence of POST:
    • when a retry might be needed because success is uncertain, use a PUT (idempotent, so safe to resend) instead of a POST
    • Post-Once-Exactly: to get around non-repeatable POSTs - GET returns a server-side link representing a resource not yet created, then client POSTs to that URL to create new resource; subsequent POSTs to the same resource URL return 405 (Method Not Allowed).
  • "Conditional PUT (POST)": before submitting a large amount of information to a server that might not be able to handle it at the moment - send the request headers without the body, including Content-Length and Expect: 100-continue. If the response is 100 (Continue), client proceeds to send the body; else (e.g. 417 Expectation Failed) it does not.
  • Use POST to support large queries, to get around length limits imposed by servers, clients and proxies. However this results in loss of cacheability if response is sent synchronously; instead, use the async request pattern from above - server returns new resource with 201 response code, client then GETs the answer to the request.
  • Content negotiation - client uses Accept, Accept-Encoding, Accept-Language headers; server uses Vary and/or Location headers in response. This supports "late binding" of the content representation as a function of the request.
  • Evaluate extensibility vs visibility if considering "code on demand" (applets, JavaScript) - while this simplifies clients (extensible at runtime), it reduces visibility (maintenance, understandability, diagnostics, ...)
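The Post-Once-Exactly pattern from the list above can be sketched with an in-memory stand-in for the server. Everything named here (`OrderService`, the `/orders/...` URIs, the 404 for unknown URIs, the `Allow` value) is an assumption for illustration; only the 201-then-405 behavior comes from the pattern as described.

```python
import uuid


class OrderService:
    """In-memory stand-in for a server; class name and URIs are hypothetical."""

    def __init__(self):
        self.pending = set()  # one-time URIs issued but not yet used
        self.used = set()     # one-time URIs already consumed
        self.orders = {}      # created resources, keyed by URI

    def get_submission_uri(self):
        """GET: hand out a fresh one-time URI for a resource not yet created."""
        uri = f"/orders/submit/{uuid.uuid4().hex}"
        self.pending.add(uri)
        return uri

    def post(self, uri, body):
        """POST: create on first use (201), refuse repeats (405)."""
        if uri in self.used:
            # A 405 response carries an Allow header; GET here is an assumption.
            return 405, {"Allow": "GET"}
        if uri not in self.pending:
            return 404, {}
        self.pending.remove(uri)
        self.used.add(uri)
        order_uri = f"/orders/{len(self.orders) + 1}"
        self.orders[order_uri] = body
        return 201, {"Location": order_uri}


service = OrderService()
one_time_uri = service.get_submission_uri()
first = service.post(one_time_uri, {"item": "book"})
retry = service.post(one_time_uri, {"item": "book"})  # e.g. client timed out and resent
print(first[0], retry[0], len(service.orders))  # 201 405 1
```

A client that sees 405 on a resend can conclude the earlier POST went through, so the resend does not create a duplicate.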
References:
[1] Fielding, Roy Thomas. Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine, 2000: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
[2] Stefan Tilkov presentation (video): http://www.parleys.com/#st=5&id=1397
[3] Richardson, L. & Ruby, S. (2007) RESTful Web Services. Sebastopol, CA: O'Reilly Media, Inc.
[4] Allamaraju, S. (2010) RESTful Web Services Cookbook. Sebastopol, CA: O'Reilly Media, Inc.
[5] HATEOAS, the scary acronym: http://css.dzone.com/articles/hateoas-scary-acronym
[6] RESTful Web Services: http://imyousuf-tech.blogs.smartitengineering.com/2011/02/restful-web-services.html
[7] IETF RFC 5988 - Web Linking: http://tools.ietf.org/html/rfc5988
