I’m designing a Web service. The request is idempotent, so I chose the GET method. The response is relatively expensive to calculate and not small, so I want to get caching (on the protocol level) right. (Don’t worry about memoisation at my part, I have that already covered; my question here is actually also paying attention to the Web as a whole.)
There’s only one mandatory parameter and a number of optional parameter with default values if missing. For example, the following two map to the same representation of the response. (If this is a dumb way to go about it the interface, propose something better.)
GET /service?mandatory_parameter=some_data HTTP/1.1
GET /service?mandatory_parameter=some_data;optional_parameter=default1;another_optional_parameter=default2;yet_another_optional_parameter=default3 HTTP/1.1
However, I imagine clients do not know this and would treat them separate and therefore waste cache storage. What should I do to avoid violating the golden rule of caching?
- Make up a canonical form, document it (e.g. all parameters are required after all and need to be sorted in a specific order) and return a client error unless the required form is met?
- Instead of an error, redirect permanently to the canonical form of a request?
- Or is it enough to not mind how the request looks like, and just respond with the same
ETagfor same responses?
First, don’t use semicolons as a delimiter in a query string. You should be using
?to begin a query string and&to delimit variable/value pairs. RFC 3986 doesn’t explicitly say you have to use&, but the vast majority of existing code uses this delimiter because of theapplication/x-www-form-urlencodedprecedent.Second, you’re right, in that parameters in a query string result in a different URI, and thus, as far as caches are concerned, a different resource. Assuming you want optimal caching performance, if you know that an optional parameter has been specified, and its inclusion is unnecessary and does not affect the representation that will be transmitted, you should be making a redirect to a canonical representation that omits the parameter. (i.e., An optional parameter is given with a value that is set to the default value. For example, if you have
http://example.com:80/, you can normalize tohttp://example.com/because80is the default value for the port with HTTP. You can do the same for query parameters since you control the URI space.) If you have parameters included (optional or otherwise) that appear in an order other than the canonical order, you should redirect for that too. A 301 redirect would be preferred if you know that the relationship between URIs will be stable. Otherwise, do a 302/307 redirect as appropriate. I would recommend defining your canonical form the same way that OAuth does: Sort each parameter alphabetically, first by key, then by value. Other normalization operations will also help out here. RFC 3986 has an entire section on URI normalization that will be relevant to you. This technique will really only work for GET, and redirects on PUT/POST/DELETE are not generally recommended.Third, ETags are great, and they provide a huge performance improvement if implemented well by both the client and server. However, it’s unfortunately rare for both sides to do it right. Ditto for Last-Modified. You should pursue these, because the CPU and bandwidth savings are significant when it works, but they are not sufficient on their own. Other headers like Cache-Control are also frequently necessary. It’s worth familiarizing yourself with Section 13 of RFC 2616 if you’re planning on going into great detail on this stuff.
Finally, a word of warning — there is an issue with these redirects you need to be aware of: Clients trying to access your resources may frequently be redirected to other locations. This introduces overhead that only gives you an overall savings if the clients make subsequent requests against the same resource, maintaining state to avoid the subsequent redirect. Unless you’ve open-sourced a reference client implementation that takes advantage of your caching optimizations, you may never benefit from these tweaks.