I’m trying to figure out the best HTTP headers to send for four use cases. I’m hoping to come up with headers that do not depend on user agent / protocol version sniffing, but I’ll accept that if nothing else fits. All URLs are fetched through a fully custom handler, so I can set all headers as I like; this is all about intermediate proxies and user agents. If possible, this should be compatible with both HTTP/1.0 and HTTP/1.1 clients. If multiple solutions exist, the best one will be the shortest one when sent over the wire.
Static public content
All “Static public content” is stuff that HTTP is really all about: if the URL is the same, the content is the same. I can do this easily: for example, I put a user profile icon at http://domain.com/profiles/xyz/icon/1234abcd where “1234abcd” is the SHA-1 of the file contents of the icon. If I change the icon in the future, I’ll create a new URL and modify all existing referrers that should use the new icon. What are the best headers to declare that this may be cached forever and may be shared? I’m currently thinking something along the lines of:
Date: <current time>
Expires: <current time + one year>
Is this enough to allow caching by user agents and proxies? Do I need Last-Modified or Pragma?
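The content-addressed URL scheme described above can be sketched in Python as follows. This is only an illustration: the function name icon_url and the BASE_URL constant are hypothetical, and the URL layout is copied from the example in the text.

```python
import hashlib

# Hypothetical base URL, taken from the example URL in the text.
BASE_URL = "http://domain.com/profiles"


def icon_url(profile: str, icon_bytes: bytes) -> str:
    """Return a URL whose last path segment is the SHA-1 of the icon contents.

    A changed icon yields a different digest and therefore a different URL,
    so a response for any given URL never needs to change.
    """
    digest = hashlib.sha1(icon_bytes).hexdigest()
    return f"{BASE_URL}/{profile}/icon/{digest}"
```

Because the digest is derived only from the bytes, uploading the same icon twice maps to the same URL, and every page that refers to the icon has to be updated when the icon changes.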
Static non-public content
All “Static non-public content” is stuff that is static but may not be available to everybody. In fact, this content will be available only to selected logged-in users (the session is kept with a session cookie holding the session UUID). If the URL is the same, the content is the same. However, the response is not public. A use case could be an image shared with selected friends in a social network service. I’m currently thinking something along the lines of:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=<huge number>, s-maxage=0
Is this enough to allow caching by user agents and disable caching by proxies? Do I need Pragma?
Volatile public content
All “Volatile public content” is stuff that is volatile and available to everybody. Something like frontpage of http://slashdot.org/ when not logged in. The intent is to allow rapidly updating content in a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server — however, clicking a link that goes to front page should fetch the resource from the server). I’m currently thinking something along the lines:
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? I know that if I send Cache-Control: no-store, must-revalidate I can force reloading but this is not what I want because that will break the back button, too. Do I need Last-Modified or Pragma?
Even though this is public, it probably does not make sense to allow intermediate proxies to cache this because it’s volatile.
Volatile non-public content
All “Volatile non-public content” is stuff that is volatile and not available to everybody (private). Something like frontpage of http://slashdot.org/ when you are logged in. The intent is to allow rapidly updating content in a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server — however, clicking a link that goes to front page should fetch the resource from the server). I’m currently thinking something along the lines:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? Do I need Pragma?
Things that still need testing with my suggested headers:
- Verify that private content will not be leaked through HTTP/1.0 proxies.
- Verify that caching works correctly in proxies.
- Verify that caching works correctly in user agents.
- Verify that user agent history mechanism works in user agents (all cases).
- Verify that following a link to a volatile page fetches fresh content from the server.
- Verify all the results when using HTTPS instead of HTTP.
I’ll answer my own question:
Static public content
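A minimal sketch in Python of emitting the Date and Expires pair that the rationale below relies on (the same pair proposed in the question). The function name static_public_headers and its optional now parameter are assumptions made for illustration, not part of any framework:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def static_public_headers(now: Optional[datetime] = None) -> Dict[str, str]:
    """Headers for content that never changes at a given URL and may be shared."""
    # format_datetime(..., usegmt=True) requires an aware UTC datetime.
    now = now or datetime.now(timezone.utc)
    return {
        "Date": format_datetime(now, usegmt=True),
        # One year ahead, the limit the answer applies to Expires.
        "Expires": format_datetime(now + timedelta(days=365), usegmt=True),
    }
```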
Rationale: This is compatible with HTTP/1.0 proxies and with RFC 2616 Section 14.21: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
The Last-Modified header is not needed for correct caching (because conforming user agents follow the Expires header) but may be included for end-user consumption. Including the Last-Modified header may also decrease server data transfer when the user hits the Reload/Refresh button. If a Last-Modified header is added, it should reflect real data instead of something made up. If you want to decrease server data transfer (when the user hits the Reload/Refresh button) and cannot include a real Last-Modified header, you may add an ETag header to allow conditional GET (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26). If you already include Last-Modified, also adding ETag is just waste. Note that Last-Modified is clearly superior because it’s supported by HTTP/1.0 clients and proxies, too. A suitable value for ETag in the case of dynamic pages is the SHA-1 of the contents of the page/resource. Note that using Last-Modified or ETag will not help with server load, only with the server’s outgoing internet pipe / data transfer rate.
Static non-public content
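A minimal sketch in Python of the headers discussed in the rationale below: Date and Expires set to the same instant, a private Cache-Control with a one-year max-age and s-maxage=0, and Vary: Cookie. The function name static_private_headers and the vary_cookie switch are hypothetical; the switch exists only because the answer discusses dropping Vary: Cookie as a trade-off.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional

ONE_YEAR = 60 * 60 * 24 * 365  # seconds, the max-age value used in the rationale


def static_private_headers(now: Optional[datetime] = None,
                           vary_cookie: bool = True) -> Dict[str, str]:
    """Headers for static content that only the requesting user may cache."""
    now = now or datetime.now(timezone.utc)
    stamp = format_datetime(now, usegmt=True)
    headers = {
        "Date": stamp,
        # Same instant as Date: HTTP/1.0 caches see an already expired response.
        "Expires": stamp,
        "Cache-Control": f"private, max-age={ONE_YEAR}, s-maxage=0",
    }
    if vary_cookie:
        # Technically required because access is decided from the session cookie.
        headers["Vary"] = "Cookie"
    return headers
```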
Rationale: The Date and Expires headers are for HTTP/1.0 compatibility, and because there’s no sensible way to specify in HTTP/1.0 that the response is private, these headers communicate that the response may not be cached. The Cache-Control header tells that this response may be cached by a private cache but a shared cache may not cache the response. The s-maxage=0 is added because private may not be supported by all proxies that support Cache-Control (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3; I have no idea which proxies are broken). The max-age is set to the value of 60*60*24*365 (1 year) because the HTTP/1.1 specification does not define any upper limit for this parameter; I guess that this is implementation dependent. The Expires header SHOULD be limited to one year in the future, so using the same logic here should be okay. The Vary: Cookie header is required because the session that is used to check if the visitor is allowed to see the content is transferred in a cookie; because the returned response depends on the cookie value, the cache may not use a cached response if the cookie header has changed.
I might personally break the last part. By not including the Vary: Cookie header I can improve caching a lot. For example: I have a profile image at http://example.com/icon/12 which is returned only for selected authenticated users. I have a visitor X with session id 5f2 and I allow the image to that user. Visitor X logs out and then later logs in again. Now X has session id 2e8 stored in his session cookie. If I have Vary: Cookie, the user agent of X cannot use the cached image and is forced to reload it into its cache. Because the content varies by Cookie, a conditional GET with last modification time cannot be used. I haven’t tested whether using ETag could help in this case, because then the server response would be the same (it would match the SHA-1 ETag computed from the contents of the response). Be warned that Internet Explorer (at least up to version 9) always forces conditional GET for resources that include Vary: Cookie even if a suitable response were already in cache (source: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx). This is because the internal cache implementation of MSIE does not remember which Cookie it sent the first time, so it cannot know if the current Cookie is the same one.
However, here’s an example of a problem that is caused by dropping the Vary: Cookie header, to show why this is indeed required for technically correct behavior: see the example above and imagine that after X has logged out, visitor Y logs in with the same user agent (the user agent may have been restarted between X and Y, it does not matter). If Y views a page that includes a link to http://example.com/icon/12 then Y will see the icon embedded inside the page even though Y wouldn’t be able to see the icon if X had not been using the same user agent previously. In my case I don’t consider this a big enough problem because Y would be able to access the icon manually by inspecting the user agent cache regardless of a possibly added Vary: Cookie. However, this issue may prevent Y from noticing that he wouldn’t technically have access to this content (this may be important e.g. if Y is co-authoring the content). If the content is considered sensitive, the server must send no-store regardless of the problems caused by this Cache-Control directive.
Here too, adding a Last-Modified header will help with users hitting the Reload/Refresh button (see discussion above).
Volatile public content
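A minimal sketch in Python of the headers the rationale below describes: an immediately stale but public response that carries Last-Modified for conditional GET. The function name volatile_public_headers and the ttl_seconds parameter are hypothetical; ttl_seconds models the small delay (for example 5 seconds) mentioned in the rationale and defaults to 0.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def volatile_public_headers(last_modified: datetime,
                            ttl_seconds: int = 0,
                            now: Optional[datetime] = None) -> Dict[str, str]:
    """Headers for rapidly changing content that anybody may see."""
    now = now or datetime.now(timezone.utc)
    return {
        "Date": format_datetime(now, usegmt=True),
        # With ttl_seconds == 0 this equals Date, so the response is stale at once.
        "Expires": format_datetime(now + timedelta(seconds=ttl_seconds), usegmt=True),
        "Cache-Control": f"public, max-age={ttl_seconds}, s-maxage={ttl_seconds}",
        # Lets clients, including HTTP/1.0 ones, revalidate with a conditional GET.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
```

Keeping Expires, max-age and s-maxage derived from one value avoids the three drifting apart when the freshness window is tuned.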
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It’s critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients.
If the content may be delayed even slightly, then Expires, max-age and s-maxage [sic] should be adjusted suitably. For example, adding 5 seconds to those might help a lot for a highly popular site, as suggested by symcbean’s answer. Note that unlike conditional GET, increasing the expiry time will decrease server load instead of just decreasing server outgoing data traffic (because the server will see fewer requests in total).
Volatile non-public content
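A minimal sketch in Python of the headers the rationale below describes for private, rapidly changing content. The function name volatile_private_headers and the sensitive flag are hypothetical. Sending a bare no-store for sensitive responses is one possible reading of the rule in the rationale, and, as the rationale warns, it breaks the back button.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def volatile_private_headers(last_modified: datetime,
                             sensitive: bool = False,
                             now: Optional[datetime] = None) -> Dict[str, str]:
    """Headers for rapidly changing content meant for one logged-in user."""
    now = now or datetime.now(timezone.utc)
    stamp = format_datetime(now, usegmt=True)
    if sensitive:
        # Assumption: sensitive material gets no-store even though history breaks.
        cache_control = "no-store"
    else:
        # Deliberately no no-cache / must-revalidate, to keep the back button working.
        cache_control = "private, max-age=0, s-maxage=0"
    return {
        "Date": stamp,
        "Expires": stamp,  # stale immediately for HTTP/1.0 clients and proxies
        "Cache-Control": cache_control,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
```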
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It’s critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients. Also note that Cache-Control must not include no-cache, must-revalidate or no-store because using any of these directives will break the back button in at least one user agent. However, if the content the server is transferring contains sensitive material that should not be stored in permanent storage, the no-store flag MUST be used regardless of breaking the back button. Warning: note that the use of no-store cannot prevent sensitive material from ending up on the hard disk without encryption if the operating system has swapping enabled and the swap is not encrypted! Also note that using no-store makes very little sense unless the connection is encrypted (HTTPS/SSL).
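The conditional GET handling mentioned throughout can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete implementation: the function name respond and its (status, headers, body) return shape are hypothetical, request headers are assumed to arrive as a plain dict with canonical names, and only strong ETags are handled. It uses Last-Modified when a real modification time is available and otherwise falls back to an ETag that is the SHA-1 of the response body.

```python
import hashlib
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(body: bytes,
            request_headers: Dict[str, str],
            last_modified: Optional[datetime] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, validator headers, body); 304 responses carry no body."""
    if last_modified is not None:
        # HTTP dates have one-second resolution; last_modified must be aware UTC.
        last_modified = last_modified.replace(microsecond=0)
        headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
        since = request_headers.get("If-Modified-Since")
        if since is not None:
            try:
                if last_modified <= parsedate_to_datetime(since):
                    return 304, headers, b""
            except (TypeError, ValueError):
                pass  # unparseable date: fall through to a full response
        return 200, headers, body

    # No real modification time: use the SHA-1 of the contents as the ETag.
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = {"ETag": etag}
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""
    return 200, headers, body
```

Note that the body still has to be produced to compute the SHA-1, which matches the point above that validators save outgoing data transfer rather than server load.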