HTTP Caching is bretarded: Or, how I learn to stop worrying and accept that 'no-cache' actually does cache. by Acidus at 12:47 am EDT, Aug 24, 2008
HTTP Caching is now added to my growing list of things that are Bretarded. Behold RFC 2616:

no-cache: "If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests."
In other words, HTTP responses with a no-cache directive will actually be cached by downstream web caches. However, when subsequent requests for that resource come into the cache, the cache must send a conditional GET to the origin web server to check whether the response it has cached is OK to serve. So no-cache actually means cache, but revalidate.

... ok... so what about the must-revalidate directive?

must-revalidate: "Because a cache MAY be configured to ignore a server's specified expiration time, and because a client request MAY include a max-stale directive (which has a similar effect), the protocol also includes a mechanism for the origin server to require revalidation of a cache entry on any subsequent use. When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. (I.e., the cache MUST do an end-to-end revalidation every time, if, based solely on the origin server's Expires or max-age value, the cached response is stale.)"
Great, so must-revalidate actually means the cache must send a conditional GET to the origin server to revalidate the cached response, but only if that response is stale. If the cache still thinks the response is "fresh", it can serve the cached response regardless of the must-revalidate directive. Welcome to the fucked up world of HTTP caching!

Of course, all this craziness is based on the premise that User-Agents can tell caches to give them stale resources. That was probably a fairly bad idea in the mid-90s "the web is a series of static documents connected by hyperlinks" view of the world, and it is an utterly horrible idea in the Web 2.0 view of the world.

There are absolutely no good comprehensive resources that explain HTTP caching directives, cache hierarchies, resolving HTTP/1.0 and HTTP/1.1 directives, etc. Where is a 96 page $39.99 O'Reilly book when you need one?
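To make the no-cache versus must-revalidate distinction concrete, here is a minimal Python sketch of the decision a cache faces when a request hits a stored response. It illustrates the RFC 2616 wording quoted above and is not how Squid or any real cache is implemented. The names `parse_cache_control`, `cache_action`, and the `allow_stale` flag (standing in for "this cache is configured to serve stale responses, or the client sent max-stale") are made up for the example, and freshness is judged from max-age alone, ignoring Expires, s-maxage, and heuristic freshness.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control value into {directive: value-or-None}.

    Simplification: splits naively on commas, so a quoted field-name list
    that itself contains commas is not handled.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def cache_action(header: str, age: int, allow_stale: bool = False) -> str:
    """Return 'not-stored', 'revalidate', or 'serve' for a stored response.

    `age` is the response's current age in seconds (hypothetical input).
    """
    cc = parse_cache_control(header)
    if "no-store" in cc:
        return "not-stored"      # the only directive that means "do not keep it"
    if "no-cache" in cc and cc["no-cache"] is None:
        return "revalidate"      # stored, but never served without a conditional GET
    max_age = int(cc.get("max-age") or 0)
    if age < max_age:
        return "serve"           # still fresh, so must-revalidate does not apply yet
    if "must-revalidate" in cc:
        return "revalidate"      # stale: serving it unvalidated is forbidden
    return "serve" if allow_stale else "revalidate"
```

With these assumptions, `cache_action("max-age=60, must-revalidate", age=10)` is a plain cache hit, while `cache_action("no-cache", age=0)` always goes back to the origin server, which is exactly the asymmetry the rant is about.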

RE: HTTP Caching is bretarded: Or, how I learn to stop worrying and accept that 'no-cache' actually does cache. by dre at 5:18 am EDT, Aug 24, 2008
Acidus wrote: There are absolutely no good comprehensive resources that explain HTTP caching directives, cache hierarchies, resolving HTTP/1.0 and HTTP/1.1 directives, etc. Where is a 96 page $39.99 O'Reilly book when you need one?
You mean besides RFC 2616, which is free and the "official" source, right? I could literally show you at least 4 O'Reilly books and 2 Addison-Wesley books (ones better than yours) that explain this, btw. Did you read about `no-store'? I find this hard to believe.

RE: HTTP Caching is bretarded: Or, how I learn to stop worrying and accept that 'no-cache' actually does cache. by Acidus at 10:23 am EDT, Aug 24, 2008
dre wrote: Acidus wrote: There are absolutely no good comprehensive resources that explain HTTP caching directives, cache hierarchies, resolving HTTP/1.0 and HTTP/1.1 directives, etc. Where is a 96 page $39.99 O'Reilly book when you need one?
You mean besides RFC 2616, which is free and the "official" source, right? I could literally show you at least 4 O'Reilly books and 2 Addison-Wesley books (ones better than yours) that explain this, btw. Did you read about `no-store'? I find this hard to believe.
The RFCs are nice but dense. O'Reilly's "definitive" Guide to HTTP is woefully lacking solid information about caching. I had to dig into Squid to learn more about only-if-cached and sibling/child caches. There are many books out there that talk about HTTP caching from a performance point of view, but after Expires, Last-Modified, and ETags they lose detail quickly. Shiflett's book looks interesting, and the parts I've seen on Google Books are no-nonsense and clear. Is that on your list?

Yes, I read about no-store. However, some things are still unclear. Do modern caching proxies cache URLs with query strings? By default how excessive can they be? What about cookie assignments? Can I use Set-Cookie as a value in Cache-Control to force their caching? Which caches perform transforms and thus pay attention to no-transform? Will they modify the Content-MD5 header?

My rant stems from the observation that all the information about what gets cached, under what conditions, for how long, and what trumps certain conditions is fragmented all over the RFCs. And what appear to be contradictions are never addressed.

RE: HTTP Caching is bretarded: Or, how I learn to stop worrying and accept that 'no-cache' actually does cache. by dre at 9:27 pm EDT, Aug 24, 2008
Acidus wrote: The RFCs are nice but dense. O'Reilly's "definitive" Guide to HTTP is woefully lacking solid information about caching. I had to dig into Squid to learn more about only-if-cached and sibling/child caches. There are many books out there that talk about HTTP caching from a performance point of view, but after Expires, Last-Modified, and ETags they lose detail quickly. Shiflett's book looks interesting, and the parts I've seen on Google Books are no-nonsense and clear. Is that on your list?
A search on SafariBooksOnline for `must-revalidate' lists 69 books, 20 of which are O'Reilly books, 1 of which is Shiflett's book, and only 1 of which is an Addison-Wesley book worth mentioning (Refactoring HTML, and I was kidding about it being better than yours). This includes section 7.9.4 in HTTP: The Definitive Guide. There is a blurb somewhere that covers caching for security purposes, although I can't remember where it is now; I guess it could have been in the Akamai documentation or similar.

Acidus wrote: However, some things are still unclear. Do modern caching proxies cache URLs with query strings? By default how excessive can they be? What about cookie assignments? Can I use Set-Cookie as a value in Cache-Control to force their caching? Which caches perform transforms and thus pay attention to no-transform? Will they modify the Content-MD5 header?
You were definitely in the right place with the Squid source code in order to answer these questions. I know that reverse proxies will sometimes append query strings (but never cookies), which often requires mod_rewrite or similar on the server end to interpret properly. Of course, a cache server in either direction can cache query strings and cookies; anything unencrypted in the header is potentially unsafe.

The only cache server that performs transforms by default that I know about is RabbIT (http://www.khelekore.org/rabbit/), although I guess the local proxy webcleaner.sf.net probably does transforms, too. I don't think proxies should or will ever modify the Content-MD5 header, see: http://www.askapache.com/htaccess/speed-up-sites-with-htaccess-caching.html

I ran across UsaProxy about a year ago - http://fnuked.de/usaproxy/ - probably from ckers.org, and I saw it mentioned in the newish O'Reilly book, Website Optimization. There may be other ways to do whatever you are trying to do using methods such as those found in a usability proxy.

RE: HTTP Caching is bretarded: Or, how I learn to stop worrying and accept that 'no-cache' actually does cache. by Rattle at 12:22 am EDT, Aug 26, 2008
Acidus wrote: The RFCs are nice but dense. O'Reilly's "definitive" Guide to HTTP is woefully lacking solid information about caching. I had to dig into Squid to learn more about only-if-cached and sibling/child caches. There are many books out there that talk about HTTP caching from a performance point of view, but after Expires, Last-Modified, and ETags they lose detail quickly. Shiflett's book looks interesting, and the parts I've seen on Google Books are no-nonsense and clear. Is that on your list? Yes, I read about no-store. However, some things are still unclear. Do modern caching proxies cache URLs with query strings? By default how excessive can they be? What about cookie assignments? Can I use Set-Cookie as a value in Cache-Control to force their caching? Which caches perform transforms and thus pay attention to no-transform? Will they modify the Content-MD5 header? My rant stems from the observation that all the information about what gets cached, under what conditions, for how long, and what trumps certain conditions is fragmented all over the RFCs. And what appear to be contradictions are never addressed.
Squid is so last tech bubble. It's all about Varnish now. Varnish lets you make caching decisions based on specifics of cookies. For instance, if a cookie is present for a user session, you can have the proxy fetch the page directly from the application/web server. If that cookie is not present, you can still force the proxy to cache the page regardless of what cache control headers the origin server passes it, reducing the load of generating the same page over and over.

You can tell Varnish to do just about any type of caching behavior your twisted fucked up mind can come up with. It's like the difference between being able to shoot yourself in the foot, and crush your balls with a ball peen hammer. RFCs and documentation will lend you no comfort as you walk through this valley of death and despair. The reality within this realm is determined only by the depths of the insanity of web application developers and operations ninjas desperately trying to keep servers from catching fire....
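The cookie-based policy described above can be sketched as follows. This is plain Python rather than VCL, and it only models the decision, not Varnish itself. The cookie name `sessionid`, the 300-second TTL, and the `cache_policy` function are all hypothetical values picked for illustration.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "sessionid"  # hypothetical session cookie name
FORCED_TTL = 300              # hypothetical TTL (seconds) for anonymous traffic


def cache_policy(cookie_header: str) -> dict:
    """Decide how a proxy could treat a request, based only on its Cookie header."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if SESSION_COOKIE in cookies:
        # Logged-in user: bypass the cache and fetch from the application server.
        return {"action": "pass", "ttl": 0}
    # No session cookie: cache the page for FORCED_TTL seconds, ignoring
    # whatever Cache-Control the origin server sent.
    return {"action": "cache", "ttl": FORCED_TTL}
```

Under these assumptions, `cache_policy("sessionid=abc123; theme=dark")` bypasses the cache, while `cache_policy("theme=dark")` gets the forced five-minute TTL no matter what the origin says.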