[PATCH 4 of 6] hgweb: introduce staticimmutable web command

Yuya Nishihara yuya at tcha.org
Wed Apr 5 11:33:23 EDT 2017


On Sat, 01 Apr 2017 00:29:08 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1491021501 25200
> #      Fri Mar 31 21:38:21 2017 -0700
> # Node ID 0b8be3d244585f5a2874821418fce41bf7631f6c
> # Parent  4cec8e88d09775ee6478e307e9dde94af5b9fcfd
> hgweb: introduce staticimmutable web command
> 
> Currently, hgweb serves HTTP responses with an Etag header
> whose value is "W/<mtime>" where <mtime> is the modification time
> of the repo being accessed. The "W" means it is a "Weak validator" -
> something that is an approximation of content but not an identifier
> of unique content. Assuming the client is smart and can cache
> responses (like web browsers), a subsequent HTTP request for this
> URL will send an If-None-Match request header with the value from
> the previous Etag response header. hgweb will compare the INM
> header against what it would serve for Etag and if they match, the
> server quickly issues an HTTP 304 Not Modified to avoid serving
> data to the client.
> 
> This is a cache hit and is better than no cache hit. But there is
> overhead from the client sending the HTTP request and waiting for
> the 304 response. Furthermore, the "weak validator" isn't precise
> and can lead to weirdness. For example, https://hg.mozilla.org/
> consists of a pool of servers behind a load balancer. Each server
> has an independent clone of repositories (there is no shared
> filesystem). Therefore, the mtimes on files within repositories
> may be different. This can result in the Etag value differing
> between servers, so an HTTP 200 is served where an HTTP 304 would
> have sufficed. And this translates to longer
> page loads. This isn't a theoretical problem: HTTP requests for
> static assets against hg.mozilla.org result in HTTP 200 instead of
> HTTP 304 most of the time.
> 
> There are more effective ways to perform content caching with HTTP.
> 
> One trick is to use "immutable" URLs. Essentially, the content at
> a URL is constant for the lifetime of the URL. When this URL is
> served, the HTTP response basically says "cache me forever." There's
> no conditional HTTP request on next access: if the client has that
> URL cached, it just uses it no questions asked. This avoids the
> network round trip and can result in drastically faster page loads.
> 
> This commit introduces a new web command for processing requests
> for "immutable" static assets. Where the URL path pattern for a
> "static" web command is simply <path>, the URL path pattern for a
> "staticimmutable" web command is "<hash>/<path>" where <hash> is
> derived from content. This makes <hash> a "strong" validator in the
> HTTP RFC sense and allows us to leverage aggressive caching
> techniques. Strictly speaking, the order in the URL violates
> REST best practices: ideally <hash> would come after <path>.
> However, a lot of tools (including browser devtools) use the
> final path component to label a resource. A hash is not a useful
> label. So, we invert the order so the final URL path component
> is friendly. This makes no difference to caching behavior. But it
> will upset REST zealots in their ivory towers. To them, I say
> practicality wins over purity.
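
For reference, the "<hash>/<path>" scheme described above can be sketched
roughly as follows. This is only an illustration and not the code in the
patch: the function names, the use of SHA-1, and the Cache-Control value
are all assumptions.

```python
import hashlib


def immutable_url(path, content):
    """Build '<hash>/<path>', where <hash> is derived from the content.

    Hypothetical helper: SHA-1 is an assumed choice of digest.
    """
    digest = hashlib.sha1(content).hexdigest()
    return '%s/%s' % (digest, path)


def split_immutable_url(url):
    """Split '<hash>/<path>' back into its (hash, path) components."""
    digest, sep, path = url.partition('/')
    if not sep or not digest or not path:
        raise ValueError('expected <hash>/<path>, got %r' % url)
    return digest, path


def immutable_headers():
    """Response headers meaning "cache me forever".

    Assumed value: a one-year max-age plus the 'immutable' extension.
    """
    return {'Cache-Control': 'public, max-age=31536000, immutable'}
```

Because the hash changes whenever the content changes, a new version of a
file gets a new URL, and the old URL can stay cached indefinitely.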

If I understand your proposal correctly, a common alternative is to append
"?<hash>" to the URL, so that static content can still be offloaded to the
frontend server.

Can't we compute the <hash> from Mercurial version (and maybe some config
values)?
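
To make the question concrete, here is a rough sketch of what I have in
mind. The names are hypothetical, and which config values (if any) should
feed the hash is left open; the version string and config items are simply
passed in by the caller.

```python
import hashlib


def version_token(version, config_items=()):
    """Derive a short cache-busting token from a version string.

    config_items is an assumed iterable of (key, value) string pairs;
    they are sorted so the token does not depend on their order.
    """
    h = hashlib.sha1()
    h.update(version.encode('utf-8'))
    for key, value in sorted(config_items):
        h.update(b'\0' + key.encode('utf-8') + b'=' + value.encode('utf-8'))
    return h.hexdigest()[:12]


def static_url(path, token):
    """Append the token as a query string: '<path>?<hash>'."""
    return '%s?%s' % (path, token)
```

With this, the static path itself is unchanged, so a frontend server can
keep serving the files directly, and the token only changes on upgrade or
when a relevant config value changes.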

