[PATCH] httprepo: long arguments support (issue2126)

Laurens Holst laurens.nospam at grauw.nl
Tue Mar 29 08:12:55 CDT 2011


On 29-03-11 13:47, Steven Brown wrote:
> On 29 March 2011 01:59, Augie Fackler<durin42 at gmail.com>  wrote:
>> On Mon, Mar 28, 2011 at 12:58 PM, Laurens Holst<laurens.nospam at grauw.nl>  wrote:
>>> On 28-3-2011 19:38, Augie Fackler wrote:
>>>> +1, I've suggested this in the past and it sounds reasonable. We
>>>> should also make sure that any response to a request that used headers
>>>> sets appropriate cache-control headers to avoid potential GET caching
>>>> issues.
>>> Actually I think you should use the Vary header for that:
>>>
>>> Vary: X-Hg-Changesets
>>>
>>> Should do the trick.
>> Ah yes. Always forget about that one.
> I don't understand why we want caching. In general, Mercurial could return
> a different response for the same request. For example:
> 1) Client requests heads.
> 2) New head is created on the server.
> 3) Client requests heads again.
>
> What am I missing?

Short answer:

A reason to want caching is to reduce server load and improve response 
time. HTTP caching is flexible enough to deal with the above scenario.

Also, the message above does not propose adding caching. It only says
that a Vary header should be added so that, if the server admin
configures any caching, it works correctly.

Long answer:

HTTP caches don’t just blindly return the last result for the same URL.
First, a cache only returns a cached copy if the server set one of three
caching headers (so HTTP does no unsolicited caching, see section 13.4
of RFC 2616), and if the method and certain request headers match. Which
headers must match is indicated by the Vary header on the cached
response, and iirc some headers are also included by default.
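
To illustrate the Vary matching described above, here is a rough sketch in
Python. It is not taken from any real cache implementation; the function
name cache_key and the example URL are made up for illustration, and a real
cache does considerably more than this. It only shows why two requests that
differ in X-Hg-Changesets should not share a cached response once the
response says "Vary: X-Hg-Changesets".

```python
def cache_key(method, url, request_headers, vary):
    """Build a lookup key from the method, the URL and the request
    headers that the cached response listed in its Vary header.

    Hypothetical helper for illustration only.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip()
               for name, value in request_headers.items()}
    # Vary is a comma-separated list of request header names.
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    varied = tuple((n, headers.get(n)) for n in names)
    return (method.upper(), url, varied)


# Assumed example URL, not a real server.
url = "http://hg.example.org/repo?cmd=example"

with_vary_a = cache_key("GET", url, {"X-Hg-Changesets": "abc"}, "X-Hg-Changesets")
with_vary_b = cache_key("GET", url, {"X-Hg-Changesets": "def"}, "X-Hg-Changesets")
no_vary_a = cache_key("GET", url, {"X-Hg-Changesets": "abc"}, "")
no_vary_b = cache_key("GET", url, {"X-Hg-Changesets": "def"}, "")
```

With the Vary header the two keys differ, so each request gets its own
cache entry. Without it the keys are identical and the second request
could be served the response that belongs to the first.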

Now Mercurial itself does not currently cache responses to its HTTP
requests at all. However, in between Mercurial and the repository there
may be caching proxies. These may be provided by, say, the ISP, but more
typically they are installed on the server side to reduce load. For
highly trafficked Mercurial repository servers this is useful
functionality.

There are three caching mechanisms in HTTP:

1. An expiry time based one, where you say ‘for the next 5 minutes use a 
cached copy’. This one you definitely don’t want to use for the wire 
protocol, but for hgweb browsing or feeds it is useful.

2. A last modified time based one, where it asks the server if it has a 
newer copy before using the cached one. This one is less efficient than 
the first, as it does not prevent the request entirely, but it does 
avoid generating and transferring the payload. The Mercurial wire 
protocol could use the mtime of the .hg directory for this.

3. A mechanism called ‘etags’ which is pretty similar to 2, except that 
it is more general-purpose and more powerful, e.g. it can encode server 
version or configuration information.
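
As a sketch of how mechanism 3 might look, assuming the ETag is derived
from the repository heads plus a server version string: the names
make_etag and not_modified, and the choice of inputs, are my own
assumptions for illustration and not something Mercurial does today.

```python
import hashlib


def make_etag(heads, server_version, config_fingerprint=""):
    """Derive a quoted ETag from the heads and server details.

    Hypothetical helper: the inputs are assumptions for illustration.
    """
    digest = hashlib.sha1()
    # Sort the heads so the tag does not depend on their order.
    for part in [server_version, config_fingerprint] + sorted(heads):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent parts cannot run together
    return '"%s"' % digest.hexdigest()


def not_modified(if_none_match, current_etag):
    """Return True if the client's If-None-Match value matches the
    current ETag, i.e. the server may answer 304 without a body."""
    if not if_none_match:
        return False
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate == "*":
            return True
        # Weak comparison: ignore a leading W/ marker.
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        if candidate == current_etag:
            return True
    return False
```

Because the server version is part of the hash, upgrading the server
changes the tag and invalidates cached copies, which a plain modification
time cannot express.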

Using the 2nd caching mechanism, Mercurial server load could be greatly
reduced, because for many requests all it has to do is check a
timestamp instead of going into the repository storage and comparing
heads etc.
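
A minimal sketch of that timestamp check, in Python. The function name
conditional_response is hypothetical, and using a single modification
time for the whole repository is the assumption from the paragraph above,
not existing Mercurial behaviour.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(repo_mtime, if_modified_since=None):
    """Return (status, headers) for a request, given the repository's
    modification time in seconds since the epoch.

    Hypothetical helper for illustration only.
    """
    mtime = int(repo_mtime)  # HTTP dates have one-second resolution
    headers = {
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Vary": "X-Hg-Changesets",
    }
    if if_modified_since:
        try:
            parsed = parsedate_to_datetime(if_modified_since)
            if parsed.tzinfo is None:
                # Treat a date without a zone as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            since = parsed.timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the condition
        if since is not None and mtime <= since:
            # Nothing changed: no payload needs to be generated or sent.
            return 304, headers
    return 200, headers
```

A server could call this with something like os.path.getmtime() on the
.hg directory and only do the real work when the status is 200.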

If a server admin wants to configure this kind of caching and Mercurial
does not set a Vary header correctly, it will fail. Of course the admin
could also add that header manually, and there may be other reasons that
make it fail, but I think it’s good practice to send this Vary header to
make it easier to add caching.

~Laurens


