hg slow on large repo

Tim Post tim.post at netkinetics.net
Sat May 26 18:01:36 CDT 2007


On Sat, 2007-05-26 at 16:20 +0200, solo turn wrote:
> what i'm also wondering why the bsd mirror is so slow - this is the
> biggest mercurial rep i know of. click on a change set and you wait
> forever: http://hg.fr.freebsd.org/, as well as on
> http://archives.keltia.net/hg/freebsd-src-head/
> 
> http://hg.fr.freebsd.org/src-head/?cs=d63097f96fb9, a very small
> change-set, takes >15sec to be retrieved. the server itself does not
> make an overloaded impression when clicking other links.
> 
> -solo

Remember that Linux doesn't use memory like other operating systems.
Apache, SQL servers, etc. malloc() much, much, MUCH more than they
actually need and love to eat up contiguous blocks of memory.

Since hg works off of diffs, which also need contiguous RAM, the two
can and often will compete with each other.

I run http://dev1.netkinetics.net with 30+ hg repos, some of them
huge. I see this *exact* behavior when the system starts paging out
dirty pages. It's not hg; it's just too many things competing for
contiguous blocks of RAM. When ext3cow.com got Slashdotted last week,
so did I. It was an interesting test.

I found that leaving some elbow room, by limiting the number of idle
children of other services, fixed this quickly.

It's a cost of roughly 2 * ((sizeof(current_rev) + sizeof(last_rev)) /
SYS_PAGESIZE) pages per revision for every page you view with hgweb,
plus Apache's own share. This is a 'long-term cost' (not one that stays
in the dentry cache).
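Plugging numbers into that estimate makes it easier to reason about. The sketch below is hypothetical: `pages_per_view` and the 64 KiB revision sizes are illustrative assumptions, not measurements from either server, and rounding up to whole pages is an addition of ours, not part of the original formula.

```python
import math
import mmap


def pages_per_view(current_rev_bytes: int, last_rev_bytes: int,
                   page_size: int = 4096) -> int:
    """Back-of-envelope page count for one hgweb diff view (hypothetical helper).

    Follows the estimate in the message: twice the size of the current plus
    the previous revision, divided by the system page size. Rounding up to
    whole pages is an assumption added here.
    """
    return math.ceil(2 * (current_rev_bytes + last_rev_bytes) / page_size)


if __name__ == "__main__":
    # Hypothetical 64 KiB file on both sides of the diff, 4 KiB pages.
    print(pages_per_view(64 * 1024, 64 * 1024))
    # Same estimate using this machine's actual page size.
    print(pages_per_view(64 * 1024, 64 * 1024, mmap.PAGESIZE))
```

This only counts hg's share per view; as noted above, Apache's own footprint comes on top of it.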

In your example, the server did not show signs of being overloaded
because the other services stay resident so they can serve requests
quickly. hg is not a service; it forks on every page load.
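To get a feel for what "forks on every page load" costs, the sketch below compares calling a handler inside an already-running process with launching a fresh interpreter per request. It does not invoke Mercurial at all: the child is a trivial Python one-liner, and `render_in_process` / `render_by_forking` are hypothetical stand-ins, so the timings show only fixed process-startup overhead, not hg's real per-request work.

```python
import subprocess
import sys
import time


def render_in_process() -> str:
    # Stand-in for a request handled by a service that is already running.
    return "ok"


def render_by_forking() -> str:
    # Stand-in for a CGI-style handler: start a brand-new interpreter for
    # every request, paying the process startup cost each time.
    result = subprocess.run([sys.executable, "-c", "print('ok')"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def time_per_call(fn, n: int) -> float:
    # Average wall-clock seconds per call over n calls.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print(f"in-process: {time_per_call(render_in_process, 5) * 1e6:.1f} us")
    print(f"per-request process: {time_per_call(render_by_forking, 5) * 1e3:.1f} ms")
```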

What the BSD folks *probably* did was forget to disallow their repos in
robots.txt. Every time Google hits that repo and starts requesting 2
diffs per second, it's going to get slow :) That appears to be the case,
at least here: http://hg.fr.freebsd.org/robots.txt
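A robots.txt along these lines would keep well-behaved crawlers off the changeset and diff pages while leaving the top-level index crawlable. The sketch checks a hypothetical file with Python's standard `urllib.robotparser`; the `/src-head/` path comes from the URL quoted in this thread, and everything else (the helper name, the choice of "Googlebot") is an assumption to adapt to your own hgweb layout. Keep in mind that robots.txt is advisory and only stops crawlers that honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an hgweb host: block all crawlers from the
# repository pages under /src-head/, leave everything else alone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /src-head/
"""


def crawler_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://hg.fr.freebsd.org/",
                "http://hg.fr.freebsd.org/src-head/?cs=d63097f96fb9"):
        print(url, "->", "allowed" if crawler_allowed(url) else "blocked")
```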

Best,
--Tim

> 
> 
> On 5/26/07, Benjamin LaHaise <bcrl at kvack.org> wrote:
> > On Sat, May 26, 2007 at 02:51:21AM +0200, solo turn wrote:
> > > this mail seems to suggest git works with hardlinks:
> > > http://marc.info/?l=git&m=116370498919078&w=2
> >
> > Only for metadata.
> >
> >                 -ben
> > --
> > "Time is of no importance, Mr. President, only life is important."
> > Don't Email: <zyntrop at kvack.org>.
> >
> _______________________________________________
> Mercurial mailing list
> Mercurial at selenic.com
> http://selenic.com/mailman/listinfo/mercurial


