hg slow on large repo

Matt Mackall mpm at selenic.com
Wed May 23 16:53:37 CDT 2007


On Wed, May 23, 2007 at 05:20:32PM -0400, Benjamin LaHaise wrote:
> On Wed, May 23, 2007 at 01:08:09PM -0500, Matt Mackall wrote:
> > Is this a local clone on the same partition? In other words, is it
> > using hardlinks? Or is this over the wire? For going over LAN or fast
> > WAN, you can use --uncompressed.
> 
> It's a local clone on the same partition. Yes, it looks like
> hardlinks are getting used as most of the files under .hg show 2
> links. Part of what seems to be the problem is that there are way
> too many directories and files under .hg -- just doing a du .hg
> takes over a minute cache cold.
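
As an aside, a minimal sketch of the two clone modes discussed above
(host and paths are placeholders):

  # local clone on the same partition -- .hg gets hardlinked, cheap:
  $ hg clone /path/to/repo repo-copy

  # over a fast LAN or WAN, skip bundle compression:
  $ hg clone --uncompressed ssh://user@host//path/to/repo repo-copy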

The file count under .hg is basically 1:1 with the project's files. But
that doesn't seem to be the problem.

> > How much of the time is clone vs checkout (try time hg clone -U
> > followed by hg update)? 
> 
> hg clone -U takes 17s after a cp -al of the .hg.  An immediately following 
> hg update took XXX.

(11 minutes)

Can you please try the untar test? If untar is slow, we know we have
OS or FS issues.

If untar is significantly faster than update, we have a problem.
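
A rough sketch of the comparison, assuming a comparable tarball is at
hand (names are placeholders; drop_caches needs a 2.6.16+ kernel and
root):

  # cold-cache untar:
  $ su -c 'echo 3 > /proc/sys/vm/drop_caches'
  $ time tar xzf project.tar.gz

  # cold-cache checkout of the same tree:
  $ su -c 'echo 3 > /proc/sys/vm/drop_caches'
  $ cd repo-copy && time hg update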

> > For the update side of things, how much time does it take to untar a
> > comparable tar.gz?
> > 
> > If local, how much time does it take to do a straight cp and cp -al of .hg?
> 
> cp -al of the whole thing takes 4m30s.  cp -a of the whole thing is slow 
> (as in more than 15 minutes).  cp -al of just .hg afterwards took 44s.

Ok. Local clone -U and cp -al .hg are roughly equivalent. So we're
looking at something like 17-44s hot-cache and a few minutes
cold-cache.

I'm assuming this is an ext3 filesystem. Do you have atime updates
disabled? What size is your journal?
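
A quick sketch for checking both (device and mount point are
placeholders):

  $ grep ' / ' /proc/mounts                           # is noatime in the options?
  $ su -c 'dumpe2fs -h /dev/sda1' | grep -i journal   # journal details
  $ su -c 'mount -o remount,noatime /'                # worth retesting with atime off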

> 
> > Tricks exist, but let's figure out what the problem is first.
> 
> This reminds me of a quirk of ext3: if you unpack files in a subdirectory, 
> the allocator will attempt to place the files in the same block group as the 
> directory, which it tries to make different than the parent.  If the file is 
> unpacked in the top level of the directory and subsequently moved into the 
> subdirectory, it will be allocated near the original directory and thus more 
> closely packed on disk.

We write all files in place, so we should already be taking advantage
of this behavior.

> git seems to get through this much more quickly with -l as it only
> has to deal with one large .pack file which can be read sequentially.

What's -l? If you have atime enabled, git will win simply because it
has only one atime to update.
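
You can see the per-file cost directly; a small sketch with a
placeholder filename:

  $ stat -c %x somefile        # current atime
  $ cat somefile > /dev/null
  $ stat -c %x somefile        # the read bumped the atime, dirtying the inode

Multiply that dirtied inode by one per project file for hg, versus
roughly one per pack for git.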

> The hg update rarely peaks above 1300KB/s reading from the disk. Does hg
> have a way of packing old history into something that isn't touched
> for typical usage?

No. Nor does git. Unless git is somehow segregating all of the data
needed to check out tip in some localized subset of the pack, it will
have to visit more or less the whole pack on checkout. And if you've
got multiple packs, it will have to visit them all more or less
randomly to find files last touched in various epochs.

-- 
Mathematics is the supreme nostalgia of our time.

