[issue2092] Entire working copy traversed during pull -u

Jesse Glick bugs at mercurial.selenic.com
Fri Mar 12 17:24:52 UTC 2010


New submission from Jesse Glick <jesse.glick at sun.com>:

On a clone of a very large repository, with a fairly warm disk cache, I run
(Hg 1.5, Python 2.6.4, Ubuntu):

$ hg pull -u
pulling from http://....
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

The pull itself completes fairly quickly, but the update takes a minute or
so, even though the new changeset is just a plain edit to a single file.
Since pull -u is supposed to carry over any local modifications (in this
case I had none), and I am using a straightforward ext3 filesystem with no
case-folding considerations, in principle all Hg really needed to do here
was:

1. Verify that this one file was not modified.

2. Update it to the new version as given in the new manifest.
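To make the two steps concrete, here is a minimal Python sketch of an update that inspects only the paths the incoming changeset touches. It is not Mercurial's actual update code. Manifests are modeled as plain dicts from path to file revision id, `files_touched`, `plan_update` and the `is_modified` callback are hypothetical names, "changed.txt" is a made-up path, and collisions with untracked files are ignored for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a manifest maps each tracked path to a file revision id.
Manifest = Dict[str, str]


def files_touched(old: Manifest, new: Manifest) -> Tuple[List[str], List[str]]:
    """Return (paths whose revision changed or was added, paths removed)."""
    to_write = sorted(p for p, rev in new.items() if old.get(p) != rev)
    to_remove = sorted(p for p in old if p not in new)
    return to_write, to_remove


def plan_update(old: Manifest, new: Manifest,
                is_modified: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Plan a clean update that inspects only files the new changeset touches.

    is_modified is a hypothetical callback that checks a single tracked file
    for local changes; it is never called for unrelated paths.
    """
    to_write, to_remove = files_touched(old, new)
    actions: List[Tuple[str, str]] = []
    for path in to_write:
        # Step 1: a tracked file with local edits must be merged, not clobbered.
        if path in old and is_modified(path):
            actions.append(("merge", path))
        else:
            # Step 2: write the revision recorded in the new manifest.
            actions.append(("get", path))
    for path in to_remove:
        # Keep locally edited files instead of deleting them (simplified).
        actions.append(("keep", path) if is_modified(path) else ("remove", path))
    return actions


# A scenario shaped like the report: one plain edit to one file.
old = {"changed.txt": "r1", "TemplateCompletionTestCase/template.cc": "r7"}
new = {"changed.txt": "r2", "TemplateCompletionTestCase/template.cc": "r7"}
checked: List[str] = []


def is_modified(path: str) -> bool:
    checked.append(path)  # record which paths were inspected
    return False          # no local modifications, as in the report


print(plan_update(old, new, is_modified))  # [('get', 'changed.txt')]
print(checked)                             # ['changed.txt']
```

The unrelated directory from the strace output below is never passed to `is_modified`, so the cost of the update scales with the size of the changeset rather than the size of the working copy.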

Instead, strace shows it walking the entire working copy, which is of
course orders of magnitude slower. For example, the calls referencing an
arbitrarily picked directory containing a single file, completely unrelated
to the new changeset, include (note also the useless double close()):

fstatat64(5, "TemplateCompletionTestCase", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
.....later.....
open("...full path.../TemplateCompletionTestCase", O_RDONLY|O_LARGEFILE) = 5
fstat64(5, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(5, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl64(5, F_SETFD, FD_CLOEXEC)         = 0
getdents64(5, /* 3 entries */, 32768)   = 80
fstatat64(5, "template.cc", {st_mode=S_IFREG|0644, st_size=649, ...}, AT_SYMLINK_NOFOLLOW) = 0
getdents64(5, /* 0 entries */, 32768)   = 0
close(5)                                = 0
close(5)                                = -1 EBADF (Bad file descriptor)

Ideally the Hg test suite would be able to cap the number of system calls
touching a given repository or working-copy path during a given command.
A native instrumentation tool like strace is the strictest way to enforce
this, but satisfactory regression tests might be written using decorators
around the Python functions that trigger system calls, so long as it is
feasible to enumerate all such functions used in the Hg sources. (For
comparison, in Java it is possible to install a SecurityManager that
records every attempted java.io.File access.)
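The decorator approach could look roughly like the Python sketch below, which temporarily wraps a few `os` functions and counts calls per path. It is an illustration under assumptions, not existing Mercurial test machinery: `record_fs_calls`, `assert_max_calls` and the `WATCHED` list are hypothetical, and a real test would have to cover every filesystem entry point Hg actually reaches (including any C extension calls, which Python-level wrapping cannot see).

```python
import collections
import contextlib
import functools
import os
from typing import Callable, Dict, Iterator

# Entry points to instrument. This list is an assumption: a real test would
# have to cover every filesystem function the code under test can reach.
WATCHED = ("stat", "lstat", "listdir", "scandir")


@contextlib.contextmanager
def record_fs_calls() -> Iterator[collections.Counter]:
    """Temporarily wrap selected os functions, counting calls per (name, path)."""
    calls: collections.Counter = collections.Counter()
    originals: Dict[str, Callable] = {name: getattr(os, name) for name in WATCHED}

    def wrap(name: str, func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = args[0] if args else kwargs.get("path", ".")
            # str/bytes/PathLike paths become str; fds and None stay as-is.
            if isinstance(path, (str, bytes, os.PathLike)):
                path = os.fsdecode(path)
            calls[(name, path)] += 1
            return func(*args, **kwargs)
        return wrapper

    for name, func in originals.items():
        setattr(os, name, wrap(name, func))
    try:
        yield calls
    finally:
        # Always restore the real functions, even if the block raised.
        for name, func in originals.items():
            setattr(os, name, func)


def assert_max_calls(calls: collections.Counter, prefix: str, limit: int) -> None:
    """Fail if more than `limit` recorded calls touched paths under `prefix`."""
    touched = sum(n for (_, path), n in calls.items()
                  if isinstance(path, str) and path.startswith(prefix))
    if touched > limit:
        raise AssertionError(
            f"{touched} filesystem calls under {prefix!r}, limit is {limit}")
```

A test would run the command inside `with record_fs_calls() as calls:` and then call `assert_max_calls(calls, some_unrelated_dir, 0)` for a directory the changeset does not touch. Patching module attributes like this only catches callers that look the functions up through `os` at call time, which is one reason strace remains the stricter check.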

----------
messages: 12031
nosy: jglick
priority: bug
status: unread
title: Entire working copy traversed during pull -u
topic: 1.5, performance

____________________________________________________
Mercurial issue tracker <bugs at mercurial.selenic.com>
<http://mercurial.selenic.com/bts/issue2092>
____________________________________________________

