MQ performance on large repo

Adrian Buehlmann adrian at cadifra.com
Mon Mar 1 17:31:46 CST 2010


On 01.03.2010 23:52, Greg Ward wrote:
> On Mon, Mar 1, 2010 at 4:35 PM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>> These line numbers are rather weird.
> 
> That's because I had to hack store.py to get the stack trace.

ok.

>> What version of Mercurial is that? Can you please rerun with crew tip?
> 
> Sure.  I've updated to b1339234080e and modified store.py like this:
> 
> diff --git a/mercurial/store.py b/mercurial/store.py
> --- a/mercurial/store.py
> +++ b/mercurial/store.py
> @@ -244,6 +244,10 @@
> 
>      def _load(self):
>          '''fill the entries from the fncache file'''
> +        import traceback
> +        print "fncache._load() called from:"
> +        traceback.print_stack()
> +
>          self.entries = set()
>          try:
>              fp = self.opener('fncache', mode='rb')
> 
> Now I qrefresh:
> 
> $ hgc --time --prof qref
> fncache._load() called from:
>   File "/home/gward/bin/hgc", line 27, in <module>
>     mercurial.dispatch.run()
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 16, in run
>     sys.exit(dispatch(sys.argv[1:]))
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 30, in dispatch
>     return _runcatch(u, args)
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 47, in _runcatch
>     return _dispatch(ui, args)
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 467, in _dispatch
>     return runcommand(lui, repo, cmd, fullargs, ui, options, d)
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 337, in runcommand
>     ret = _runcommand(ui, options, cmd, d)
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 501, in _runcommand
>     return checkargs()
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 472, in checkargs
>     return cmdfunc()
>   File "/home/gward/src/hg-crew/mercurial/dispatch.py", line 466, in <lambda>
>     d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
>   File "/home/gward/src/hg-crew/mercurial/util.py", line 401, in check
>     return func(*args, **kwargs)
>   File "/home/gward/src/hg-crew/hgext/mq.py", line 2053, in refresh
>     ret = q.refresh(repo, pats, msg=message, **opts)
>   File "/home/gward/src/hg-crew/hgext/mq.py", line 1384, in refresh
>     backup='strip')
>   File "/home/gward/src/hg-crew/hgext/mq.py", line 922, in strip
>     repair.strip(self.ui, repo, rev, backup)
>   File "/home/gward/src/hg-crew/mercurial/repair.py", line 130, in strip
>     repo.sopener(file, 'a').truncate(troffset)
>   File "/home/gward/src/hg-crew/mercurial/store.py", line 299, in fncacheopener
>     and path not in fnc):
>   File "/home/gward/src/hg-crew/mercurial/store.py", line 278, in __contains__
>     self._load()
>   File "/home/gward/src/hg-crew/mercurial/store.py", line 249, in _load
>     traceback.print_stack()
>    CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
>       247385            0      2.3013      1.3476   mercurial.store:24(decodedir)
>      +742155            0      0.6963      0.6963   +<method 'replace' of 'str' objects>
>      +247385            0      0.2575      0.2575   +<method 'startswith' of 'str' objects>
>            1            0      4.0739      1.2936   mercurial.store:245(_load)
>      +247385            0      2.3013      1.3476   +mercurial.store:24(decodedir)
>      +247385            0      0.3068      0.3068   +<method 'add' of 'set' objects>
>      +247385            0      0.1634      0.1634   +<len>
> [...]
> Time: real 11.830 secs (user 9.040+0.000 sys 2.680+0.000)
> 
> From that, it's pretty clear that 4.0 sec of the 11.8 sec it takes to
> run qrefresh is in loading my massively redundant fncache file (247385
> lines to list 28520 filenames).
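
The profile shows why the redundancy hurts: _load processes every line of fncache (decodedir, then set.add), so duplicates are fully parsed and hashed before the set silently discards them. A minimal sketch of that pattern, assuming a plain newline-separated file and skipping Mercurial's decodedir step; the helper name load_entries is hypothetical:

```python
import os
import tempfile


def load_entries(path):
    """Read a newline-separated fncache-style file into a set.

    Hypothetical helper shaped like fncache._load: every line is
    processed, even when it duplicates an earlier one.
    Returns (entries, number_of_lines_read).
    """
    entries = set()
    nlines = 0
    with open(path, "rb") as fp:
        for line in fp:
            nlines += 1
            entries.add(line.rstrip(b"\n"))
    return entries, nlines


# Build a small, deliberately redundant file: 3 names spread over 7 lines.
names = [b"data/a.txt.i", b"data/b.txt.i", b"data/c.txt.i"]
lines = names + names + names[:1]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"\n".join(lines) + b"\n")

entries, nlines = load_entries(path)
os.remove(path)
print(nlines, len(entries))  # 7 3
```

In Greg's repo the same loop handles 247385 lines to end up with 28520 names, so about 88% of those lines (and of the decodedir calls on them) are redundant.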

I briefly tried to work out how the current code could still produce
duplicate entries in the fncache file, and failed. But I could swear I
have seen it in the past. tonfa has made some nice refactorings to
store.py, though, which IMHO seem to make duplicates harder to produce.
Do you happen to know a use case that produces duplicate lines in
fncache? (I guess I'm too lazy right now to see it.)

I briefly played with stripping a cset that added a new file (hg strip)
and found that strip does not remove store files that "should" be
removed completely. Instead, it truncates them to zero length, so it
looks to me like those entries will never be removed from the fncache
file and will even be returned by store.fncachestore.datafiles.
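
This is easy to reproduce outside Mercurial. The traceback earlier in the thread shows repair.strip calling repo.sopener(file, 'a').truncate(troffset); for a file added by the stripped cset, that offset would presumably be 0. A minimal sketch with a temporary directory standing in for the store (the name newfile.txt.i is made up):

```python
import os
import tempfile

# A temporary directory stands in for the store; "newfile.txt.i" is a
# made-up revlog name for a file added by the changeset being stripped.
store = tempfile.mkdtemp()
path = os.path.join(store, "newfile.txt.i")
with open(path, "wb") as fp:
    fp.write(b"revlog data for the stripped changeset")

# Truncate through an append-mode handle, mirroring the call in the
# traceback, with an offset of 0 for a file that did not exist before.
with open(path, "ab") as fp:
    fp.truncate(0)

exists = os.path.exists(path)
size = os.stat(path).st_size
print(exists, size)  # True 0

os.remove(path)
os.rmdir(store)
```

os.stat still succeeds on such a file, so the existing datafiles loop keeps yielding it and never takes the OSError path that marks the fncache for a rewrite.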

I briefly discussed this with tonfa on IRC, and we both concluded that
'hg clone --uncompressed' should not be affected by this. Worst case, it
just transfers some extra empty files.

But there might be callers of store.fncachestore.datafiles that expect
to get exactly the files that have ever been checked in over the whole
history of the repo (excluding, of course, files from previously
stripped csets). Removing those stripped-to-zero entries from the
fncache file would be prudent, too.

So I started thinking about doing something like the following:

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -309,8 +309,12 @@ class fncachestore(basicstore):
             ef = hybridencode(f)
             try:
                 st = os.stat(pjoin(spath, ef))
-                yield f, ef, st.st_size
-                existing.append(f)
+                if st.st_size != 0:
+                    yield f, ef, st.st_size
+                    existing.append(f)
+                else:
+                    # stripped by mq, remove entry from fncache
+                    rewrite = True
             except OSError:
                 # nonexistent entry
                 rewrite = True

That is, we can check the file size returned by the os.stat call and
treat zero-size files exactly like nonexistent fncache entries.
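
To make the proposed behavior concrete, here is a minimal, self-contained sketch of the loop from the patch above that runs without Mercurial. The function name scan_entries and its return shape are hypothetical: it collects results into lists instead of yielding, and it uses an identity function in place of hybridencode. Like the patch, it assumes a zero-length store file can only be left behind by a strip.

```python
import os
import tempfile


def scan_entries(spath, entries, encode=lambda f: f):
    """Stat each fncache entry and drop missing or zero-size files.

    Hypothetical stand-in for fncachestore.datafiles with the proposed
    patch applied; `encode` replaces hybridencode (identity here).
    Returns (found, existing, rewrite):
      found    -- (name, encoded_name, size) for non-empty files
      existing -- names that should stay in the fncache file
      rewrite  -- True if any entry is missing or empty, meaning the
                  fncache file should be rewritten without it
    """
    found, existing, rewrite = [], [], False
    for f in sorted(entries):
        ef = encode(f)
        try:
            st = os.stat(os.path.join(spath, ef))
        except OSError:
            # nonexistent entry
            rewrite = True
            continue
        if st.st_size != 0:
            found.append((f, ef, st.st_size))
            existing.append(f)
        else:
            # truncated to zero by strip: treat like a nonexistent entry
            rewrite = True
    return found, existing, rewrite


spath = tempfile.mkdtemp()
with open(os.path.join(spath, "a.i"), "wb") as fp:
    fp.write(b"12345")                          # normal, non-empty file
open(os.path.join(spath, "b.i"), "wb").close()  # empty, as after a strip
# "c.i" is listed below but never created: a nonexistent entry

found, existing, rewrite = scan_entries(spath, {"a.i", "b.i", "c.i"})
_, _, clean_rewrite = scan_entries(spath, {"a.i"})
print(found, existing, rewrite)  # [('a.i', 'a.i', 5)] ['a.i'] True
print(clean_rewrite)             # False

for name in ("a.i", "b.i"):
    os.remove(os.path.join(spath, name))
os.rmdir(spath)
```

Keeping the size check out of the try block means only os.stat failures take the "nonexistent" path; both kinds of dead entry end up with the same effect, a fncache rewrite that leaves them out.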

BTW, you can 'hg clone --pull' a repo to get a copy with a cleaned-up
fncache file.
