[PATCH] hg_log: speed up hg log for untracked files (issue1340)

Matt Mackall mpm at selenic.com
Tue Sep 11 13:40:14 CDT 2012

On Tue, 2012-09-11 at 10:36 -0700, S Muralidhar wrote:
> Thanks for reviewing this, Matt. A few questions/comments below
> > This patch should be in multiple pieces:
> > - introduce the basic store method
> > - introduce the fncache method
> > - introduce the user in cmdutil + test
> I assume that step #2 will also include the fncachestore method (along
> with the fncache method) - just making sure I understood the protocol
> here. 


> > I think the approach in fncache should be two-pass:
> > - check for a file matching arg in the cache directly
> > - scan for a directory match by iterating startswith over the cache
> Another point of clarification: should I bundle this all into the
> __contains__ method on fncache?


>  Are there any perf concerns about adding an extra scan of the cache
> (although, it'll only happen on cache misses)? 

It's not extra? It won't be instant, but it'll be much faster than going
down the log slow path.
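
A minimal sketch of the two-pass check discussed above, not the actual fncache code. It assumes the cache can be treated as a plain set of path strings; the real fncache stores encoded store paths, and the name cache_contains is hypothetical.

```python
def cache_contains(entries, path):
    """Two-pass lookup over a set of file paths (hypothetical helper).

    entries: a set of tracked file paths, e.g. {"src/a.py", "README"}.
    path:    a file or directory name given on the command line.
    """
    # Pass 1: direct file match, an average O(1) set lookup.
    if path in entries:
        return True
    # Pass 2: directory match, a linear startswith scan over the cache.
    # This only runs when pass 1 misses. The trailing slash keeps "sr"
    # from matching "src/a.py".
    prefix = path.rstrip('/') + '/'
    return any(entry.startswith(prefix) for entry in entries)
```

With entries = {"src/a.py", "README"}, both "README" and "src" are found, while "sr" and "missing" are not. The scan is linear in the cache size, which is still far cheaper than walking every changeset on the slow path.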

> >> class basicstore
> >> +    def __contains__(self, path):
> >> +        '''Checks if this path exists in the store'''
> >> +        return self.opener.exists(path)
> > Doesn't work with directories?

> I didn't follow this comment, Matt? 

This appears not to be smart enough to work with directories, am I

> >> diff --git a/tests/test-glog.t b/tests/test-glog.t
> >> +  +nodetag 3
> >> +  nodetag 0
> > ???

> This is an example where hg log shows differences in output between
> the slow path and the fast path (we had a discussion earlier about
> this - http://bz.selenic.com/show_bug.cgi?id=3613). This particular
> testcase is testing differences between hg log and hg log -g, and one
> of them now uses the fast path for an untracked file, while the other
> doesn't (both of them used to go through the slow path earlier).

Huh. I've been focusing on this case, which I happen to hit a lot:

 hg log actuallyarevision   # long pause, empty output, facepalm

So, at a minimum, we'd like to get rid of the long pause.

But this test seems to be about:

 hg log afile notafile  # shows deletions and renames on afile

One important thing to know at this juncture is that the log code you're
hacking has been marked for death. It's just a matter of time before
it's replaced by graphlog. So divergence from existing output is not
desirable. Given the test that's getting broken here is probably not a
case we care about performance-wise, we should probably endeavor to
leave it alone, to avoid unnecessarily diverging from the graphlog code.

So instead, we should be doing this up front:

if there are files specified:
  if they're all explicit filenames (not patterns):
    if they all do not exist according to store:
      return []  # exit quickly
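
The pseudocode above could look roughly like this in Python. This is a sketch under assumptions, not the cmdutil code: should_exit_early and looks_like_pattern are hypothetical names, the store is anything that supports "in", and the pattern test is a crude stand-in for Mercurial's real pattern handling.

```python
def looks_like_pattern(name):
    """Stand-in pattern test (assumption): treat only names with an
    explicit 'glob:' or 're:' prefix as patterns."""
    return name.startswith(('glob:', 're:'))


def should_exit_early(files, store, is_pattern=looks_like_pattern):
    """Return True when log can return [] without touching history."""
    # No files specified: this is an ordinary log, nothing to skip.
    if not files:
        return False
    # Any pattern means the store cannot be checked by name alone.
    if any(is_pattern(f) for f in files):
        return False
    # Exit quickly only if every explicit filename is unknown to the store.
    return not any(f in store for f in files)
```

For example, with store = {"afile"}, the call should_exit_early(["actuallyarevision"], store) is true, so the long pause goes away. The call should_exit_early(["afile", "notafile"], store) is false, so that case keeps its existing behavior and output.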

Eventually we should start considering ways of being smarter here, for
instance, actually giving useful output for 'hg log stable'. Git does
something like this:

$ git log skjdfs
fatal: ambiguous argument 'skjdfs': unknown revision or path not in the
working tree.
Use '--' to separate paths from revisions

...but if you ask it for the log of a file that exists in history but not
in the working directory, it'll insist you add '--' (and apparently
sometimes you'll need --follow too) because it can't efficiently search
for all historic files.

But that's all a topic for later.

Mathematics is the supreme nostalgia of our time.
