[PATCH] hg_log: speed up hg log for untracked files (issue1340)

S Muralidhar smuralid at yahoo.com
Tue Sep 11 20:32:20 CDT 2012




> >> diff --git a/tests/test-glog.t b/tests/test-glog.t
> >> +  +nodetag 3
> >> +  nodetag 0
> > ???

> This is an example where hg log shows differences in output between
> the slow path and the fast path (we had a discussion earlier about
> this - http://bz.selenic.com/show_bug.cgi?id=3613), This particular
> testcase is testing differences between hg log and hg log -g, and one
> of them now uses the fast path for an untracked file, while the other
> doesn't (both of them used to go through the slow path earlier)

> Huh. I've been focusing on this case, which I happen to hit a lot:
> hg log actuallyarevision  # long pause, empty output, face palm
> So, at a minimum, we'd like to get rid of the long pause.
> But this test seems to be about:
> hg log afile notafile  # shows deletions and renames on agile


I'm confused now, Matt. So in this test case, technically, just running "hg log afile" should produce exactly the same results as "hg log afile notafile" - but it doesn't (before my change). Here's what I see (afile = "a", notafile = "c")

$ hg log a
changeset:   0:1a31c607f309
summary:     add a


$ hg log a c
changeset:   3:3ff55e1496c5
summary:     mv a b; add d

changeset:   0:1a31c607f309
summary:     add a

$ hg log -G a
o  changeset:   0:1a31c607f309
   summary:     add a

$ hg log -G a c
o  changeset:   3:3ff55e1496c5
|  summary:     mv a b; add d
|
o  changeset:   0:1a31c607f309
   summary:     add a

With my versions of these
$ ~/local/hg/hg log a
changeset:   0:1a31c607f309
summary:     add a

$ ~/local/hg/hg log a c
changeset:   0:1a31c607f309
summary:     add a

$ ~/local/hg/hg log -G a
o  changeset:   0:1a31c607f309
   summary:     add a


As you can see, after my changes, "hg log afile" and "hg log afile notafile" produce the same results. "hg log -G afile" and "hg log -G afile notafile" don't produce the same results - because I hadn't updated the graph log path (_makegraphlogrevset). Once I update that, both of them produce the same results - but in both cases, they will be different from what we produce today. 

I'll send out my diff to _makegraphlogrevset also - putting that in causes test-glog.t to fail again - pretty much for the same test case, but this time, because it expected to go down the slow path, and it no longer does. So, I will still need to update the test case, regardless. Here's what that difference looks like
--- test-glog.t 2012-09-11 11:28:14.000000000 -0700
+++ test-glog.t.err     2012-09-11 18:25:45.000000000 -0700
@@ -1597,15 +1597,14 @@
   $ testlog a c
   []
   (group
+    (group
+      (or
     (func
-      ('symbol', '_matchfiles')
-      (list
-        (list
-          (list
-            ('string', 'r:')
-            ('string', 'd:relpath'))
-          ('string', 'p:a'))
-        ('string', 'p:c'))))
+          ('symbol', 'filelog')
+          ('string', 'a'))
+        (func
+          ('symbol', 'filelog')
+          ('string', 'c')))))
 
 Test multiple --include/--exclude/paths

> One important thing to know at this juncture is that the log code you're
> hacking has been marked for death. It's just a matter of time before
> it's replaced by graphlog. So divergence from existing output is not
> desirable. Given the test that's getting broken here is probably not a
> case we care about performance-wise, we should probably endeavor to
> leave it alone, to avoid unnecessarily diverging from the graphlog code


So, I'm not following this part - in either case, it looks like I will need to update the test. It definitely seems like updating the graph log path also is a better approach, and I'm happy to do that.

> So instead, we should be doing this up front:
> if there are files specified:
>  if they're all explicit filenames (not patterns):
>    if they all do not exist according to store:
>      return []  # exit quickly


That makes perfect sense, and I'm happy to go ahead and do that - as a separate diff of course, but that will change the outputs in all those cases. 
So "hg log a c" and "hg log a" will both return an empty result, as will the -G versions. That seems like the right behavior to me, and as you point out, we should have an option to search for a file in history. Does that sound reasonable?

Thanks
Murali


________________________________
 From: Matt Mackall <mpm at selenic.com>
To: S Muralidhar <smuralid at yahoo.com> 
Cc: "mercurial-devel at selenic.com" <mercurial-devel at selenic.com> 
Sent: Tuesday, September 11, 2012 11:40 AM
Subject: Re: [PATCH] hg_log: speed up hg log for untracked files (issue1340)
 
On Tue, 2012-09-11 at 10:36 -0700, S Muralidhar wrote:
> Thanks for reviewing this, Matt. A few questions/comments below
> 
> 
> > This patch should be in multiple pieces:
> > - introduce the basic store method
> > - introduce the fncache method
> > - introduce the user in cmdutil + test
> 
> I assume that step #2 will also include the fncachestore method (along
> with the fncache method) - just making sure I understood the protocol
> here. 

Sure.

> > I think the approach in fncache should be two-pass:
> > - check for a file matching arg in the cache directly
> > - scan for a directory match by iterating startswith over the cache
> Another point of clarification: should I bundle this all into the
> __contains__ method on fncache?

Yes.

>  Are there any perf concerns about adding an extra scan of the cache
> (although, it'll only happen on cache misses)? 

It's not extra? It won't be instant, but it'll be much faster than going
down the log slow path.

> >> class basicstore
> >> +    def __contains__(self, path):
> >> +        '''Checks if this path exists in the store'''
> >> +        return self.opener.exists(path)
> > Doesn't work with directories?

> I didn't follow this comment, Matt? 

This appears not to be smart enough to work with directories, am I
wrong?


> 
> 
> 
> >> diff --git a/tests/test-glog.t b/tests/test-glog.t
> >> +  +nodetag 3
> >> +  nodetag 0
> > ???

> This is an example where hg log shows differences in output between
> the slow path and the fast path (we had a discussion earlier about
> this - http://bz.selenic.com/show_bug.cgi?id=3613), This particular
> testcase is testing differences between hg log and hg log -g, and one
> of them now uses the fast path for an untracked file, while the other
> doesn't (both of them used to go through the slow path earlier)

Huh. I've been focusing on this case, which I happen to hit a lot:

hg log actuallyarevision   # long pause, empty output, facepalm

So, at a minimum, we'd like to get rid of the long pause.

But this test seems to be about:

hg log afile notafile  # shows deletions and renames on afile

One important thing to know at this juncture is that the log code you're
hacking has been marked for death. It's just a matter of time before
it's replaced by graphlog. So divergence from existing output is not
desirable. Given the test that's getting broken here is probably not a
case we care about performance-wise, we should probably endeavor to
leave it alone, to avoid unnecessarily diverging from the graphlog code.

So instead, we should be doing this up front:

if there are files specified:
  if they're all explicit filenames (not patterns):
    if they all do not exist according to store:
      return []  # exit quickly

Eventually we should start considering ways of being smarter here, for
instance, actually giving useful output for 'hg log stable'. Git does
something like this:

$ git log skjdfs
fatal: ambiguous argument 'skjdfs': unknown revision or path not in the
working tree.
Use '--' to separate paths from revisions

..but if you ask it for the log of a file that exists in history but not
in the working directory, it'll insist you add '--' (and apparently
sometimes you'll need --follow too) because it can't efficiently search
for all historic files.

But that's all a topic for later.

-- 
Mathematics is the supreme nostalgia of our time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120911/112c0f12/attachment.html>


More information about the Mercurial-devel mailing list