[PATCH] hg_log: speed up hg log for untracked files (issue1340)

Matt Mackall mpm at selenic.com
Tue Sep 11 11:27:35 CDT 2012


On Mon, 2012-09-10 at 21:49 -0700, S Muralidhar wrote:
> # HG changeset patch
> # User smuralid
> # Date 1347338107 25200
> # Node ID 28f72ea4d93a27650759874ed8c5928f69a3b37c
> # Parent  fc14953e8e34667181cd0d492825342b5e1e880b
> hg_log: speed up hg log for untracked files (issue1340)

"log" is sufficient here.

> 'hg log' on untracked files tends to be fairly slow. The root cause is that we end up using the 'slowpath' when we can't find a revlog for the files listed. This could happen if the file in question is an untracked file, or it is a directory.
> This diff tries to speed up 'hg log' (by avoiding the slowpath) for files if we can determine if that file is not (and never was) a directory. Since we don't have a separate revlog for directories, I'm taking advantage of the fact that the various "versions" of a directory are preserved (under .hg/store/data).
> 'hg log' will continue to be slow for directories - that's for the next diff.
> 
> This iteration of the diff pushes the core path-existence logic into the store layer (specifically into fncachestore and basicstore) based on Matt's feedback.
> basicstore simply tests if the file exists on the OS. fncachestore looks at its cache first to see if a path exists; if not, it calls its superclass (basicstore) to test if a directory by that name exists.
> An alternate approach (in fncachestore) would have been to simply rely on the cache.__contains__ method even for directories. Fncache currently doesn't track directories at all, and it seemed inappropriate to extend this. It was more involved, for one. A larger cache has its downsides and there are some (slight) gotchas with forcing a rebuild of the fncache so it can include directories (for existing repos).

This patch should be in multiple pieces:

- introduce the basic store method
- introduce the fncache method
- introduce the user in cmdutil + test

I think the approach in fncache should be two-pass:

- check for a file matching arg in the cache directly
- scan for a directory match by iterating startswith over the cache

This gives us:

- hg log existingfile -> very fast
- hg log nosuchfileordir -> fast
- hg log existingdir -> slowpath

This is in keeping with Mercurial's usual strategy for tracking
directories: they're implicit in the tracked filenames.
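
As a rough illustration of the two-pass check (a sketch only, not code
from the patch or from store.py): the function name classify_path and
the plain set "entries" are hypothetical stand-ins for the fncache, and
the sketch assumes entries look like "data/<path>.i" and ignores
filename/directory encoding entirely.

```python
def classify_path(entries, path):
    """Return 'file', 'dir', or None for path.

    entries: hypothetical stand-in for the fncache, a set of strings
    assumed to look like 'data/<tracked path>.i'.
    """
    path = path.rstrip('/')

    # Pass 1: direct membership test for a filelog with this exact name.
    if 'data/' + path + '.i' in entries:
        return 'file'

    # Pass 2: directories are implicit in the tracked filenames, so a
    # directory exists if any entry lives underneath it.
    prefix = 'data/' + path + '/'
    for entry in entries:
        if entry.startswith(prefix):
            return 'dir'

    # Neither a file nor a directory was ever tracked under this name.
    return None


entries = {'data/d1/f1.i', 'data/top.i'}
print(classify_path(entries, 'top'))     # file   -> fast path
print(classify_path(entries, 'nosuch'))  # None   -> fast, nothing to log
print(classify_path(entries, 'd1'))      # dir    -> slowpath
```

The second pass is linear in the number of cache entries, but it only
runs when the direct lookup misses, and it avoids storing directories
in the cache.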


> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -788,6 +788,17 @@
>      def wjoin(self, f):
>          return os.path.join(self.root, f)
>  
> +    def containsdir(self, d):
> +        """Determines if 'd' is currently (or was in the past) a
> +        valid directory in the repo"""
> +        if d.startswith('/'):
> +            d = d[1:]

There should never be any absolute paths?

> +        if d == ".":
> +            return True
> +        if not d.endswith('/'):
> +            d = d + '/'
> +        return d in self.store

No more utility functions in localrepo, please. Either do all of this in
store, or in cmdutil.

> diff --git a/mercurial/store.py b/mercurial/store.py
> --- a/mercurial/store.py
> +++ b/mercurial/store.py
> @@ -286,6 +286,10 @@
>      def write(self):
>          pass
>  
> +    def __contains__(self, path):
> +        '''Checks if this path exists in the store'''
> +        return self.opener.exists(path)

Doesn't work with directories?

>  class encodedstore(basicstore):
>      def __init__(self, path, openertype):
>          self.path = path + '/store'
> @@ -378,6 +382,9 @@
>              self.fncache.add(path)
>          return self.opener(self.encode(path), mode, *args, **kw)
>  
> +    def join(self, f):
> +        return self.opener.join(f)

I don't think we need this. It can be inlined in the one place it's
used.

>  class fncachestore(basicstore):
>      def __init__(self, path, openertype, encode):
>          self.encode = encode
> @@ -422,6 +429,17 @@
>      def write(self):
>          self.fncache.write()
>  
> +    def __contains__(self, path):
> +        '''Checks if this path exists in the store'''
> +        path = "/".join(("data", path))
> +        if path in self.fncache:
> +            return True
> +        else:
> +            # Didn't find 'path' in cache.
> +            # fncache does not keep info on directories - and this may be
> +            # a directory, so ask the underlying store about it.
> +            return super(fncachestore, self).__contains__(self.join(path))
> +
>  def store(requirements, path, openertype):
>      if 'store' in requirements:
>          if 'fncache' in requirements:
> diff --git a/tests/test-glog.t b/tests/test-glog.t
> --- a/tests/test-glog.t
> +++ b/tests/test-glog.t
> @@ -1606,6 +1606,11 @@
>              ('string', 'd:relpath'))
>            ('string', 'p:a'))
>          ('string', 'p:c'))))
> +  --- log.nodes	* (glob)
> +  +++ glog.nodes	* (glob)
> +  @@ -1 +1,2 @@
> +  +nodetag 3
> +   nodetag 0

???

>  Test multiple --include/--exclude/paths
>  
> diff --git a/tests/test-log.t b/tests/test-log.t
> --- a/tests/test-log.t
> +++ b/tests/test-log.t
> @@ -1213,3 +1213,52 @@
>    1
>  
>    $ cd ..
> +
> +test hg log on non-existent files and on directories
> +  $ hg init issue1340
> +  $ cd issue1340
> +  $ mkdir d1; mkdir D2; mkdir D3.i; mkdir d4.hg; mkdir d5.d; mkdir .d6
> +  $ echo 1 > d1/f1
> +  $ echo 1 > D2/f1
> +  $ echo 1 > D3.i/f1
> +  $ echo 1 > d4.hg/f1
> +  $ echo 1 > d5.d/f1
> +  $ echo 1 > .d6/f1
> +  $ hg add .
> +  adding .d6/f1
> +  adding D2/f1
> +  adding D3.i/f1
> +  adding d1/f1
> +  adding d4.hg/f1
> +  adding d5.d/f1
> +  $ hg commit -m "a bunch of weird directories"
> +  $ hg log -l1 d1/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 f1
> +  $ hg log -l1 . | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 ./ | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 d1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 D2 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 D2/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 D3.i | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 D3.i/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 d4.hg | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 d4.hg/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 d5.d | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 d5.d/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 .d6 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ hg log -l1 .d6/f1 | grep changeset
> +  changeset:   0:65624cd9070a
> +  $ cd ..


-- 
Mathematics is the supreme nostalgia of our time.