hg log and directories

S Muralidhar smuralid at yahoo.com
Thu Sep 6 19:27:50 CDT 2012


Hi folks,
I'd like a bit of guidance on an issue I'm tracking.

I'm looking into fixing Issue 1340 (http://bz.selenic.com/show_bug.cgi?id=1340), which complains that "hg log" is slow for untracked files.

The root cause seems to be in the function walkchangerevs() in cmdutil.py, which checks whether the specified file has a revlog. If it doesn't, it falls back to a "slow path" that walks the revision history backwards, checking at each revision whether the file existed as of that revision.
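As a rough illustration of why the slow path is expensive, here is a toy model in plain Python. It is not Mercurial's walkchangerevs(): each changeset is represented by a hypothetical set of the file paths it touched, and the scan visits every revision from newest to oldest, matching the path either as a file or as a directory prefix. For a name that never existed, it still has to visit every revision before concluding there is nothing to show.

```python
from typing import Iterator, List, Set


def slowpath_revs(changes: List[Set[str]], path: str) -> Iterator[int]:
    """Yield revisions, newest first, whose changed files match `path`.

    `changes[rev]` is a hypothetical stand-in for the files that changeset
    `rev` touched. A file matches if it equals `path` or lives somewhere
    under `path` treated as a directory.
    """
    prefix = path.rstrip("/") + "/"
    # Walk backwards through the whole history: the cost is linear in the
    # number of revisions, even when nothing ever matches.
    for rev in range(len(changes) - 1, -1, -1):
        if any(f == path or f.startswith(prefix) for f in changes[rev]):
            yield rev


history = [
    {"README"},                 # rev 0
    {"src/main.py"},            # rev 1
    {"README", "src/util.py"},  # rev 2
]
print(list(slowpath_revs(history, "src")))     # [2, 1]
print(list(slowpath_revs(history, "nosuch")))  # []
```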

I started with the hypothesis that a missing revlog means either that the file never existed, or that the name is a directory (one that exists now or one that existed in a past revision). So my plan was to check for the directory in the .hg/store/data/ tree. If the directory exists, I continue to use the slow path. If, however, no such directory exists, then I know this file doesn't exist in any revision, and I skip the slow path.
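The proposed decision boils down to a three-way choice. In this sketch, `store_dirs` is a hypothetical set of directory names found under .hg/store/data/ (relative to data/, with a trailing slash), and store-name encoding is ignored entirely. The function and parameter names are illustrative, not Mercurial APIs.

```python
from typing import Set


def choose_log_strategy(has_revlog: bool, store_dirs: Set[str], path: str) -> str:
    """Return "fast", "slow" or "skip" for `hg log PATH` under the proposal."""
    if has_revlog:
        # A filelog exists: the normal per-file walk applies.
        return "fast"
    # Normalize as the proposed isorwasdir() does: drop a leading slash and
    # make sure the name ends with a slash.
    if path.startswith("/"):
        path = path[1:]
    if not path.endswith("/"):
        path += "/"
    if path in store_dirs:
        # Current or former directory: the slow path is still needed.
        return "slow"
    # Neither a filelog nor a store directory: nothing to log.
    return "skip"


dirs = {"src/", "src/lib/"}
print(choose_log_strategy(True, dirs, "README"))          # fast
print(choose_log_strategy(False, dirs, "/src/lib"))       # slow
print(choose_log_strategy(False, dirs, "untracked.txt"))  # skip
```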

With this change, "hg log" on an untracked file is significantly faster. ("hg log" on a directory is still slow, because it still takes the slow path, but that's an issue for another day.) But Bryan O'Sullivan pointed out some gotchas in my "directory existence" check and suggested that I ask on this list.

Here's my current diff:

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -1067,6 +1067,13 @@
                     if follow:
                         raise util.Abort(
                             _('cannot follow nonexistent file: "%s"') % file_)
+                    # Check to see if this was ever a directory. If not, then
+                    # we can assume that no such file/directory with this name
+                    # existed (as evidenced by the zero-length filelog).
+                    # So there's nothing to log. For directories, we still need
+                    # to take the slow path
+                    if not repo.isorwasdir(file_):
+                        continue
                     slowpath = True
                     break
                 else:

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -788,6 +788,17 @@
     def wjoin(self, f):
         return os.path.join(self.root, f)
 
+    def isorwasdir(self, d):
+        """Determines if 'd' is currently (or was in the past) a
+        valid directory in the repo"""
+        # Remove leading slash; add trailing slash
+        if d[0] == '/':
+            d = d[1:]
+        if d[len(d) - 1] != '/':
+            d = d + '/'
+        dx = self.store.encode("/".join(("data", d)))
+        return self.sopener.isdir(dx)
+
     def file(self, f):
         if f[0] == '/':
             f = f[1:]

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -378,6 +378,9 @@
             self.fncache.add(path)
         return self.opener(self.encode(path), mode, *args, **kw)
 
+    def join(self, f):
+        return self.opener.join(f)
+
 class fncachestore(basicstore):
     def __init__(self, path, openertype, encode):
         self.encode = encode
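To see roughly which on-disk name isorwasdir() ends up probing for directory names such as abc.i/, abc.hg/ and Abc.i/, here is a sketch that reimplements two of the store's naming rules in plain Python, approximating mercurial/store.py (treat the details as assumptions to check against the source): a directory component ending in .hg, .i or .d gets an extra ".hg" appended, and then "_" is doubled and each uppercase letter becomes "_" plus its lowercase form. It omits reserved-character escaping, Windows device names, and the fncache store's long-path hashing. All function names are hypothetical.

```python
def encodedir(path: str) -> str:
    # Directory components ending in .hg, .i or .d get ".hg" appended so
    # they cannot be confused with revlog files (.i/.d) or escaped names.
    return (path.replace(".hg/", ".hg.hg/")
                .replace(".i/", ".i.hg/")
                .replace(".d/", ".d.hg/"))


def foldcase(path: str) -> str:
    # Double "_" and turn uppercase X into "_x", so names differing only
    # in case stay distinct on case-insensitive file systems.
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def store_dir_for(d: str) -> str:
    # Same normalization as the proposed isorwasdir(), then encoding.
    if d.startswith("/"):
        d = d[1:]
    if not d.endswith("/"):
        d += "/"
    return foldcase(encodedir("data/" + d))


print(store_dir_for("abc.i"))   # data/abc.i.hg/
print(store_dir_for("abc.hg"))  # data/abc.hg.hg/
print(store_dir_for("Abc.i"))   # data/_abc.i.hg/
```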


The localrepository.isorwasdir() method above uses the store's encode function to encode the directory name appropriately (to handle cases like abc.i/, abc.hg/, Abc.i/, etc.), and that seems to work for the most part. However, Bryan mentioned that things may not be as simple as they seem:
>> The fncache encoding scheme makes this much more difficult. It changes the directory that a file gets written to based on the length of the filename. 
>> This means that two files that share the same leading path components can get written to two different directories under .hg/store, and two files with different leading path components can be written to a single directory.
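Both effects Bryan describes can be reproduced with a deliberately simplified model of the fncache store's hashed encoding. Assumptions, all to be checked against mercurial/store.py: when the plain "data/<path>.i" name would exceed 120 characters, the revlog moves under a "dh/" tree in which each directory component is truncated to 8 characters and the file name is replaced by a SHA-1 digest. The real scheme also encodes the names, caps the total length of the shortened directories, and keeps part of the original file name; none of that is modelled here, and the function names are hypothetical.

```python
import hashlib
import posixpath

MAXSTOREPATHLEN = 120  # assumed length threshold for hashing
DIRPREFIXLEN = 8       # assumed per-component truncation under dh/


def toy_store_path(path: str) -> str:
    """Simplified model of where a file's .i revlog lands in the store."""
    plain = "data/" + path + ".i"
    if len(plain) <= MAXSTOREPATHLEN:
        return plain
    # Too long: truncate each directory component and hash the full name.
    dirs = [c[:DIRPREFIXLEN] for c in posixpath.dirname(path).split("/") if c]
    digest = hashlib.sha1(plain.encode("utf-8")).hexdigest()
    return "/".join(["dh"] + dirs + [digest + ".i"])


def store_parent(path: str) -> str:
    # The store directory a directory-existence probe would have to find.
    return posixpath.dirname(toy_store_path(path))


print(store_parent("src/lib/a.py"))                  # data/src/lib
print(store_parent("src/lib/" + "x" * 120 + ".py"))  # dh/src/lib
print(store_parent("abcdefghij/" + "y" * 120))       # dh/abcdefgh
print(store_parent("abcdefghkl/" + "y" * 120))       # dh/abcdefgh
```

In this model, files under src/lib/ can end up in both data/src/lib/ and dh/src/lib/, while abcdefghij/ and abcdefghkl/ collapse into the same dh/abcdefgh/, so probing a single store directory can miss a tracked directory, and a dh/ directory cannot be mapped back to a unique original name.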


So I'd like to ask for suggestions on how to address this. Am I even going down the right path?

Thanks
Murali