Solving long paths by hashing

Adrian Buehlmann adrian at cadifra.com
Sun Jun 29 18:27:04 CDT 2008


On 29.06.2008 15:30, Adrian Buehlmann wrote:
> On 29.06.2008 15:06, Dirkjan Ochtman wrote:
>> Adrian Buehlmann wrote:
>>> Sounds like you would take Jesse's patch then?
>> I haven't seen it, but the concept sounds interesting to me.
>>
> 
> I repeated Jesse's link to his patch in the first post of this thread. Here is
> the link again:
> 
> http://www.selenic.com/mercurial/bts/file520/prevent-excessively-long-repo-paths.diff
> 
> I'll post Jesse's patch below so you can read and comment on it inline:
> 


> 
> diff -r 04c76f296ad6 mercurial/hg.py
> --- a/mercurial/hg.py	Mon Dec 10 10:26:42 2007 -0600
> +++ b/mercurial/hg.py	Thu Dec 13 21:59:29 2007 -0500
> @@ -198,6 +198,7 @@ def clone(ui, source, dest=None, pull=Fa
>              dest_lock = lock.lock(os.path.join(dest_store, "lock"))
> 
>              files = ("data",
> +                     "longnames",
>                       "00manifest.d", "00manifest.i",
>                       "00changelog.d", "00changelog.i")
>              for f in files:
> diff -r 04c76f296ad6 mercurial/localrepo.py
> --- a/mercurial/localrepo.py	Mon Dec 10 10:26:42 2007 -0600
> +++ b/mercurial/localrepo.py	Thu Dec 13 21:59:29 2007 -0500
> @@ -11,10 +11,11 @@ import changelog, dirstate, filelog, man
>  import changelog, dirstate, filelog, manifest, context, weakref
>  import re, lock, transaction, tempfile, stat, errno, ui
>  import os, revlog, time, util, extensions, hook
> +import sha
> 
>  class localrepository(repo.repository):
>      capabilities = util.set(('lookup', 'changegroupsubset'))
> -    supported = ('revlogv1', 'store')
> +    supported = ('revlogv1', 'store', 'longnames')
> 
>      def __init__(self, parentui, path=None, create=0):
>          repo.repository.__init__(self)
> @@ -59,17 +60,7 @@ class localrepository(repo.repository):
>              if r not in self.supported:
>                  raise repo.RepoError(_("requirement '%s' not supported") % r)
> 
> -        # setup store
> -        if "store" in requirements:
> -            self.encodefn = util.encodefilename
> -            self.decodefn = util.decodefilename
> -            self.spath = os.path.join(self.path, "store")
> -        else:
> -            self.encodefn = lambda x: x
> -            self.decodefn = lambda x: x
> -            self.spath = self.path
> -        self.sopener = util.encodedopener(util.opener(self.spath),
> -                                          self.encodefn)
> +        self._setup_store(requirements, util.opener, os.path.join)
> 
>          self.ui = ui.ui(parentui=parentui)
>          try:
> @@ -83,6 +74,73 @@ class localrepository(repo.repository):
>          self.nodetagscache = None
>          self.filterpats = {}
>          self._transref = self._lockref = self._wlockref = None
> +
> +    def _setup_store(self, requirements, opener, pathjoiner):
> +        if "store" in requirements:
> +            self._longnames = None
> +            def load_longnames():
> +                if self._longnames == None:
> +                    self._longnames = {}
> +                    self._longnames_transient = {}
> +                    try:
> +                        self._longnames_file = opener(self.spath)('longnames',
> +                                                                  mode='a+')

So the longnames file is in fact a cache of the names of all files with long
names in the repo. The hashed filename is not written to this file because it
can be computed from the original name (so the file is not a mapping).
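To illustrate why no mapping needs to be stored, here is a minimal sketch
assuming a hash-based scheme along the lines the patch suggests (it imports
`sha`). The length limit `MAX_NAME_LEN`, the `hashed/` prefix and the function
`store_path` are hypothetical, not taken from Jesse's patch:

```python
import hashlib

# Hypothetical limit; the patch excerpt above does not show the real one.
MAX_NAME_LEN = 120


def store_path(name):
    """Map a tracked filename to its revlog path in the store (sketch).

    Short names are stored as-is; long names are replaced by a SHA-1 digest
    of the name. Prefix and limit are illustrative assumptions.
    """
    path = "data/" + name + ".i"
    if len(path) <= MAX_NAME_LEN:
        return path
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return "hashed/" + digest + ".i"


long_name = "a/" * 80 + "file.txt"
# The hashed path is a pure function of the name, so it never has to be saved.
print(store_path(long_name) == store_path(long_name))  # True
print(store_path("short.txt"))  # data/short.txt.i
```

The reverse direction is the problem: a SHA-1 digest cannot be turned back
into the filename, which is why the original long names have to be kept
somewhere.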

The longnames cache is written incrementally whenever a new filelog of a file
with a long filename is added to the repo. It is not sorted: the entries appear
in the order in which the filelogs were created.
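A minimal sketch of such an append-only cache, assuming one filename per line.
The class `LongnamesCache` and this file format are hypothetical; the patch
opens the file with `mode='a+'`, but the excerpt above does not show how
entries are actually written:

```python
import os
import tempfile


class LongnamesCache:
    """Hypothetical append-only store of long filenames, one per line.

    Names are appended in the order their filelogs are created and are
    never sorted; only the original names are kept, since the hashed
    revlog path can be recomputed from each name.
    """

    def __init__(self, path):
        self.path = path
        self.names = []
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.names = [line.rstrip("\n") for line in f if line != "\n"]
        self._known = set(self.names)

    def add(self, name):
        """Record `name` when a filelog for it is first created."""
        if name in self._known:
            return  # filelog already exists; nothing new to record
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(name + "\n")
        self.names.append(name)
        self._known.add(name)


with tempfile.TemporaryDirectory() as tmp:
    cache = LongnamesCache(os.path.join(tmp, "longnames"))
    cache.add("z/" * 70 + "b.c")
    cache.add("y/" * 70 + "a.c")
    cache.add("z/" * 70 + "b.c")  # duplicate: not written again
    reopened = LongnamesCache(cache.path)
    print(len(reopened.names))  # 2
```

The reopened cache keeps the "z/..." entry first, i.e. creation order rather
than sorted order.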

In theory, the longnames cache could be rebuilt by iterating over all manifest
revisions, which is why I call it a cache. But that operation would be very
expensive, so the longnames cache of a repo is never rebuilt.
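A rebuild would look roughly like the following sketch. The manifest log is
modelled as a plain list of per-revision file lists, and `rebuild_longnames` is
a hypothetical helper; the point is only that every revision must be visited,
since a file with a long name may appear only in old revisions:

```python
def rebuild_longnames(manifests, is_long):
    """Recompute the list of long filenames from history (sketch).

    `manifests` is a hypothetical in-memory stand-in for the manifest log:
    one list of tracked filenames per revision, oldest first. Every
    revision has to be read; on a real repository that is the costly part.
    """
    seen = set()
    names = []
    for files in manifests:
        for name in files:
            if is_long(name) and name not in seen:
                seen.add(name)
                names.append(name)  # order of first appearance in history
    return names


history = [
    ["README", "x" * 200],
    ["README"],  # the long file was deleted, but its filelog remains
    ["README", "y" * 150],
]
print(len(rebuild_longnames(history, lambda n: len(n) > 120)))  # 2
```

Looking only at the tip manifest would miss the deleted "x..." file, whose
filelog still exists in the store.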

It will be used by streamclone to enumerate the filelogs stored under hashed
names, because the wire protocol requires the unencoded filenames to be sent.
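A rough sketch of that enumeration, assuming a hypothetical `hashed/<sha1>.i`
layout. `stream_entries` and `hashed_path` are illustrative names, not
streamclone's actual code:

```python
import hashlib


def hashed_path(name):
    # Hypothetical on-disk layout for hashed revlogs; not from the patch.
    return "hashed/" + hashlib.sha1(name.encode("utf-8")).hexdigest() + ".i"


def stream_entries(plain_revlogs, longnames):
    """Yield (wire_name, store_path) pairs for a stream clone (sketch).

    Plain revlog paths can be decoded back to filenames, so they are found
    by walking the store. A hashed path cannot be reversed, so the
    unencoded name that goes over the wire is taken from longnames.
    """
    for path in plain_revlogs:
        yield path, path
    for name in longnames:
        yield "data/" + name + ".i", hashed_path(name)


long_name = "deep/" * 30 + "file.c"
entries = list(stream_entries(["data/README.i"], [long_name]))
for wire_name, path in entries:
    print(len(wire_name), path[:15])
```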

I assume hg verify won't rebuild that list to check it either, as that would be
too expensive. So hg verify will probably just iterate over the entries in the
longnames file and check whether each corresponding hashed revlog exists.
Missing entries in the longnames file won't be detected, then.
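A sketch of that check, assuming a hypothetical `hashed/<sha1>.i` layout;
`verify_longnames` is an illustrative name, not hg verify's code. The example
shows a stale entry being reported while an unrecorded revlog goes unnoticed:

```python
import hashlib


def hashed_path(name):
    # Hypothetical on-disk layout for hashed revlogs; not from the patch.
    return "hashed/" + hashlib.sha1(name.encode("utf-8")).hexdigest() + ".i"


def verify_longnames(longnames, store_files):
    """Return longnames entries whose hashed revlog is missing (sketch).

    This only checks the list against the store. A hashed revlog whose name
    was never written to longnames is invisible here, because its path
    cannot be turned back into a filename.
    """
    return [name for name in longnames if hashed_path(name) not in store_files]


a, b, c = "a" * 130, "b" * 130, "c" * 130
store_files = {hashed_path(a), hashed_path(b)}  # revlogs actually on disk
listed = [a, c]                                 # b never recorded; c is stale
print(verify_longnames(listed, store_files) == [c])  # True: b goes unnoticed
```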

An incomplete longnames file can, for example, be detected by running hg serve
on the repo, cloning it over HTTP, and then running hg verify on the clone.
The missing revlogs will then be reported.
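The following sketch models why that works, assuming the clone over HTTP
streams the store files rather than pulling changesets. Repositories are
reduced to sets of filenames, and all function names are hypothetical:

```python
def stream_clone_names(store_names, longnames, is_long):
    """Filenames whose revlogs a stream clone would copy (sketch).

    `store_names` stands in for every filelog in the source store. Plain
    revlogs are found by walking the store; hashed ones only if listed.
    """
    plain = {n for n in store_names if not is_long(n)}
    return plain | (set(longnames) & set(store_names))


def missing_in_clone(manifests, cloned_names):
    """Files some manifest references but whose revlog was not copied,
    i.e. roughly what hg verify on the clone would complain about."""
    referenced = set().union(*manifests)
    return sorted(referenced - cloned_names)


def is_long(name):
    return len(name) > 120


a, b = "a" * 130, "b" * 130
store = {"README", a, b}
manifests = [["README", a], ["README", a, b]]
cloned = stream_clone_names(store, [a], is_long)  # b missing from longnames
print(missing_in_clone(manifests, cloned) == [b])  # True
```

Verifying the clone works because the manifests still name every file, even
though the source's longnames list does not.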

To be sure that the longnames file is correct (i.e., complete), we would then
have to do a local clone --pull, which rebuilds the longnames file of the clone
from scratch.

Did I get that right?

