[PATCH 2 of 2 STABLE] store: add auto detection for hardlink count blindness (issue1866)

Matt Mackall mpm at selenic.com
Wed Nov 3 11:52:26 CDT 2010


On Wed, 2010-11-03 at 08:50 +0100, Adrian Buehlmann wrote:
> On 01.11.2010 09:18, Matt Mackall wrote:
> > On Sun, 2010-10-31 at 20:26 +0100, Adrian Buehlmann wrote:
> >>> - Test function is too heavy-weight - do the simplest fastest test
> >>> possible. Probably something like:
> >>>
> >>>  f, fn = getasecurefilehandleandname(path)
> >>>  fn2 = fn + "l"
> >>>  link(fn, fn2)
> >>>  stat
> >>>
> >>> Even this is pretty expensive.
> >>
> >> Near as I can tell, there's no test in my code that is not needed.
> >>
> >> But I guess if even the above is too heavyweight, there's not much point
> >> in explaining in detail every case I covered.
> > 
> > Here's a quick test I made. It catches the Linux bug and should work on
> > Windows shares too.
> > 
> > def checknlink(testfile):
> >     rm = None
> >     try:
> >         for x in xrange(10):
> >             try:
> >                 f = "%s.hgtmp%s" % (testfile, x)
> > 		os.link(testfile, f)
> >                 rm = f
> >             except OSError, inst:
> >                 if inst == errno.EEXISTS:
> >                     continue
> > 		elif inst == errno.EPERM:
> >                     return false
> >                 raise
> >             break
> >         else:
> >             return False
> > 
> > 	try:
> >             fd = open(f, "r", 0)
> >             s = os.stat(f)
> >             if s.st_nlink > 1:
> > 		return True
> >         finally:
> >             fd.close()
> > 
> >     finally:
> > 	if rm:
> >             os.unlink(rm)
> > 
> >     return False
> > 
> > Typically, this will do the bare minimum of syscalls: link, open, stat,
> > close, and unlink. Note how it uses an existing file (which we should
> > always have when doing this test) to avoid the tempfile issues (and the
> > poking around at /dev/urandom that tempfile.mkstemp wants to do).
> > 
> 
> I'm writing this email, trying to record what was discussed and tried on
> IRC yesterday (mpm, pmezard and abuehl).
> 
> Matt pasted the following patch for discussion
> (http://paste.pocoo.org/show/284564/):
> 
> diff -r 333421b9e0f9 mercurial/util.py
> --- a/mercurial/util.py	Mon Nov 01 14:45:27 2010 -0500
> +++ b/mercurial/util.py	Mon Nov 01 16:51:17 2010 -0500
> @@ -716,6 +716,24 @@
>      except (OSError, AttributeError):
>          return False
>  
> +def checknlink(testfile):
> +    '''check whether hardlink count reporting works properly'''
> +    f = testfile + ".hgtmp"
> +    try:
> +        os_link(testfile, f)
> +    except OSError, inst:
> +        # occasional harmless false negatives
> +        return False
> +
> +    try:
> +        fd = open(f)
> +        return nlinks(f) > 1
> +    finally:
> +        fd.close()
> +        os.unlink(f)
> +
> +    return False
> +
>  def endswithsep(path):
>      '''Check path ends with os.sep or os.altsep.'''
>      return path.endswith(os.sep) or os.altsep and path.endswith(os.altsep)
> @@ -840,6 +858,7 @@
>          else:
>              self.auditor = always
>          self.createmode = None
> +        self._trustnlink = None
>  
>      @propertycache
>      def _can_symlink(self):
> @@ -870,7 +889,9 @@
>                      makedirs(dirname, self.createmode)
>              if atomictemp:
>                  return atomictempfile(f, mode, self.createmode)
> -            if nlink > 1:
> +            if nlink == 1 and self._trustnlink is None:
> +                self._trustnlink = checknlink(f)
> +            if self._trustnlink is False or nlink > 1:
>                  rename(mktempcopy(f), f)
>          fp = posixfile(f, mode)
>          if nlink == 0:
> diff -r 333421b9e0f9 mercurial/win32.py
> --- a/mercurial/win32.py	Mon Nov 01 14:45:27 2010 -0500
> +++ b/mercurial/win32.py	Mon Nov 01 16:51:17 2010 -0500
> @@ -43,17 +43,7 @@
>  
>  def nlinks(pathname):
>      """Return number of hardlinks for the given file."""
> -    links = _getfileinfo(pathname)[7]
> -    if links < 2:
> -        # Known to be wrong for most network drives
> -        dirname = os.path.dirname(pathname)
> -        if not dirname:
> -            dirname = '.'
> -        dt = win32file.GetDriveType(dirname + '\\')
> -        if dt == 4 or dt == 1:
> -            # Fake hardlink to force COW for network drives
> -            links = 2
> -    return links
> +    return _getfileinfo(pathname)[7]
>  
>  def samefile(fpath1, fpath2):
>      """Returns whether fpath1 and fpath2 refer to the same file. This is only
> 
> 
> This patch looks pretty nifty, but it has the problem that the opener
> goes into state self._trustnlink == False for filesystems which don't
> support hardlinks (e.g. FAT will raise OSError in checknlink), which
> I think will have the undesired effect that every file is copied on
> every write, which is unneeded and costly (think of Flash drives
> formatted as FAT).

..and then I went on to point out that we shouldn't even be testing
nlinks on 'w'rite (where COW is pointless), but only on 'a'ppend. On
write, we should (and usually do, see localrepo.wwrite) unconditionally
break links by first unlinking any existing file.


-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list