[PATCH 2 of 2] dirstate: normalize on case insensitive filesystems on Mac (issue1663)

Dan Villiom Podlaski Christiansen danchr at gmail.com
Fri Jul 24 16:37:15 CDT 2009

On 24/07/2009, at 22.53, Matt Mackall wrote:

> if fold(filename internally) == fold(filename on disk):
>  files are the same

Ah, that makes things much simpler :) Attached below is another stab,
this time using the F_GETPATH fcntl, which I just remembered existed.
It opens the file and asks for its path. This seems much simpler and
more reliable than trying to re-do all the logic ourselves.
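
For readers who want to try the F_GETPATH approach outside the patch, here
is a minimal standalone sketch in Python 3. It is not the patch itself: the
name true_path is made up for illustration, the constant 50 is taken from
the fcntl.h comment in the patch below, and the 1024-byte buffer is the same
assumption the patch makes. On anything other than Mac OS X it simply
returns its argument.

```python
import os
import sys

# Value of F_GETPATH from <fcntl.h> on Mac OS X, as quoted in the patch.
F_GETPATH = 50


def true_path(path):
    """Return the path as the filesystem reports it, or `path` unchanged.

    Hypothetical helper: only Mac OS X is handled; other platforms and
    any failure to open or query the file fall back to the input.
    """
    if sys.platform != 'darwin':
        return path
    import fcntl  # imported lazily so the module loads on any platform
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return path
    try:
        # The kernel fills the buffer with the NUL-terminated path.
        buf = fcntl.fcntl(fd, F_GETPATH, b'\0' * 1024)
    except OSError:
        return path
    finally:
        os.close(fd)
    return os.fsdecode(buf.rstrip(b'\0'))
```

Unlike the patch, this sketch opens the file with O_RDONLY, so it follows
symlinks, and it closes the descriptor even when the fcntl call fails.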

>> Unfortunately, the issue is slightly more complex than that; the
>> normalisation required for HFS+ doesn't correspond to any standard
>> Unicode normalisation. It might be better to simply implement the
>> normalisation ourselves, based on the HFS volume format specification.
>> [1] One thing though; not all volumes on Mac OS X are case
>> independent, but I suspect the Unicode normalisation is universal.
>> (I'd have to dig much deeper into documentation, references & source
>> to be certain.)
> I believe you can mount BSD FFS volumes as well, which are not
> UTF16-impaired.

Indeed you can; I just tried with a disk image. It allowed me to  
create both ISO-8859-1 ‘å’ and composed UTF-8 ‘å’. Interestingly, if  
you move them to an HFS+ volume, the former is converted to ‘%E5’, and  
the latter to the familiar decomposed form…
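
To make the three byte sequences concrete, here is a small Python 3 sketch.
It uses standard NFD from unicodedata as a stand-in for the HFS+
decomposition; as noted in the quoted text above, HFS+ does not follow any
standard normalisation exactly, so this only illustrates the ‘å’ case. The
helper name is_valid_utf8 is made up for the example.

```python
import unicodedata
from urllib.parse import quote

composed = '\u00e5'  # 'å' as a single code point
# Standard NFD, used here only as an approximation of what HFS+ stores.
decomposed = unicodedata.normalize('NFD', composed)

latin1_bytes = composed.encode('latin-1')      # b'\xe5'
composed_bytes = composed.encode('utf-8')      # b'\xc3\xa5'
decomposed_bytes = decomposed.encode('utf-8')  # b'a\xcc\x8a'


def is_valid_utf8(raw):
    """Return True if the byte string decodes as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# The Latin-1 byte is not valid UTF-8, which matches it being
# percent-escaped when moved to an HFS+ volume.
escaped = quote(latin1_bytes)  # '%E5'
```

The two UTF-8 names differ byte for byte even though they render the same,
which is why a plain string comparison against the dirstate is not enough.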

>>> (there are other hairy issues here, like filenames in Latin1)
>> That issue should be ‘solved’ rather simply on Mac OS X, I believe: by
>> definition, such file names cannot exist, ever. I remember mounting an
>> NTFS volume once that used some non-UTF-8 encoding for its file names;
>> whether GUI or CLI, the system *really* doesn't like such file names.
> Yes, but Mercurial must handle more or less arbitrary null-terminated
> byte strings on other systems, so we should give this corner case some
> consideration.

All things considered, isn't it safe to assume that any Mac OS X  
installation uses UTF-8 for file system encoding?


Dan Villiom Podlaski Christiansen
danchr at gmail.com


# HG changeset patch
# User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
# Date 1248470961 -7200
# Node ID f132b7058ffa2d3b4a544fe4ded53ac7e35a26ac
# Parent  d98cef25b5afed5d8aa325ef87f98789367d8b6e
util: add realpath() for getting the 'true' path on Mac OS X.

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -59,7 +59,7 @@ class dirstate(object):
      def _foldmap(self):
          f = {}
          for name in self._map:
-            f[os.path.normcase(name)] = name
+            f[util.realpath(name)] = name
          return f

@@ -340,7 +340,7 @@ class dirstate(object):
              self._ui.warn(_("not in dirstate: %s\n") % f)

      def _normalize(self, path, knownpath):
-        norm_path = os.path.normcase(path)
+        norm_path = util.realpath(path)
          fold_path = self._foldmap.get(norm_path, None)
          if fold_path is None:
              if knownpath or not os.path.exists(os.path.join(self._root, path)):
diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -7,7 +7,7 @@

  from i18n import _
  import osutil
-import os, sys, errno, stat, getpass, pwd, grp
+import os, sys, errno, stat, getpass, pwd, grp, fcntl

  posixfile = open
  nulldev = '/dev/null'
@@ -104,6 +104,19 @@ def pconvert(path):
  def localpath(path):
      return path

+if sys.platform == 'darwin':
+    def realpath(path):
+        try:
+            # fcntl.h: O_SYMLINK = 0x200000, F_GETPATH = 50
+            f = os.open(path, 0x200000)
+            r = fcntl.fcntl(f, 50, '\0' * 1024)
+            os.close(f)
+            return r.rstrip('\0')
+        except IOError:
+            return path
+else:
+    realpath = os.path.realpath
+
  def shellquote(s):
      if os.sys.platform == 'OpenVMS':
          return '"%s"' % s
diff --git a/mercurial/windows.py b/mercurial/windows.py
--- a/mercurial/windows.py
+++ b/mercurial/windows.py
@@ -126,6 +126,10 @@ def localpath(path):
  def normpath(path):
      return pconvert(os.path.normpath(path))

+def realpath(path):
+    '''Obtain the canonical version of a path.'''
+    return os.path.normpath(os.path.normcase(os.path.realpath(path)))
+
  def samestat(s1, s2):
      return False

diff --git a/tests/test-path-normalization b/tests/test-path- 
new file mode 100755
--- /dev/null
+++ b/tests/test-path-normalization
@@ -0,0 +1,4 @@
+hg clone --quiet $TESTDIR/test-path-normalization.hg t
+exec hg st -R t
diff --git a/tests/test-path-normalization.hg b/tests/test-path- 
new file mode 100644
GIT binary patch
literal 404
z0s;&$10huyG9fSl8pgpYk?AI$L at -87ni^muCL`4J0s|%l+5^=wri4-IYGX{1noLQi
zHZl at eeuSoK!cnRk*OImct{HOm*$28g5RV}0Q)qZk!TafYGR#&(9PZYW4Opx#oDjRt
zXR7Q0(0w=z^+uQ?2z9ZtxtUbzFjEt-rUu77!kqwjg{AlpOJ<rA)to$4`M at Q3CLPIr
zIZ5a_y=AUg(p at v0MO83}c}Q_o3h4|%DNO~$*6X at F!f%ZY7RZ-yut_j5z+MQPbzuV9
