[Bug 3490] New: scmutil.canonpath failure with run-tests

bugzilla-daemon at bz.selenic.com bugzilla-daemon at bz.selenic.com
Wed Jun 6 12:16:41 CDT 2012


http://bz.selenic.com/show_bug.cgi?id=3490

          Priority: normal
            Bug ID: 3490
                CC: mads at kiilerich.com, mercurial-devel at selenic.com
          Assignee: adrian at cadifra.com
           Summary: scmutil.canonpath failure with run-tests
          Severity: bug
    Classification: Unclassified
                OS: Windows
          Reporter: adrian at cadifra.com
          Hardware: PC
            Status: UNCONFIRMED
           Version: unspecified
         Component: Mercurial
           Product: Mercurial

(as already posted to mercurial-devel)

Mercurial version used: d566aa319d5f.

So I'm seeing some other nastyness with paths while trying to enable more tests
for Windows.

Example problem candidates are: test-rename.t, test-walk.t

The problem manifests like this:

Let test-a.t be a testfile containing:

    $ hg init
    $ mkdir d1 d1/d11 d2
    $ echo bla > d1/d11/a1
    $ hg ci -Amx
    adding d1/d11/a1

  rename a single file using absolute paths

    $ hg rename "`pwd`/d1/d11/a1" "`pwd`/d2/c"

Running test-a.t on Windows with MSYS yields:

  --- C:/Users/adi/hgrepos/hg-main/tests\test-a.t
  +++ C:/Users/adi/hgrepos/hg-main/tests\test-a.t.err
  @@ -7,3 +7,5 @@
   rename a single file using absolute paths

     $ hg rename "`pwd`/d1/d11/a1" "`pwd`/d2/c"
  +  abort: $TESTTMP/d1/d11/a1 not under root
  +  [255]

Which is not expected at all!

Now, let's dig further and insert a --traceback:

  --- C:/Users/adi/hgrepos/hg-main/tests\test-a.t
  +++ C:/Users/adi/hgrepos/hg-main/tests\test-a.t.err
  @@ -7,3 +7,39 @@
   rename a single file using absolute paths

     $ hg --traceback rename "`pwd`/d1/d11/a1" "`pwd`/d2/c"
  +  Traceback (most recent call last):
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 88, in
_runcatch
  +      return _dispatch(req)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 739, in
_dispatch
  +      cmdpats, cmdoptions)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 513, in
runcommand
  +      ret = _runcommand(ui, options, cmd, d)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 829, in
_runcommand
  +      return checkargs()
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 800, in
checkargs
  +      return cmdfunc()
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\dispatch.py", line 736, in
<lambda>
  +      d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\util.py", line 475, in
check
  +      return func(*args, **kwargs)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\commands.py", line 4762, in
rename
  +      return cmdutil.copy(ui, repo, pats, opts, rename=True)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\cmdutil.py", line 443, in
copy
  +      srcs = walkpat(pat)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\cmdutil.py", line 249, in
walkpat
  +      m = scmutil.match(repo[None], [pat], opts, globbed=True)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\scmutil.py", line 627, in
match
  +      return matchandpats(ctx, pats, opts, globbed, default)[0]
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\scmutil.py", line 620, in
matchandpats
  +      default)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\context.py", line 303, in
match
  +      auditor=r.auditor, ctx=self)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\match.py", line 71, in
__init__
  +      pats = _normalize(patterns, default, root, cwd, auditor)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\match.py", line 306, in
_normalize
  +      name = scmutil.canonpath(root, cwd, name, auditor)
  +    File "C:\Users\adi\hgrepos\hg-main\mercurial\scmutil.py", line 352, in
canonpath
  +      raise util.Abort('%s not under root' % myname)
  +  Abort: $TESTTMP/d1/d11/a1 not under root
  +  abort: $TESTTMP/d1/d11/a1 not under root
  +  [255]

So there we have the mother of all path obscurities on the table:
scmutil.canonpath.

So what does this funny function do? Here comes the python source code
(apologies for
the length of this posting, but it's a very hot spot in the mercurial code):

  def canonpath(root, cwd, myname, auditor=None):
      '''return the canonical path of myname, given cwd and root'''
      if util.endswithsep(root):
          rootsep = root
      else:
          rootsep = root + os.sep
      name = myname
      if not os.path.isabs(name):
          name = os.path.join(root, cwd, name)
      name = os.path.normpath(name)
      if auditor is None:
          auditor = pathauditor(root)
      if name != rootsep and name.startswith(rootsep):
          name = name[len(rootsep):]
          auditor(name)
          return util.pconvert(name)
      elif name == root:
          return ''
      else:
          # Determine whether `name' is in the hierarchy at or beneath `root',
          # by iterating name=dirname(name) until that causes no change (can't
          # check name == '/', because that doesn't work on windows).  For each
          # `name', compare dev/inode numbers.  If they match, the list `rel'
          # holds the reversed list of components making up the relative file
          # name we want.
          root_st = os.stat(root)
          rel = []
          while True:
              try:
                  name_st = os.stat(name)
              except OSError:
                  name_st = None
              if name_st and util.samestat(name_st, root_st):
                  if not rel:
                      # name was actually the same as root (maybe a symlink)
                      return ''
                  rel.reverse()
                  name = os.path.join(*rel)
                  auditor(name)
                  return util.pconvert(name)
              dirname, basename = os.path.split(name)
              rel.append(basename)
              if dirname == name:
                  break
              name = dirname

          raise util.Abort('%s not under root' % myname)

Now, with what values is this function called? Here they are (typical example):

  root:   'c:\\users\\adi\\appdata\\local\\temp\\hgtests.zf5gag\\test-a.t'
  cwd:    ''
  myname: 'C:/Users/adi/AppData/Local/Temp/hgtests.zf5gag/test-a.t/d1/d11/a1'

(note that '\\' represents a single backslash character).

Wet-dream-expected return value of canonpath for this input would be:

  d1/d11/a1

which is the relative path of myname reaching into the test repo at root.

Notice the term "wet-dream". Why do I say this? Because of this: If you look
at the part:

      if name != rootsep and name.startswith(rootsep):
          name = name[len(rootsep):]
          auditor(name)
          return util.pconvert(name)

rootsep is root, made sure it has os.sep at the end (os.sep is backslash on
Windows).

So if name starts with rootsep, we are lucky and we can strip the tail of name
from name, pconvert it (which replaces backslash with slash) and return the
much
hoped-for result 'd1/d11/a1'.

But nope. There is not that much luck here. As you clearly can see

  'C:/Users/adi/AppData/Local/Temp/hgtests.zf5gag/test-a.t/d1/d11/a1'

does not at all start with

  'c:\\users\\adi\\appdata\\local\\temp\\hgtests.zf5gag\\test-a.t'

For a couple of resons:

  (a) name uses slash
  (b) rootsep is all lowercase (Yikes! Information loss!)

While the first is not that much of a problem and could be easily worked
around, we have a serious problem with (b).

Now, wait. Isn't there more to the rescue in canonpath? Yep there is, and it
looks damn clever. So let's look into what's more there.

As we have seen

      if name != rootsep and name.startswith(rootsep):

clearly can't trigger in our case (sadly!). So we move on to:

          # Determine whether `name' is in the hierarchy at or beneath `root',
          # by iterating name=dirname(name) until that causes no change (can't
          # check name == '/', because that doesn't work on windows).  For each
          # `name', compare dev/inode numbers.  If they match, the list `rel'
          # holds the reversed list of components making up the relative file
          # name we want.
          root_st = os.stat(root)
          rel = []
          while True:
              try:
                  name_st = os.stat(name)
              except OSError:
                  name_st = None
              if name_st and util.samestat(name_st, root_st):
                  if not rel:
                      # name was actually the same as root (maybe a symlink)
                      return ''
                  rel.reverse()
                  name = os.path.join(*rel)
                  auditor(name)
                  return util.pconvert(name)
              dirname, basename = os.path.split(name)
              rel.append(basename)
              if dirname == name:
                  break
              name = dirname

          raise util.Abort('%s not under root' % myname)

So what is this funny part of the code trying to achieve? It iterates
over name, stripping path segments from its end and then trying to see
if the resulting path relly points to the same directory on disk.

So how is that "point to the same directory" tried to be done? It's in the
line:

              if name_st and util.samestat(name_st, root_st):

Ok. Great. So let's look at how util.samestat is implemented:

For Windows, it's implemented in mercurial/windows.py:

136  def samestat(s1, s2):
137      return False

Aww. And indeed, on Windows no one has ever seen an implementation for
samestat, do we?

So that clever iteration over directories was invented by one of the
Linux cracks here, but the implementation falls flat on the face on
Windows, due to the missing samestat on Windows.

Now, the question is: Why don't we get more complaints by Windows users,
if the implementation of canonpath is that sorely in trouble?

In other words, why did it ever work on Windows?

The reason is the following: If I run the test script snippet test-a.t
manually in a cmd.exe on Windows, it happens to - believe it or not -
works all perfectly fine, if I replace `pwd` in the script with the
real path of the current working dir (e.g. 'C:\Users\adi\hgrepos\tests\a').

Note that in that case, we happen to be in luck and the code part

      if name != rootsep and name.startswith(rootsep):
          name = name[len(rootsep):]
          auditor(name)
          return util.pconvert(name)

succeeds.

So - letting the debate about whether canonpath is stupid or not aside -
why the heck do we have that much bad luck if the test is run by
tests/run-tests.py?

Now, we are getting to the point. Promised.

run-tests.py creates a temporary directory for running the test. It's

  'c:\\users\\adi\\appdata\\local\\temp\\hgtests.zf5gag\\test-a.t'

(the value of root above).

But notice that there is something pretty fishy with this path. That path
completely lacks the true casing of that directory on disk. On disk, it
namely happens to be:

  C:/Users/adi/AppData/Local/Temp/hgtests.zf5gag/test-a.t

So why the heck do we get such a crippled-case path in one part and the
real thing in another?

Now, we have to look at where the path

  'c:\\users\\adi\\appdata\\local\\temp\\hgtests.zf5gag\\test-a.t'

is coming from. It comes from run-tests.py

1249        tmpdir = tempfile.mkdtemp('', 'hgtests.')

which uses Python's mkdtemp() function of the tempfile Python standard
library module.

If I run that on Windows 7, I get an all-lowercase path:

  $ python
  Python 2.6.6 (r266:84297, Aug 24 2010, 18:13:38) [MSC v.1500 64 bit (AMD64)]
on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import tempfile
  >>> tempfile.mkdtemp('', 'hgtests.')
  'c:\\users\\adi\\appdata\\local\\temp\\hgtests.jfpxra'

Even though that temp dir is clearly below

  C:/Users/adi/AppData/Local/Temp

on my disk.

So, tempfile.mkdtemp is returning pretty crippled info. I haven't yet
investigated why. Perhaps it's the implementation of mkdtemp(), or the
Windows APIs used are alrady delivering such crappy paths. It's no surprise
at all on Windows: Everyone hacking on Windows does believe that casing of
paths doesn't matter, right? (as we see, it causes more harm than what it
tries to solve. But we have too many users on Windows, right?).

Now, how does the case crippled path make it into the process running the
test script?

That cripled path is put into the local var HGTMP in run-tests.py and into
the environment var with the same name.

And it is then used to construct the final path for running the test

911    testtmp = os.environ["TESTTMP"] = os.environ["HOME"] = \
912        os.path.join(HGTMP, os.path.basename(test)).replace('\\', '/')

(wee, line 911).

So what line 911 in run-tests.py does is adding the name of the test to the
path, which gives the well known "gag" path:

  'c:\\users\\adi\\appdata\\local\\temp\\hgtests.zf5gag\\test-a.t'

That path is also put into the environment variables TESTTMP and HOME.

It is then used for the cwd paramter of the subprocess.Popen call in
run-tests.py:

63    p = subprocess.Popen(cmd, shell=True, bufsize=-1, cwd=wd,
64                         close_fds=closefds,
65                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
66                         stderr=subprocess.STDOUT)

So the process running the script gets that exact path as the current
working directory when it's started.

This can also be seen by inserting

   >>> import os; print os.getcwd()

into test-a.t and disabling the string replacements at the end of run().

Running test-a.t again will yield:

   >>> import os; print os.getcwd()
+  c:\\users\\adi\\appdata\\local\\temp\\hgtests.qq0dm4\\test-a.t\r (esc)

So the script process gets its lowercase-crippled cwd.

But where does that cute process get the non-casecrippled paths from?

No, we get to yet another MSYS magic: pwd -W

Example chat with sh.exe of MSYS:

  adi at kork /tmp/hgtests.77lj6e
  $ pwd
  /tmp/hgtests.77lj6e

  adi at kork /tmp/hgtests.77lj6e
  $ pwd -W
  C:/Users/adi/AppData/Local/Temp/hgtests.77lj6e

So what does this have to do with run-tests.py?

run-tests.py does:

617    if os.getenv('MSYSTEM'):
618        script.append('alias pwd="pwd -W"\n')

So, it prepends a clever alias at the cooked skript, which makes sure that
"pwd" inside a script is in fact "pwd -W".

And the "pwd -W" of the sh.exe of MSYS seems indeed to do a very good
job at getting the real casing of the current working directory.

It's just that we now fall flat on our face when these paths meet in
scmutil.canonpath.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the Mercurial-devel mailing list