[PATCH] Allow manipulating files with long names on Windows

Mads Kiilerich mads at kiilerich.com
Thu Jan 20 14:17:53 CST 2011


On 01/20/2011 06:02 PM, Aaron Cohen wrote:
> # HG changeset patch
> # User Aaron Cohen<aaron at assonance.org>
> # Date 1295249042 18000
> # Node ID 6e72a5a75afc05927a9ee083d6c89450e1b5cc1f
> # Parent  9f707b297b0f52278acc6c4a4f7c6d801001acb7
> Allow manipulating files with long names on Windows

A minor detail from 
http://mercurial.selenic.com/wiki/ContributingChanges : "lowercase 
summary line"

And start the summary line with the extension name.

> Windows by default has a MAX_PATH of 260 characters. A while ago the
> "fncache" format was added which allows repositories on Windows to
> contain very long paths. At the time, a patch was proposed,
> "longpath.patch" which enabled handling of those files in the working
> copy but it was tabled.
...
>    This extension transparently uses so-called Universal Naming Convention
> (UNC) paths which allow 32768 character filenames in Windows.

Fine, but ...

 >  From http://mercurial.selenic.com/bts/issue839, I infer that the
 > reason for this is that many tools on Windows don't handle long file
 > names gracefully. Time has passed though and more programs now work,
 > including all Java programs.
 >
...
> rev2:
>    Addressed some code review
...

Such history might be more appropriate in an introduction email than in 
the changeset description.

> diff -r 9f707b297b0f -r 6e72a5a75afc hgext/win32lfn.py
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/hgext/win32lfn.py	Mon Jan 17 02:24:02 2011 -0500
> @@ -0,0 +1,273 @@
> +'''Allow manipulating long file names
> +
> +''Author: Aaron Cohen<aaron at assonance.org>''

This should be in a standard copyright / license comment like in (most) 
other extensions.

> +=== Overview ===
> +
> +Allows creating working copy files whose path is longer than 260 characters on
> + Windows (up to ~32768 characters).
> +
> +Caveats:
> +
> + - Some filesystems may have their own pathname restrictions, such as some FAT
> +    filesystems. Use NTFS or a newer FAT.
> +
> + - cmd.exe has trouble manipulating long pathnames (del, move, rename will all
> +    fail). Use powershell.
> +
> + - Many legacy Windows programs will have difficulty opening files with long
> +    pathnames, though most java and 64-bit programs will work fine.
> +
> + - explorer.exe may have trouble manipulating directories with long paths,
> +    with dialogs like, "The source file name(s) are larger than is supported
> +    by the file system. Try moving to a location which has a shorter path name."
> +    To address this, use a tool other than explorer.exe or delete the affected
> +    files using :hg:`lfn clean`.
> +
> + - Things get more complicated if the root of your repository is more than 244
> +    characters long, including directory separators.
> +
> +    - Firstly, there is no way in Windows to "cd" into a directory that
> +    long. As a result, to use hg with the repo, you will have to use
> +    :hg:`-R` or :hg:`--repository`.
> +
> +     - When Mercurial first starts up, it will not be able to find the
> +     ".hg" directory in such a repository until this extension is loaded.
> +     This implies that this extension must be configured in either the
> +     system-wide or user hgrc or mercurial.ini, not the per-repository
> +     ".hg/hgrc".
> +
> +=== Configuration ===
> +
> +Enable the extension in the configuration file (mercurial.ini)::
> +
> +    [extensions]
> +    win32lfn=

"lfn" is a new abbreviation to learn. Would it make sense to call it "unc" 
instead?

> +'''
> +
> +import __builtin__, os, errno
> +
> +_errmap = None
> +
> +from mercurial import util, osutil, error
> +from mercurial.i18n import _
> +
> +_win32 = False
> +try:
> +    import win32api, win32file, winerror, pywintypes
> +
> +    _win32 = True

Beware of demandload: it might delay the import error until winerror is 
used in the next lines.

> +    _errmap = {
> +        winerror.ERROR_ALREADY_EXISTS: errno.EEXIST,
> +        winerror.ERROR_PATH_NOT_FOUND: errno.ENOENT
> +    }
> +except ImportError:
> +    pass
> +
> +_uncprefix = "\\\\?\\"
> +
> +_suppressunc = 0
> +
> +# UNC filenames require different normalization than mercurial and python want
> +def unc(path):
> +    global _suppressunc
> +    if not _suppressunc:
> +        _suppressunc += 1

The trick here is that abspath is patched and might recurse back here? 
That might deserve a comment.

Would it be possible to store the original abspath and call that instead?
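
Something like the following is what I have in mind. It is an untested 
sketch: _origabspath is a name I made up, and I use ntpath instead of 
os.path only so the snippet behaves the same on any platform. It also 
assumes the unpatched abspath is good enough for the paths unc() sees, 
which the comment further down about long relative paths suggests may 
not always hold.

```python
import ntpath

_uncprefix = "\\\\?\\"

# Hypothetical name: keep a reference to the unpatched function, taken
# before uisetup() replaces os.path.abspath.
_origabspath = ntpath.abspath

def unc(path):
    # Already a \\?\ or \\.\ path: leave it alone.
    if path.startswith(_uncprefix) or path.startswith("\\\\.\\"):
        return path
    path = _origabspath(path)
    # path may now be UNC after abspath
    if path.startswith(_uncprefix):
        return path
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return _uncprefix + "UNC\\" + path[2:]
    return _uncprefix + path
```

If recursion through the patched abspath really is the reason for the 
_suppressunc counter, the counter could go away with this approach.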

> +        if not path.startswith(_uncprefix) and not path.startswith("\\\\.\\"):
> +            path = os.path.abspath(path)
> +            # path may now be UNC after abspath
> +            if not path.startswith(_uncprefix):
> +                if path.startswith("\\\\"):
> +                    path = _uncprefix + "UNC\\" + path[2:]
> +                else:
> +                    path = _uncprefix + path
> +        _suppressunc -= 1
> +    return path
> +
> +def wrap(method):

"wrap1" would be more descriptive, especially compared to wrap2.

> +    def fn(*args, **kwargs):
> +        path = unc(args[0])
> +        return method(path, *args[1:], **kwargs)
> +
> +    return fn
> +
> +def wrap2(method):
> +    def fn(*args, **kwargs):
> +        src = unc(args[0])
> +        dst = unc(args[1])
> +        return method(src, dst, *args[2:], **kwargs)
> +
> +    return fn
> +
> +# vanilla os.listdir handles UNC ok, but breaks if they're longer than MAX_PATH

Use docstring instead of comment.

> +def lfnlistdir(path):
> +    path = unc(path)
> +    if not os.path.exists(path) or not os.path.isdir(path):
> +        return []
> +    files = win32file.FindFilesW(os.path.join(path, "*.*"))
> +    result = []
> +    for f in files:
> +        file = f[8]
> +        if not file == u".." and not file == u".":
> +            result.append(file)
> +    return result
> +
> +# vanilla handles UNC pathes but not if longer than MAX_PATH
> +def lfnmkdir(path, mode=None):
> +    path = unc(path)
> +    try:
> +        # second parameter is a security descriptor, mapping it up to our
> +        # "mode" parameter is non-trivial and hopefully unnecessary
> +        win32file.CreateDirectoryW(path, None)
> +    except pywintypes.error, err:
> +        if err.winerror in _errmap:
> +            pyerrno = _errmap[err.winerror]
> +            raise OSError(pyerrno, err.strerror)
> +        raise
> +
> +# vanilla returns a relative path for filenames longer than MAX_PATH
> +# os.path.abspath(30 * "123456789\\") ->  30 * "123456789\\"

You could perhaps phrase this as a doctest in the docstring.
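
Roughly like this, as a sketch only. I use ntpath so it runs anywhere, 
and an absolute input so the expected output does not depend on the 
current directory; the relative case from your comment would need a 
known cwd to be a stable doctest.

```python
import ntpath
import os

def lfnabspath(path):
    r'''Return an absolute, normalized version of path.

    The result stays absolute for paths longer than MAX_PATH:

    >>> long = 'C:\\repo\\' + 30 * '123456789\\'
    >>> len(long) > 260
    True
    >>> ntpath.isabs(lfnabspath(long))
    True
    '''
    if not ntpath.isabs(path):
        path = ntpath.join(os.getcwd(), path)
    return ntpath.normpath(path)
```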

> +def wrapabspath(abspath):
> +    def lfnabspath(path):
> +        result = path
> +        if not os.path.isabs(result):
> +            result = os.path.join(os.getcwd(), result)
> +        result = os.path.normpath(result)
> +        return result
> +
> +    return lfnabspath
> +
> +def _addmissingbackslash(path):
> +    if path.endswith(":"):
> +        path += "\\"
> +    return path
> +
> +# vanilla loses a trailing backslash:

Why is that a problem?

> +# os.path.split('\\\\?\\C:\\') ->  ('\\\\?\\C:', '')
> +def wrapsplit(split):
> +    def lfnsplit(path):
> +        result = split(path)
> +        result = (_addmissingbackslash(result[0]), result[1])
> +        return result
> +
> +    return lfnsplit
> +
> +# vanilla loses a trailing backslash:
> +# os.path.dirname('\\\\?\\C:\\') ->  '\\\\?\\C:'
> +def wrapdirname(dirname):
> +    def lfndirname(path):
> +        result = dirname(path)
> +        return _addmissingbackslash(result)
> +
> +    return lfndirname
> +
> +# Windows API has no SetCurrentDirectory for long paths,
> +# so we implement it internally
> +# http://social.msdn.microsoft.com/Forums/en/windowsgeneraldevelopmentissues/thread/7998d7ec-cf5a-4b5e-a554-13fa855e4a3d
> +def wrapchdir(chdir):
> +    def lfnchdir(path):
> +        if len(os.path.abspath(path))>= 248:

A magic number? Put it in a "constant" with a descriptive name or add a 
comment.

> +            path = unc(path)
> +        if os.path.exists(path):
> +            # Use an environment variable so subprocesses get the correct cwd
> +            os.environ["CD"] = path

Is this CD variable magic or used in other places? If it is for this use 
only we might want to make sure it doesn't collide with other uses - for 
example HGWIN32LFNCWD.

But it seems like a bad idea that chdir doesn't do a chdir. That will 
most likely have unexpected consequences. Wouldn't it be better to just 
fail if cwd is too long?
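
A sketch of the "just fail" variant, untested. The error code and the 
message are my choice, not something taken from the patch, and ntpath 
is used only to keep the snippet platform independent:

```python
import errno
import ntpath

_maxcwdlen = 248  # same threshold as in the patch

def wrapchdir(chdir):
    def lfnchdir(path):
        # Refuse instead of faking the cwd through an environment variable.
        if len(ntpath.abspath(path)) >= _maxcwdlen:
            raise OSError(errno.ENAMETOOLONG,
                          "directory name too long for chdir", path)
        return chdir(path)

    return lfnchdir
```

That way os.chdir either really changes the directory or fails 
loudly, and getcwd would not need wrapping for this purpose.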

> +        else:
> +            raise OSError(errno.ENOENT, _("Directory doesn't exist: %s") % path)
> +
> +    return lfnchdir
> +
> +def wrapgetcwd(getcwd):
> +    def lfngetcwd():
> +        if "CD" in os.environ:
> +            result = os.environ["CD"]
> +        else:
> +            result = getcwd()
> +            # Should I un-UNC long directories here?
> +        return result
> +
> +    return lfngetcwd
> +
> +def uisetup(ui):
> +    if not _win32:
> +        ui.warn(_("This extension requires the pywin32 extensions\n"))

It will be hard for a user who gets this message to figure out which 
extension "this" refers to.

> +        return
> +    os.listdir = lfnlistdir
> +    os.mkdir = lfnmkdir
> +    os.path.abspath = wrapabspath(os.path.abspath)
> +    os.path.split = wrapsplit(os.path.split)
> +    os.path.dirname = wrapdirname(os.path.dirname)
> +
> +    # No wrapping needed for os.makedirs
> +
> +    os.chdir = wrapchdir(os.chdir)
> +    os.getcwd = wrapgetcwd(os.getcwd)
> +
> +    os.stat = wrap(os.stat)
> +    os.lstat = wrap(os.lstat)
> +    os.open = wrap(os.open)
> +    os.chmod = wrap(os.chmod)
> +    os.remove = wrap(os.remove)
> +    os.unlink = wrap(os.unlink)
> +    os.rmdir = wrap(os.rmdir)
> +    os.removedirs = wrap(os.removedirs)
> +    os.rename = wrap2(os.rename)
> +    os.renames = wrap2(os.renames)
> +    __builtin__.open = wrap(__builtin__.open)
> +
> +    osutil.listdir = wrap(osutil.listdir)
> +    osutil.posixfile = wrap(osutil.posixfile)
> +
> +    util.posixfile = wrap(util.posixfile)
> +    util.makedirs = wrap(util.makedirs)
> +    util.rename = wrap2(util.rename)
> +    util.copyfile = wrap2(util.copyfile)
> +    util.copyfiles = wrap2(util.copyfiles)
> +    if hasattr(util, "unlinkpath"):
> +        util.unlinkpath = wrap(util.unlinkpath)
> +    if hasattr(util, "unlink"):
> +        util.unlink = wrap(util.unlink)

This invasive monkey patching looks a bit scary and fragile and might 
make it less easy for it to become an "official" extension.

> +def list(ui, repo):
> +    for root, _ignored, files in os.walk(repo.root):
> +        for file in files:
> +            if len(root + file)>= 259:

MAX_PATH - 1?

"contrib/check-code.py hgext/win32lfn.py" will complain here.
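
If I read the check right, root + file leaves out the directory 
separator, so comparing it with 259 is the same as comparing the 
joined path with MAX_PATH, at least when root does not end in a 
separator. A small helper (the name is hypothetical) would say that 
directly and could be shared by list and clean. ntpath is used here 
only so the sketch runs on any platform:

```python
import ntpath

MAX_PATH = 260

def islongpath(root, name):
    '''True if root joined with name does not fit in MAX_PATH'''
    return len(ntpath.join(root, name)) >= MAX_PATH
```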

> +                ui.write(os.path.join(root, file) + "\n")
> +
> +def clean(ui, repo, force=False):
> +    for root, _ignored, files in os.walk(repo.root):
> +        for file in files:
> +            if len(root + file)>= 259:
> +                path = os.path.join(root, file)
> +                c = ui.promptchoice(_("Delete %s? [N/y]") % path,
> +                                    (_("&No"), _("&Yes")), 0)

promptchoice is usually used with prompts such as "lowercase...(yn)".

> +                if c or force:
> +                    if hasattr(util, "unlink"):
> +                        util.unlink(path)
> +                    else:
> +                        util.unlinkpath(path)
> +
> +_commands = {
> +    'list': list,
> +    'clean': clean
> +}
> +
> +def lfn(ui, repo, command):
> +    '''Search for or delete files in the working copy that are longer than \
> +MAX_PATH (260) characters.
> +
> + :hg lfn list: List all files in the repository longer than MAX_PATH
> +
> + :hg lfn clean: Prompt to delete all files in the repository longer than
> +    MAX_PATH. This may make it easier to deal with such files, since many
> +    Windows programs are unable to.'''
> +    if command in _commands:
> +        _commands[command](ui, repo)
> +    else:
> +        raise error.SignatureError

Mercurial doesn't use commands with multiple keywords. A command is one 
keyword and a number of options.

I suggest you use two completely different commands. Especially because 
the commands do something completely different.
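
As a rough sketch, the command table could look like this. The command 
names and the --force option are only suggestions, the function bodies 
are left out, and _ is a stand-in for mercurial.i18n._ so the snippet 
runs on its own:

```python
def _(msg):
    # stand-in for mercurial.i18n._
    return msg

def lfnlist(ui, repo):
    '''list working copy files whose path is longer than MAX_PATH'''

def lfnclean(ui, repo, force=False):
    '''delete working copy files whose path is longer than MAX_PATH'''

# Each entry: command name -> (function, options, synopsis).
# Each option: (short name, long name, default, help text).
cmdtable = {
    "lfnlist": (lfnlist, [], _('hg lfnlist')),
    "lfnclean": (lfnclean,
                 [('f', 'force', None, _('delete without prompting'))],
                 _('hg lfnclean [-f]')),
}
```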

> +
> +cmdtable = {
> +    "lfn": (lfn,
> +            [],
> +            _('list | clean')),
> +}


Finally:

Many fine extensions have their own life and aren't distributed with 
Mercurial. That has the advantage that they can support multiple 
Mercurial versions (if they can) and they can use their own release and 
bugfix schedule.

Extensions might be accepted in Mercurial if they have proven that they 
are stable and widely used and actively maintained.

I suggest you publish this extension somewhere (for example on 
bitbucket) and add it to 
http://mercurial.selenic.com/wiki/UsingExtensions . Time will tell if it 
would be better to distribute it with Mercurial.


I don't know how much you have looked at the fixutf8 and win32mbcs 
extensions. They solve similar problems in a similar way but do it very 
differently. Who should learn from whom? Are they compatible? Could they 
share some infrastructure?

/Mads

