D588: win32: use fewer system calls for unlink()
Adrian Buehlmann
adrian at cadifra.com
Fri Sep 1 06:16:48 UTC 2017
Ugh. Can I reply to a phabricator notification by email?
Adding gregory.szorc at gmail.com manually, as I'm not sure replaying to
those nasty phabricator emails is going to work...
On 2017-09-01 00:32, indygreg (Gregory Szorc) wrote:
> indygreg created this revision.
> Herald added a subscriber: mercurial-devel.
> Herald added a reviewer: hg-reviewers.
>
> REVISION SUMMARY
> unlink() on Windows is complicated by the fact that Windows doesn't
> support removing an open file. Our unlink wrapper currently goes
> through a dance where it generates a random filename, renames the
> file, then attempts to unlink.
>
> This is a lot of extra work for what is likely the special case of
> attempting to unlink a file that is open or can't be simply unlinked.
>
> In this commit, we refactor the code to use fewer system calls in
> the common cases of 1) unlink just works 2) file doesn't exist.
>
> For the file doesn't exist case (before)
>
> 1. stat() to determine if path is directory
> 2. generate random filename
> 3. rename()
>
> after:
>
> 1. stat() to get path info
>
> For the file not open case (before)
>
> 1. stat() to determine if path is directory
> 2. generate random filename
> 3. rename()
> 4. unlink()
>
> after:
>
> 1. stat()
> 2. unlink()
>
> For the file open case (before)
>
> 1. stat() to determine if path is directory
> 2. generate random filename
> 3. rename()
> 4. unlink()
>
> after:
>
> 1. stat()
> 2. unlink()
> 3. generate random filename
> 4. rename()
> 5. unlink()
>
> There is also a scenario where the unlink fails due to the file being
> marked as read-only. In this case we also introduce an extra unlink()
> call. However, I reckon the common case is the file isn't marked as
> read-only and skipping the random generation and rename is worth it.
>
> I /think/ this change makes bulk file writing on Windows faster. This
> is because vfs.__call__ calls unlink() when a file is opened for
> writing. When writing a few hundred files files as part of a clone or
> working directory update, the extra system calls can matter.
> win32.unlink() did show up in performance profiles, which is what
> caused me to look at this code. But I/O performance is hard to measure
> and I'm not sure if the ~30s off of ~620s for a stream clone unbundle
> on the Firefox repo is indicative of real world performance. I do know
> the new code uses fewer system calls and shouldn't be slower.
>
> REPOSITORY
> rHG Mercurial
>
> REVISION DETAIL
> https://phab.mercurial-scm.org/D588
>
> AFFECTED FILES
> mercurial/win32.py
>
> CHANGE DETAILS
>
> diff --git a/mercurial/win32.py b/mercurial/win32.py
> --- a/mercurial/win32.py
> +++ b/mercurial/win32.py
> @@ -12,6 +12,7 @@
> import msvcrt
> import os
> import random
> +import stat
> import subprocess
>
> from . import (
> @@ -522,11 +523,26 @@
> def unlink(f):
> '''try to implement POSIX' unlink semantics on Windows'''
>
> - if os.path.isdir(f):
> - # use EPERM because it is POSIX prescribed value, even though
> - # unlink(2) on directories returns EISDIR on Linux
> - raise IOError(errno.EPERM,
> - "Unlinking directory not permitted: '%s'" % f)
> + # If the path doesn't exist, raise that exception.
> + # If it is a directory, emulate POSIX behavior.
> + try:
> + st = os.stat(f)
> + if stat.S_ISDIR(st.st_mode):
> + # use EPERM because it is POSIX prescribed value, even though
> + # unlink(2) on directories returns EISDIR on Linux
> + raise IOError(errno.EPERM,
> + "Unlinking directory not permitted: '%s'" % f)
> + except OSError as e:
> + if e.errno == errno.ENOENT:
> + raise
> +
> + # In the common case, a normal unlink will work. Try that first and fall
> + # back to more complexity if and only if we need to.
> + try:
> + os.unlink(f)
> + return
> + except (IOError, OSError) as e:
> + pass
>
> # POSIX allows to unlink and rename open files. Windows has serious
> # problems with doing that:
Do you get an error at all, if a file, which is in open state, is unlinked?
My fear is: You won't get an error, but instead the filename is blocked
by the file being held in place by the other process, until the other
process closes it. Which means: You already lost the game.
Which would explain why we didn't do things like you propose here.
See also
https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows
(specifically, heading 2)
More information about the Mercurial-devel
mailing list