D588: win32: use fewer system calls for unlink()

Adrian Buehlmann adrian at cadifra.com
Fri Sep 1 06:16:48 UTC 2017


Ugh. Can I reply to a phabricator notification by email?

Adding gregory.szorc at gmail.com manually, as I'm not sure replaying to
those nasty phabricator emails is going to work...

On 2017-09-01 00:32, indygreg (Gregory Szorc) wrote:
> indygreg created this revision.
> Herald added a subscriber: mercurial-devel.
> Herald added a reviewer: hg-reviewers.
> 
> REVISION SUMMARY
>   unlink() on Windows is complicated by the fact that Windows doesn't
>   support removing an open file. Our unlink wrapper currently goes
>   through a dance where it generates a random filename, renames the
>   file, then attempts to unlink.
>   
>   This is a lot of extra work for what is likely the special case of
>   attempting to unlink a file that is open or can't be simply unlinked.
>   
>   In this commit, we refactor the code to use fewer system calls in
>   the common cases of 1) unlink just works 2) file doesn't exist.
>   
>   For the file doesn't exist case (before)
>   
>   1. stat() to determine if path is directory
>   2. generate random filename
>   3. rename()
>   
>   after:
>   
>   1. stat() to get path info
>   
>   For the file not open case (before)
>   
>   1. stat() to determine if path is directory
>   2. generate random filename
>   3. rename()
>   4. unlink()
>   
>   after:
>   
>   1. stat()
>   2. unlink()
>   
>   For the file open case (before)
>   
>   1. stat() to determine if path is directory
>   2. generate random filename
>   3. rename()
>   4. unlink()
>   
>   after:
>   
>   1. stat()
>   2. unlink()
>   3. generate random filename
>   4. rename()
>   5. unlink()
>   
>   There is also a scenario where the unlink fails due to the file being
>   marked as read-only. In this case we also introduce an extra unlink()
>   call. However, I reckon the common case is the file isn't marked as
>   read-only and skipping the random generation and rename is worth it.
>   
>   I /think/ this change makes bulk file writing on Windows faster. This
>   is because vfs.__call__ calls unlink() when a file is opened for
>   writing. When writing a few hundred files files as part of a clone or
>   working directory update, the extra system calls can matter.
>   win32.unlink() did show up in performance profiles, which is what
>   caused me to look at this code. But I/O performance is hard to measure
>   and I'm not sure if the ~30s off of ~620s for a stream clone unbundle
>   on the Firefox repo is indicative of real world performance. I do know
>   the new code uses fewer system calls and shouldn't be slower.
> 
> REPOSITORY
>   rHG Mercurial
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D588
> 
> AFFECTED FILES
>   mercurial/win32.py
> 
> CHANGE DETAILS
> 
> diff --git a/mercurial/win32.py b/mercurial/win32.py
> --- a/mercurial/win32.py
> +++ b/mercurial/win32.py
> @@ -12,6 +12,7 @@
>  import msvcrt
>  import os
>  import random
> +import stat
>  import subprocess
>  
>  from . import (
> @@ -522,11 +523,26 @@
>  def unlink(f):
>      '''try to implement POSIX' unlink semantics on Windows'''
>  
> -    if os.path.isdir(f):
> -        # use EPERM because it is POSIX prescribed value, even though
> -        # unlink(2) on directories returns EISDIR on Linux
> -        raise IOError(errno.EPERM,
> -                      "Unlinking directory not permitted: '%s'" % f)
> +    # If the path doesn't exist, raise that exception.
> +    # If it is a directory, emulate POSIX behavior.
> +    try:
> +        st = os.stat(f)
> +        if stat.S_ISDIR(st.st_mode):
> +            # use EPERM because it is POSIX prescribed value, even though
> +            # unlink(2) on directories returns EISDIR on Linux
> +            raise IOError(errno.EPERM,
> +                          "Unlinking directory not permitted: '%s'" % f)
> +    except OSError as e:
> +        if e.errno == errno.ENOENT:
> +            raise
> +
> +    # In the common case, a normal unlink will work. Try that first and fall
> +    # back to more complexity if and only if we need to.
> +    try:
> +        os.unlink(f)
> +        return
> +    except (IOError, OSError) as e:
> +        pass
>  
>      # POSIX allows to unlink and rename open files. Windows has serious
>      # problems with doing that:

Do you get an error at all, if a file, which is in open state, is unlinked?

My fear is: You won't get an error, but instead the filename is blocked
by the file being held in place by the other process, until the other
process closes it. Which means: You already lost the game.

Which would explain why we didn't do things like you propose here.

See also

   https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows

(specifically, heading 2)



More information about the Mercurial-devel mailing list