D588: win32: use fewer system calls for unlink()

Augie Fackler raf at durin42.com
Fri Sep 1 12:43:46 EDT 2017


(replying to this thread to see if the response shows up in Phabricator, Kevin just configured that)

> On Sep 1, 2017, at 02:16, Adrian Buehlmann <adrian at cadifra.com> wrote:
> 
> Ugh. Can I reply to a phabricator notification by email?
> 
> Adding gregory.szorc at gmail.com <mailto:gregory.szorc at gmail.com> manually, as I'm not sure replaying to
> those nasty phabricator emails is going to work...
> 
> On 2017-09-01 00:32, indygreg (Gregory Szorc) wrote:
>> indygreg created this revision.
>> Herald added a subscriber: mercurial-devel.
>> Herald added a reviewer: hg-reviewers.
>> 
>> REVISION SUMMARY
>>  unlink() on Windows is complicated by the fact that Windows doesn't
>>  support removing an open file. Our unlink wrapper currently goes
>>  through a dance where it generates a random filename, renames the
>>  file, then attempts to unlink.
>> 
>>  This is a lot of extra work for what is likely the special case of
>>  attempting to unlink a file that is open or can't be simply unlinked.
>> 
>>  In this commit, we refactor the code to use fewer system calls in
>>  the common cases of 1) unlink just works 2) file doesn't exist.
>> 
>>  For the file doesn't exist case (before)
>> 
>>  1. stat() to determine if path is directory
>>  2. generate random filename
>>  3. rename()
>> 
>>  after:
>> 
>>  1. stat() to get path info
>> 
>>  For the file not open case (before)
>> 
>>  1. stat() to determine if path is directory
>>  2. generate random filename
>>  3. rename()
>>  4. unlink()
>> 
>>  after:
>> 
>>  1. stat()
>>  2. unlink()
>> 
>>  For the file open case (before)
>> 
>>  1. stat() to determine if path is directory
>>  2. generate random filename
>>  3. rename()
>>  4. unlink()
>> 
>>  after:
>> 
>>  1. stat()
>>  2. unlink()
>>  3. generate random filename
>>  4. rename()
>>  5. unlink()
>> 
>>  There is also a scenario where the unlink fails due to the file being
>>  marked as read-only. In this case we also introduce an extra unlink()
>>  call. However, I reckon the common case is the file isn't marked as
>>  read-only and skipping the random generation and rename is worth it.
>> 
>>  I /think/ this change makes bulk file writing on Windows faster. This
>>  is because vfs.__call__ calls unlink() when a file is opened for
>>  writing. When writing a few hundred files files as part of a clone or
>>  working directory update, the extra system calls can matter.
>>  win32.unlink() did show up in performance profiles, which is what
>>  caused me to look at this code. But I/O performance is hard to measure
>>  and I'm not sure if the ~30s off of ~620s for a stream clone unbundle
>>  on the Firefox repo is indicative of real world performance. I do know
>>  the new code uses fewer system calls and shouldn't be slower.
>> 
>> REPOSITORY
>>  rHG Mercurial
>> 
>> REVISION DETAIL
>>  https://phab.mercurial-scm.org/D588
>> 
>> AFFECTED FILES
>>  mercurial/win32.py
>> 
>> CHANGE DETAILS
>> 
>> diff --git a/mercurial/win32.py b/mercurial/win32.py
>> --- a/mercurial/win32.py
>> +++ b/mercurial/win32.py
>> @@ -12,6 +12,7 @@
>> import msvcrt
>> import os
>> import random
>> +import stat
>> import subprocess
>> 
>> from . import (
>> @@ -522,11 +523,26 @@
>> def unlink(f):
>>     '''try to implement POSIX' unlink semantics on Windows'''
>> 
>> -    if os.path.isdir(f):
>> -        # use EPERM because it is POSIX prescribed value, even though
>> -        # unlink(2) on directories returns EISDIR on Linux
>> -        raise IOError(errno.EPERM,
>> -                      "Unlinking directory not permitted: '%s'" % f)
>> +    # If the path doesn't exist, raise that exception.
>> +    # If it is a directory, emulate POSIX behavior.
>> +    try:
>> +        st = os.stat(f)
>> +        if stat.S_ISDIR(st.st_mode):
>> +            # use EPERM because it is POSIX prescribed value, even though
>> +            # unlink(2) on directories returns EISDIR on Linux
>> +            raise IOError(errno.EPERM,
>> +                          "Unlinking directory not permitted: '%s'" % f)
>> +    except OSError as e:
>> +        if e.errno == errno.ENOENT:
>> +            raise
>> +
>> +    # In the common case, a normal unlink will work. Try that first and fall
>> +    # back to more complexity if and only if we need to.
>> +    try:
>> +        os.unlink(f)
>> +        return
>> +    except (IOError, OSError) as e:
>> +        pass
>> 
>>     # POSIX allows to unlink and rename open files. Windows has serious
>>     # problems with doing that:
> 
> Do you get an error at all, if a file, which is in open state, is unlinked?
> 
> My fear is: You won't get an error, but instead the filename is blocked
> by the file being held in place by the other process, until the other
> process closes it. Which means: You already lost the game.
> 
> Which would explain why we didn't do things like you propose here.
> 
> See also
> 
>   https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows <https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows>
> 
> (specifically, heading 2)
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org <mailto:Mercurial-devel at mercurial-scm.org>
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170901/13cc9109/attachment.html>


More information about the Mercurial-devel mailing list