[PATCH 1 of 5] platform: create a file closing utility
Gregory Szorc
gregory.szorc at gmail.com
Wed Sep 30 11:01:48 CDT 2015
On Wed, Sep 30, 2015 at 8:46 AM, Augie Fackler <raf at durin42.com> wrote:
> On Tue, Sep 29, 2015 at 10:36:27PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1443551935 25200
> > # Tue Sep 29 11:38:55 2015 -0700
> > # Node ID f578fedbeb3ce63fff0e3fdbe7b2424e9cdd399a
> > # Parent 46af0adb5c375cc51ce0d29cbdcd8ba843a33425
> > platform: create a file closing utility
>
> several inline comments for this one
>
> [...]
>
> > +class filecloser(object):
> > + """Manages closing of open file handles.
> > +
> > + This is intended to be used when writing to several files in rapid
> > + succession. It is not needed when opening files for reading.
> > +
> > + This doesn't do anything on POSIX because it isn't necessary. See
> the
> > + Windows implementation. It is, however, a good reference for the
> API.
>
> I actually wonder if it might be worth testing something like the
> Windows implementation on OS X where HFS+ is a known source of slow.
>
> > + """
> > + def __init__(self, ui):
> > + pass
> > +
> > + def close(self, fh):
> > + """Schedule a file handle for closing.
> > +
> > + Performs synchronous close on POSIX.
> > + """
> > + fh.close()
> > +
> > + def verifystate(self):
> > + """Ensures the instance is in a good state.
> > +
> > + This should be called before opening files that will be closed
> > + by this class to ensure we don't overwhelm system resources.
> > +
> > + It can also raise if something is wrong.
> > +
> > + It does nothing on POSIX because we close immediately.
> > + """
> > +
> > + def cleanup(self):
> > + """Wait for all operations to complete.
> > +
> > + This must be called for every instance that is created.
> > +
> > + It does nothing on POSIX because there is nothing to clean up.
> > + """
> > diff --git a/mercurial/util.py b/mercurial/util.py
> > --- a/mercurial/util.py
> > +++ b/mercurial/util.py
> > @@ -35,8 +35,9 @@ checklink = platform.checklink
> > copymode = platform.copymode
> > executablepath = platform.executablepath
> > expandglobs = platform.expandglobs
> > explainexit = platform.explainexit
> > +filecloser = platform.filecloser
> > findexe = platform.findexe
> > gethgcmd = platform.gethgcmd
> > getuser = platform.getuser
> > groupmembers = platform.groupmembers
> > diff --git a/mercurial/windows.py b/mercurial/windows.py
> > --- a/mercurial/windows.py
> > +++ b/mercurial/windows.py
> > @@ -7,8 +7,10 @@
> >
> > from i18n import _
> > import osutil, encoding
> > import errno, msvcrt, os, re, stat, sys, _winreg
> > +import time
> > +import threading
> >
> > import win32
> > executablepath = win32.executablepath
> > getuser = win32.getuser
> > @@ -458,4 +460,66 @@ def readpipe(pipe):
> > break
> > chunks.append(s)
> >
> > return ''.join(chunks)
> > +
> > +class filecloser(object):
> > + """Closes file handles efficiently on Windows.
> > +
> > + Closing file handles that have been written to is relatively slow on
> > + Windows compared to other I/O operations. Measurements revealed most
> > + I/O operations (including writes) to be ~1us. Closes, by contrast,
> take
> > + ~1ms - ~1000x slower.
> > +
> > + This class performs file closing on a separate Python thread so file
> > + closing doesn't block the main thread.
> > + """
> > + def __init__(self, ui):
> > + self._queue = []
>
> this should either be a threadsafe queue object, or you should use a
> collections.deque and a lock to preserve threadsafety (using a list as
> a queue can be a bummer performance-wise if it gets large).
>
I thought the GIL removed considerations about thread safety here? My
experience with the thread safe classes and locking in Python is that perf
stinks :/ I can try it though.
>
> > + self._error = None
> > + self._alive = True
> > + self._ev = threading.Event()
> > + self._t = threading.Thread(target=self._processcloses,
> > + name='filecloser')
> > + self._t.start()
> > +
> > + # Windows defaults to max of 512 open file descriptors. A
> buffer of
> > + # 128 should be more than sufficient.
> > + # experimental config: worker.closefileslimit
> > + self._queuelimit = ui.configint('worker', 'closefilelimit', 384)
> > +
> > + def close(self, fh):
> > + self._queue.append(fh)
> > + self._ev.set()
> > +
> > + def verifystate(self):
> > + # Raise an error encountered on background thread.
> > + if self._error:
> > + raise self._error
> > +
> > + # Don't allow the queue to grow too long.
> > + while len(self._queue) >= self._queuelimit:
> > + # This will surrender the GIL. 100ms is a bit long. However,
> > + # if we have lots of queued operations, they likely won't
> finish
> > + # in 100ms since it takes ~1ms to close a file.
> > + time.sleep(0.100)
> > +
> > + def cleanup(self):
> > + self._alive = False
> > + # This arguably isn't necessary. But it should shave a few ms.
> > + self._ev.set()
> > + self._t.join()
> > +
> > + def _processcloses(self):
> > + """Main function of closing thread."""
> > + while self._alive or self._queue:
> > + if not self._queue:
> > + self._ev.wait(0.100)
>
> I think instead of busy-waiting, you could do better with a
> threading.Condition - that would also give you the lock you'd need for
> real thread-safety.
>
OK. Something else to keep in mind is give the results from the stream
clone patch, I may look into using multiple threads.
>
> > + self._ev.clear()
> > +
> > + while self._queue:
> > + fh = self._queue.pop(0)
> > + try:
> > + fh.close()
> > + except Exception as e:
> > + self._error = e
> > + # Don't raise otherwise we'll leak handles.
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150930/befbb91f/attachment.html>
More information about the Mercurial-devel
mailing list