[PATCH 5 of 5] exchange: use file closer when applying stream clones

Gregory Szorc gregory.szorc at gmail.com
Wed Sep 30 12:07:49 CDT 2015


On Wed, Sep 30, 2015 at 9:52 AM, Adrian Buehlmann <adrian at cadifra.com>
wrote:

> On 2015-09-30 07:36, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1443553088 25200
> > #      Tue Sep 29 11:58:08 2015 -0700
> > # Node ID cea1c79d6ab088caedb93b6576f9293f307355cc
> > # Parent  a8d8f0d70593aaa6fb2b86dc77d29c021d83277d
> > exchange: use file closer when applying stream clones
> >
> > Stream clones essentially stream raw files to disk. They have the
> > potential to write thousands of files. Have them use the file closer
> > so they close files as efficiently as possible.
> >
> > On my Windows machine, this patch has the following impact on stream
> > clone performance for the mozilla-central repository (tested with the
> > "bundleclone" extension so there is no variance due to networks):
> >
> > Before: ~435s (3.56 MB/s)
> > After:  ~339s (4.55 MB/s)
> >
> > For comparison, I created a .tar (no compression) archive of the .hg
> > directory so I could compare performance against other tools (stream
> > bundles are essentially tar archives - file contents are copied with
> > no stream modification). The popular 7-zip archiving program took 414s
> > to unpack this archive. 7-zip is a compiled/native program.
> > Considering the marginal time difference between 7-zip and Mercurial
> > before this patch, I reckon 7-zip is using a single thread for I/O.
> > I won't report the time of the `tar` program on Windows because it is
> > abysmal. Mercurial was already significantly faster than it. Anyway,
> > after this patch we can say that Mercurial is faster than 7-zip on
> > Windows! Honestly, that doesn't say much: my SSD can easily copy
> > files at 100-200 MB/s on Windows.
> >
> > As with the previous patch, I instrumented time spent waiting on the
> > background file closing thread to dip below its high water mark. It
> > reported a staggering 238s! Worthwhile followup work would be to
> > investigate whether multiple file close threads decrease wall time
> > further. Although I have no idea how multiple I/O threads will scale on
> > Windows. Another option is to "leak" file descriptors. When you stick to
> > Win32 APIs like CreateFile() and CloseHandle() (you don't involve file
> > descriptors), you can have 2^24 open handles in a Windows process. These
> > will automatically close on process exit. Python, however, operates on
> > file descriptors. While Mercurial calls CreateFile(), it passes the
> > returned handle into _open_osfhandle() to obtain a file descriptor and
> > gives that to Python to turn into a Python file object. File descriptors
> > are subject to a default limit of 512 (which can be raised to 2048). With a
> > lot of work (you'd have to reimplement the Python file object API), you
> > could maintain pure Win32 function calls everywhere, leak handles, and
> > then maybe, just maybe, process close will be more efficient than
> > CloseHandle() on thousands of file handles. (This would likely require
> > spawning a child process to hold the handles so command servers wouldn't
> > leak forever.) It's an interesting (although large-effort) idea.
> >
> > Windows I/O and Python on Windows are very capable of writing at very
> > fast speeds: manifest writing finished in just a second or two - and it
> > is a ~455M file! There has been some chatter around building a repo
> > class that operates on "indexed stream bundles" - reading revlogs from a
> > single "packed" file and writing out individual revlogs on disk only as
> > new data is written. If this existed, I suspect stream bundle
> > application would be measured in seconds, not minutes. This would also
> > make excellent follow-up work. And it would likely benefit Linux and OS
> > X as well.
>
> I'd like to remark here that holding files open on Windows has
> consequences in combination with unlinking - as I've noted on
> https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows
>
> (just to make sure you are aware of this)
>

Thanks for the pointer.

I reckon this would come into play during transaction rollback? So it
sounds like we will need some kind of awareness between the transaction
object and deferred close? Suddenly Augie's suggestion of integrating this
with the VFS layer sounds like a necessity...
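
To make the concern concrete, here is a minimal sketch of a deferred closer
that a transaction could wait on before unlinking anything. It is not the
code from this patch series: the names BackgroundCloser, drain(), shutdown()
and write_files() are hypothetical, and the bounded queue merely stands in
for the high water mark described above.

```python
import os
import queue
import threading


class BackgroundCloser:
    """Hypothetical helper that closes file objects on a worker thread."""

    def __init__(self, maxpending=4):
        # Bounded queue: put() blocks once maxpending files are waiting,
        # which plays the role of the high water mark.
        self._queue = queue.Queue(maxsize=maxpending)
        self._errors = []
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        while True:
            fh = self._queue.get()
            try:
                if fh is None:  # sentinel: stop the worker
                    return
                fh.close()
            except Exception as exc:  # reported later by drain()
                self._errors.append(exc)
            finally:
                self._queue.task_done()

    def close(self, fh):
        """Hand a file object to the worker; may block at the limit."""
        self._queue.put(fh)

    def drain(self):
        """Block until every queued file has actually been closed."""
        self._queue.join()
        if self._errors:
            raise self._errors.pop(0)

    def shutdown(self):
        """Drain pending closes, then stop the worker thread."""
        self.drain()
        self._queue.put(None)
        self._thread.join()


def write_files(directory, contents, closer):
    """Write each name -> bytes entry, deferring close() to the closer."""
    paths = []
    for name, data in contents.items():
        path = os.path.join(directory, name)
        fh = open(path, "wb")
        fh.write(data)
        closer.close(fh)
        paths.append(path)
    return paths
```

In this sketch a rollback would call drain() before unlinking, so no file
is still held open by the closer when it is removed. Whether that hook
belongs on the transaction object or in the VFS layer is exactly the open
question.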