[PATCH 3 of 3 RFC] streamclone: use backgroundfilecloser (issue4889)

Gregory Szorc gregory.szorc at gmail.com
Sat Jan 2 21:15:53 CST 2016


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1451780015 28800
#      Sat Jan 02 16:13:35 2016 -0800
# Node ID f54785fd5e944ff0b58d8e807b28497b262a1530
# Parent  59042fecfa95376b35b22c97b8b71d3a02d26832
streamclone: use backgroundfilecloser (issue4889)

Closing files that have been appended to is slow on Windows/NTFS.
CloseHandle() calls on this platform often take 1-10ms - and that's
on my i7-6700K Skylake processor with a modern and fast SSD. Contrast
with other I/O operations, such as writing data, which take <100us.
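
A rough way to reproduce this kind of measurement is to time the close
of each file separately from the write. The following is a minimal
sketch and not code from this patch; the helper name `time_closes`, the
file count, and the payload size are made up for illustration, and the
numbers it prints depend entirely on the platform and filesystem.

```python
import os
import tempfile
import time


def time_closes(count=100, payload=b"x" * 4096):
    """Return (write_seconds, close_seconds) lists for `count` small files."""
    writes, closes = [], []
    with tempfile.TemporaryDirectory() as tmpdir:
        for i in range(count):
            # Open in append mode, matching the "appended to" case above.
            fh = open(os.path.join(tmpdir, "f%d" % i), "ab")
            t0 = time.perf_counter()
            fh.write(payload)
            fh.flush()
            t1 = time.perf_counter()
            fh.close()
            t2 = time.perf_counter()
            writes.append(t1 - t0)
            closes.append(t2 - t1)
    return writes, closes


if __name__ == "__main__":
    writes, closes = time_closes()
    print("mean write: %.1f us" % (1e6 * sum(writes) / len(writes)))
    print("mean close: %.1f us" % (1e6 * sum(closes) / len(closes)))
```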

This means that creating/appending thousands of files can add
significant overhead. For example, cloning mozilla-central creates
~232,000 revlog files. Assuming 1ms per CloseHandle(), that
yields 232s (3:52) of wall time waiting for file closes!
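
The estimate above is simple multiplication. As a sketch, with helper
names (`close_overhead_seconds`, `format_minutes`) that are purely
illustrative, the same arithmetic can be rerun for other per-close
costs in the 1-10ms range quoted earlier:

```python
def close_overhead_seconds(file_count, seconds_per_close):
    """Total wall time spent in close calls, assuming they run serially."""
    return file_count * seconds_per_close


def format_minutes(seconds):
    """Render a duration in seconds as M:SS."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return "%d:%02d" % (minutes, secs)


if __name__ == "__main__":
    for per_close in (0.001, 0.010):
        total = close_overhead_seconds(232000, per_close)
        print("%.0f ms per close: %s" % (per_close * 1000,
                                         format_minutes(total)))
```

At the 1ms end this gives the 3:52 figure above; at the 10ms end the
same file count would cost ten times as much.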

The impact of this overhead can be measured most directly when applying
stream clone bundles. Applying one of these bundles is effectively
extracting a tar archive: a stream of file names, sizes, and contents
that are written straight to disk.

Using a RAM disk (read: no I/O wait), the difference in wall time for
`hg debugapplystreamclonebundle` on a ~1731 MB mozilla-central bundle
between Windows and Linux on the same machine is drastic:

Linux:    ~12.8s (128MB/s)
Windows: ~352.0s (4.7MB/s)

Windows is ~27.5x slower. Yikes!

After this patch:

Linux:    ~12.8s (128MB/s)
Windows: ~102.1s (16.1MB/s)

Windows is now ~3.4x faster than before this patch. Unfortunately, it is
still ~8x slower than Linux. Profiling reveals a few hot code paths that
could likely be improved, but that's for another patch.
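
The implementation of `scmutil.backgroundfilecloser` is not shown in
this patch. As a rough illustration of the general idea only, here is a
minimal sketch of a context manager that hands file closes to worker
threads. The names `BackgroundCloser` and `write_files`, the thread
count, and the error handling are all assumptions, not Mercurial's
actual code. It also assumes the underlying close call releases the
interpreter lock, so that the writing thread can keep going while
closes are pending.

```python
import os
import queue
import tempfile
import threading


class BackgroundCloser(object):
    """Hypothetical helper: close file objects on worker threads."""

    def __init__(self, threads=4):
        self._queue = queue.Queue()
        self._errors = []
        self._workers = [threading.Thread(target=self._run)
                         for _ in range(threads)]

    def __enter__(self):
        for worker in self._workers:
            worker.start()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # One sentinel per worker; the queue is FIFO, so every file queued
        # earlier is picked up before any worker sees its sentinel.
        for _ in self._workers:
            self._queue.put(None)
        for worker in self._workers:
            worker.join()
        # Surface the first close error unless another one is in flight.
        if self._errors and exc_type is None:
            raise self._errors[0]
        return False

    def close(self, fh):
        """Queue `fh` to be closed off the calling thread."""
        self._queue.put(fh)

    def _run(self):
        while True:
            fh = self._queue.get()
            if fh is None:
                return
            try:
                fh.close()
            except Exception as e:
                self._errors.append(e)


def write_files(directory, items, closer):
    """Write each (name, data) pair, handing the close to `closer`."""
    handles = []
    for name, data in items:
        fh = open(os.path.join(directory, name), "wb")
        fh.write(data)
        closer.close(fh)
        handles.append(fh)
    return handles


if __name__ == "__main__":
    items = [("f%d" % i, b"x" * 1024) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmpdir:
        with BackgroundCloser() as closer:
            handles = write_files(tmpdir, items, closer)
        print(all(fh.closed for fh in handles))
```

Leaving the `with` block waits for every queued close to finish, so
callers can rely on all files being closed once the block exits. The
sketch puts no limit on the number of open handles; a real
implementation would likely need to cap the queue size.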

diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
--- a/mercurial/streamclone.py
+++ b/mercurial/streamclone.py
@@ -9,16 +9,17 @@ from __future__ import absolute_import
 
 import struct
 import time
 
 from .i18n import _
 from . import (
     branchmap,
     error,
+    scmutil,
     store,
     util,
 )
 
 def canperformstreamclone(pullop, bailifbundle2supported=False):
     """Whether it is possible to perform a streaming clone as part of pull.
 
     ``bailifbundle2supported`` will cause the function to return False if
@@ -301,31 +302,32 @@ def consumev1(repo, fp, filecount, bytec
         repo.ui.status(_('%d files to transfer, %s of data\n') %
                        (filecount, util.bytecount(bytecount)))
         handled_bytes = 0
         repo.ui.progress(_('clone'), 0, total=bytecount)
         start = time.time()
 
         tr = repo.transaction(_('clone'))
         try:
-            if True:
+            with scmutil.backgroundfilecloser() as bfc:
                 for i in xrange(filecount):
                     # XXX doesn't support '\n' or '\r' in filenames
                     l = fp.readline()
                     try:
                         name, size = l.split('\0', 1)
                         size = int(size)
                     except (ValueError, TypeError):
                         raise error.ResponseError(
                             _('unexpected response from remote server:'), l)
                     if repo.ui.debugflag:
                         repo.ui.debug('adding %s (%s)\n' %
                                       (name, util.bytecount(size)))
                     # for backwards compat, name was partially encoded
-                    with repo.svfs(store.decodedir(name), 'w') as ofp:
+                    path = store.decodedir(name)
+                    with repo.svfs(path, 'w', filecloser=bfc) as ofp:
                         for chunk in util.filechunkiter(fp, limit=size):
                             handled_bytes += len(chunk)
                             repo.ui.progress(_('clone'), handled_bytes,
                                              total=bytecount)
                             ofp.write(chunk)
             tr.close()
         finally:
             tr.release()

