[PATCH 3 of 3 RFC] streamclone: use backgroundfilecloser (issue4889)
Gregory Szorc
gregory.szorc at gmail.com
Sat Jan 2 21:15:53 CST 2016
# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1451780015 28800
# Sat Jan 02 16:13:35 2016 -0800
# Node ID f54785fd5e944ff0b58d8e807b28497b262a1530
# Parent 59042fecfa95376b35b22c97b8b71d3a02d26832
streamclone: use backgroundfilecloser (issue4889)
Closing files that have been appended to is slow on Windows/NTFS.
CloseHandle() calls on this platform often take 1-10ms - and that's
on my i7-6700K Skylake processor with a modern and fast SSD. Contrast
with other I/O operations, such as writing data, which take <100us.
This means that creating/appending thousands of files can add
significant overhead. For example, cloning mozilla-central creates
~232,000 revlog files. Assuming 1ms per CloseHandle(), that
yields 232s (3:52) of wall time waiting for file closes!
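The estimate above can be reproduced, and the per-close cost measured on a given machine, with a small script. This is only a sketch and is not part of the patch: the function names `time_closes` and `projected_close_overhead`, the 4 KiB payload, and the file count are illustrative assumptions, and it is written for Python 3 rather than the Python 2 that Mercurial targeted at the time.

```python
import os
import tempfile
import time


def time_closes(count=200, payload=b"x" * 4096):
    """Write `count` small files and return (write_seconds, close_seconds).

    Hypothetical micro-benchmark: it separates time spent in write()/flush()
    from time spent in close(), which is where CloseHandle() is called on
    Windows.
    """
    write_total = 0.0
    close_total = 0.0
    with tempfile.TemporaryDirectory() as tmpdir:
        for i in range(count):
            fh = open(os.path.join(tmpdir, "f%d" % i), "wb")
            t0 = time.perf_counter()
            fh.write(payload)
            fh.flush()  # push buffered data to the OS before timing close()
            t1 = time.perf_counter()
            fh.close()
            t2 = time.perf_counter()
            write_total += t1 - t0
            close_total += t2 - t1
    return write_total, close_total


def projected_close_overhead(file_count, seconds_per_close):
    """Wall time spent in close() if every close costs the same."""
    return file_count * seconds_per_close


if __name__ == "__main__":
    writes, closes = time_closes()
    print("write: %.6fs  close: %.6fs" % (writes, closes))
    # The commit message's estimate: 232,000 files at 1ms per close.
    print("projected: %.0fs" % projected_close_overhead(232000, 0.001))
```

The measured numbers depend heavily on platform, filesystem, and antivirus software, so they should be read as a rough indication only.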
The impact of this overhead can be measured most directly when applying
stream clone bundles. Applying these files is effectively uncompressing
a tar archive.
Using a RAM disk (read: no I/O wait), the difference in wall time for a
`hg debugapplystreamclonebundle` for a ~1731 MB mozilla-central bundle
between Windows and Linux on the same machine is drastic:
Linux: ~12.8s (128MB/s)
Windows: ~352.0s (4.7MB/s)
Windows is ~27.5x slower. Yikes!
After this patch:
Linux: ~12.8s (128MB/s)
Windows: ~102.1s (16.1MB/s)
Windows is now ~3.4x faster than before. Unfortunately, it is still ~8x
slower than Linux. Profiling reveals a few hot code paths that could
likely be improved. But that's for another patch.
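The general idea of closing files in the background can be sketched as follows. This is not the `scmutil.backgroundfilecloser` implementation from this series; the class `BackgroundCloser`, the helper `write_files`, and the `threads` and `max_pending` parameters are hypothetical names chosen for illustration, and the code is Python 3. The writer hands each finished file object to a bounded queue, and worker threads perform the slow close() calls while the main thread keeps writing.

```python
import os
import queue
import tempfile
import threading


class BackgroundCloser:
    """Hypothetical helper that closes file objects on worker threads."""

    def __init__(self, threads=4, max_pending=256):
        # A bounded queue keeps the number of open handles in check.
        self._queue = queue.Queue(maxsize=max_pending)
        self._errors = []
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(threads)
        ]

    def __enter__(self):
        for t in self._threads:
            t.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # One sentinel per worker, then wait for all pending closes.
        for _ in self._threads:
            self._queue.put(None)
        for t in self._threads:
            t.join()
        # Surface a close() failure unless another exception is in flight.
        if exc_type is None and self._errors:
            raise self._errors[0]
        return False

    def _worker(self):
        while True:
            fh = self._queue.get()
            if fh is None:
                return
            try:
                fh.close()
            except Exception as e:
                self._errors.append(e)

    def close(self, fh):
        """Schedule `fh` to be closed; blocks if the queue is full."""
        self._queue.put(fh)


def write_files(directory, contents):
    """Write {name: bytes} into `directory`, closing files in the background."""
    with BackgroundCloser() as closer:
        for name, data in contents.items():
            fh = open(os.path.join(directory, name), "wb")
            fh.write(data)
            closer.close(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_files(d, {"a.i": b"index", "a.d": b"data"})
        print(sorted(os.listdir(d)))
```

In this sketch all files are guaranteed closed once the `with` block exits, which mirrors how the patch wraps the whole file loop in a single context manager. Whether worker threads actually help depends on the close call releasing the interpreter lock while it waits on the OS.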
diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
--- a/mercurial/streamclone.py
+++ b/mercurial/streamclone.py
@@ -9,16 +9,17 @@ from __future__ import absolute_import
 
 import struct
 import time
 
 from .i18n import _
 from . import (
     branchmap,
     error,
+    scmutil,
     store,
     util,
 )
 
 def canperformstreamclone(pullop, bailifbundle2supported=False):
     """Whether it is possible to perform a streaming clone as part of pull.
 
     ``bailifbundle2supported`` will cause the function to return False if
@@ -301,31 +302,32 @@ def consumev1(repo, fp, filecount, bytec
         repo.ui.status(_('%d files to transfer, %s of data\n') %
                        (filecount, util.bytecount(bytecount)))
         handled_bytes = 0
         repo.ui.progress(_('clone'), 0, total=bytecount)
         start = time.time()
 
         tr = repo.transaction(_('clone'))
         try:
-            if True:
+            with scmutil.backgroundfilecloser() as bfc:
                 for i in xrange(filecount):
                     # XXX doesn't support '\n' or '\r' in filenames
                     l = fp.readline()
                     try:
                         name, size = l.split('\0', 1)
                         size = int(size)
                     except (ValueError, TypeError):
                         raise error.ResponseError(
                             _('unexpected response from remote server:'), l)
                     if repo.ui.debugflag:
                         repo.ui.debug('adding %s (%s)\n' %
                                       (name, util.bytecount(size)))
                     # for backwards compat, name was partially encoded
-                    with repo.svfs(store.decodedir(name), 'w') as ofp:
+                    path = store.decodedir(name)
+                    with repo.svfs(path, 'w', filecloser=bfc) as ofp:
                         for chunk in util.filechunkiter(fp, limit=size):
                             handled_bytes += len(chunk)
                             repo.ui.progress(_('clone'), handled_bytes,
                                              total=bytecount)
                             ofp.write(chunk)
             tr.close()
         finally:
             tr.release()